Posts Tagged ‘bioinformatics’

BioHDF

Monday, August 17th, 2009

Hat tip FinchTalk.



Automating scientific grunt-work

Thursday, March 19th, 2009

Monday, John Hawks asked: Will Wolfram make bioinformatics obsolete?

I was talking with a scientist last week who is in charge of a massive dataset. He told me he had heard complaints from many of his biologist friends that today’s students are trained to be computer scientists, not biologists. Why, he asked, would we want to do that when the amount of data we handle is so trivial?

Now personally I wouldn’t call the amount of data trivial, exactly, but it does pale compared to some physics experiments.

Yesterday, Daniel MacArthur (Genetic Future) responded with this:

I’d agree that biological data-sets can’t compete with particle physicists in terms of sheer scale, although the speed with which they are accumulating is alarming. Where biological data-sets really become intimidating is in their diversity, in the complexity of the underlying processes, and in the levels of noise and bias. I suspect a lot of people used to dealing with extremely large data-sets would still balk at the complexity of computational biology once they dug a little deeper, particularly in a few years’ time.

Which I fully agree with.

Anyway, back to John:

Now, you have to understand, to this person a dataset of 1000 whole genomes is trivial. He said, don’t these students understand that in a few years all the software they wrote to handle these data will be obsolete? They certainly aren’t solving interesting problems in computer science, and in a short time, they won’t be able to solve interesting problems in biology.

He then turns to Wolfram Alpha as an example of a computer system that could replace programming skills with plain English queries, so that biologists would no longer need to program.

Now personally, I am very sceptical about this.  It sounds too much like a full AI to be true, but that is not the point I’m aiming at here.

Daniel makes the point that an expert system like this will only help so far:

That said, such tools and databases, however powerful, will always lag substantially behind the science. For young biologists who want to work right at the cutting edge – which will require dealing directly with rapidly changing technologies, generating biological data at an increasingly dizzying pace and in constantly evolving formats - solid informatic skills, including at least basic programming and sound statistical knowledge, will make you a far more productive scientist.

Of course programming languages will change and the scripts you write as a grad student will be forgotten within a year or two – that’s the nature of science (how many molecular biologists still run Southern blots?). The important thing is learning how to think about large-scale biological data: how to access, filter and manipulate it. Having basic programming expertise will make you more effective as a scientist right now, and it will also prepare you for a career in an increasingly data-driven field.

Yes, the important thing is to learn how to think about large-scale biological data! More importantly, how to think about it in a structured way.

And “in a structured way” essentially means with a healthy mix of biological insight, mathematical modelling and statistical evaluation of the data.

With large-scale data, this cannot be done “by hand” but requires computer support.

Getting a computer to analyse your data really requires structured thinking. Nothing punishes fuzzy thinking quite like a computer.

Of course, our computer systems improve year by year, and the kind of basic programming skills you might have learned five or ten years ago are now obsolete.  If you attack basic statistical modelling with C or assembly programming, you are just doing it wrong.

This doesn’t mean that the basic skills you learn, when you learn how to program, are obsolete.  With improved computer systems and improved programming languages, you can work at a much higher level, but the essential structured thinking (plus basic testing and validation) is still just as important.

Just because we now have very powerful calculators doesn’t mean that it is a waste of time to study math.

I strongly feel that a little bit of computer science should be taught to all scientists, just as a bit of math and a bit of stats should be taught.  Not the low-level stuff.  Not C or Perl programming, but “essentials” of programming.  Just like you shouldn’t do hours and hours of sums to learn math.  That is just grunt work that should be left to our computer systems.

The basic computer science could be a bit on complexity (what can be done by a computer and what cannot; what can be efficiently done and what cannot); some basic programming (a single high-level language, just to get the feeling for programming; how to test programs); some numerical analysis (it doesn’t matter if your math is correct for real numbers if it is completely unstable when you work with floating point numbers); and some basic data structures and algorithms for everyday work.
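To make the numerical analysis point concrete, here is a minimal sketch in Python. The function names and the data are made up for illustration. Both functions compute the same population variance, and both are correct on paper for real numbers, but the one-pass "textbook" formula subtracts two huge, nearly equal floating point numbers and loses the answer.

```python
def naive_variance(xs):
    """Population variance as E[x^2] - (E[x])^2.

    Correct for real numbers, but numerically unstable: when the values
    are large compared to their spread, the two terms are huge and nearly
    equal, so the subtraction cancels away the significant digits.
    """
    n = len(xs)
    mean_of_squares = sum(x * x for x in xs) / n
    mean = sum(xs) / n
    return mean_of_squares - mean ** 2


def two_pass_variance(xs):
    """Population variance as the mean squared deviation from the mean.

    Subtracting the mean first keeps the intermediate numbers small,
    so very little precision is lost.
    """
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


# Hypothetical measurements: a large offset with a small spread.
# The exact population variance of 4, 7, 13, 16 is 22.5, and adding
# a constant offset does not change the variance.
data = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]

print("naive:   ", naive_variance(data))
print("two-pass:", two_pass_variance(data))
```

The two-pass version recovers the variance, while the naive version is off by far more than any sensible tolerance. Neither function is "more correct" mathematically; the difference only shows up once the real numbers become floating point numbers.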

If you already have a computer system that meets all your needs, you do not need this, of course.  But what are the chances of having such a system available throughout your career?  What happens when you get new types of data or new kinds of experiments?

With just a bit of computer skills, you can update your system and get back to your science.  You can get the computer to do the grunt work again.

Without computer skills, it is all or nothing.  Either you get all the answers you want from the system, or, if the system doesn’t quite meet your needs, you have to do it all manually.


This week in the blogs

Sunday, February 22nd, 2009

Here’s my weekly list of interesting blog posts from the past week.  I found this week to be slim pickings, maybe because I didn’t pay close enough attention… anyway, at least there were a few posts I enjoyed.

Science

Programming

Math and the web

Neandertals

Next gen sequencing


Bioinformatics and Computational Biology

Wednesday, February 18th, 2009

Russ Altman has an interesting post on his blog: Bioinformatics & Computational Biology = same? No.

I spent the first 15 years of my professional life unwilling to recognize a difference between bioinformatics and computational biology.  It was not because I didn’t think that there was or could be a difference, but because I thought the difference was not significant.  I have changed my position on this.  I now believe that they are quite different and worth distinguishing.  For me,

  • Computational biology = the study of biology using computational techniques.  The goal is to learn new biology, knowledge about living systems.  It is about science.
  • Bioinformatics = the creation of tools (algorithms, databases) that solve problems.  The goal is to build useful tools that work on biological data.  It is about engineering.

Personally, I have made the same distinction, but for some reason with the terms somewhat reversed.  For me, computational biology has always been about the development of methods and tools, while bioinformatics has been about applying methods to study biology.

I suppose someone can argue with my use of the term “bioinformatics” as an engineering discipline.  That’s fine–I’m open to a different term.  But I would ask why bioinformatics isn’t good.   I think computational biology is more solid–the ‘biology’ is clearly the noun and the ‘computational’ is clearly the adjective.

Good point for that use of the terms.  My reasoning for the other use was that computational biology clearly had a focus on the “computational” and isn’t just studying biology by running computer programs.

Anyway, the actual terms are not so important, but I completely agree that the mix of mathematics/statistics, computer science and biology — whatever we call that mix — consists of several disciplines:

  • Tools and methods development
  • Applying tools and methods in data analysis

I wouldn’t put the first item, tools and methods development, entirely in an “engineering” box, though.  Some tools development is just an engineering exercise, implementing existing well-known methods, but some method development involves formalising new hypotheses and implementing ways of checking them in computer tools.

The same goes for the second point; applying tools can range from running data through a pipeline with very little other user involvement to detailed and careful analysis of all computational results compared to the underlying biological hypothesis.
