Why do we need a separate class for programming in bioinformatics?

Why do we have a special class for programming in bioinformatics? Because it is a very different task to teach students who consider programming a means to an end, than it is to teach students who consider programming the actual goal.

In a previous post I asked “Why are we teaching an introductory programming class for bioinformatics, when there is already an introductory programming class in the Dept. of Computer Science?” Below, I’ll try to answer that question.

A different approach to programming

The short answer is that the approach to programming is very different between computer science students and (real) science students. Computer science students consider programming something worth learning in itself, whereas other students often consider it a necessary evil they have to learn in order to work with the material they are really interested in.

This is perfectly understandable. If your interest is in biology, then it is the biological questions that you are interested in. Statistics and programming are necessary for analysing your data, more and more so as the types and the quantity of data change, but your main interest is not the statistics or the programming; it is the biology.

Bioinformatics students are probably somewhere in between computer science students and biology/medicine students. If you do not enjoy working with computers, bioinformatics is not the topic for you. If you do not care about the biological questions but only the algorithm design, software engineering, etc., you are better off in computer science than bioinformatics.

Anyway, in the class I will teach next term, about 60 of the students are neither bioinformatics students nor computer science students. They are studying medicine and just need some basic programming to be able to solve bioinformatics tasks in their “real” work.

Showing them “neat tricks” or clever design patterns is not the way to go.

One size doesn’t fit all

The kind of programming you need to learn depends a lot on what you want to do with your programs. If you are doing number crunching, you want to worry about numerical algorithms and such. If you are building real-time systems, time constraints and response times are everything. If you are building large software systems with millions of lines of code, the key thing is proper software engineering.

In Aarhus, we teach the computer science students to be a mix of “classical” computer scientists and software engineers / software designers.  We have a lot of classes that are pure theoretical computer science — everything is done on blackboards and implementing anything is frowned upon — and we have a lot of classes concerning software architecture and such.

There isn’t really a market for pure theoretical computer science outside of academia here, so most of our students end up in jobs where designing and implementing large software systems is the main focus.  The introductory programming class reflects this.  There is the necessary basic programming, such as learning the control structures and a bit about data structures, and on top of that it is design patterns and the type system and such.  The programming language is Java, probably because it is popular, statically typed and OO.

This is fine for computer science students.  It is just their first programming class, and they will specialise in other classes.

I don’t think it is the right choice if it is the only programming class you take, and you want to use the programming for bioinformatics.

It isn’t the right choice for the physics or chemistry students either. They really should worry more about numerical algorithms (which are not covered in this class) and would probably be better off with a Matlab tutorial and some numerical analysis.

But physics and chemistry students are not my concern and not my problem…

Scripting and programming

Ignoring spreadsheets — which might be the most important tool for many analyses — I would guess that 90%+ of the programming tasks a bioinformatician needs to solve are what I would call “script programming”.

You write a program to automate a work-flow.  You need to parse simple text files to extract relevant information.  You combine programs in pipelines with small converter programs in between them, to translate the output format of one program into the input format of the next.
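To make this concrete, here is a minimal sketch of the kind of small converter script I have in mind. The formats are my own assumption for the sake of illustration: the script pretends that one program writes tab-separated lines of name and sequence, and that the next program wants FASTA. The function name tsv_to_fasta is hypothetical as well.

```python
import sys


def tsv_to_fasta(lines):
    """Yield FASTA lines for tab-separated (name, sequence) records.

    The input layout (name, a tab, then the sequence) is an assumed
    example format, not the output of any particular program.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip lines that do not have at least a name and a sequence.
        if len(fields) < 2:
            continue
        name, sequence = fields[0], fields[1]
        yield f">{name}"
        yield sequence


if __name__ == "__main__":
    # Read from standard input and write to standard output,
    # so the script can sit between two programs in a pipeline.
    for out_line in tsv_to_fasta(sys.stdin):
        print(out_line)
```

Saved as, say, tsv_to_fasta.py, it would be used as the middle step of a pipeline, reading the first program's output on standard input and feeding its own output to the next program. There is no class hierarchy and no clever algorithm, and for this kind of task that is rather the point.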

There is very little focus on this in the computer science programming class.  There it is all about “proper” programming: designing the right class hierarchies, combining the right data structures, choosing the right algorithms for the task at hand… Worrying about IO is only a necessary evil, and one that is mostly ignored, and I doubt that there is any communication with other programs.

In scripting, the right data structure and the right algorithm are rarely much of a problem. If your scripts are much too slow, you worry about it, but more often than not, you are happy if they can do what they do in reasonable time. It is not worth the effort to speed them up.

The right structuring of the code isn’t that much of an issue either.  Of course the code should be readable when you return to it after a few weeks or months, but you never worry about the grand design, since the program is pretty small anyway.

Sure, there are some applications where you need all the canons from computer science, but it is pretty rare in day-to-day life. If you need it, take a class at that time, or just give a computer scientist a Mars bar and a pizza to do it for you.

Learning it, just in case, is most likely just wasting your time.

The programming tasks in bioinformatics simply do not align with the skills taught in the introductory programming class in the Department of Computer Science, and that is why we need our own.

As for what goes into it, that is a topic for another day…

That’s it, no more Linux on my desktop

Maybe it’s just me, but I am getting more and more frustrated with Linux as a desktop computer. There is always some small problem that you have to struggle with, and I am getting fed up with it.

I’m sitting on a train right now. There is a wireless network here, so I can be on the Net while I’m travelling. All well and good, except that connecting to it through my Ubuntu machine is a major struggle.

Well, in itself it doesn’t sound like a lot: I have to open a dialog to find the network and then connect. For some reason I have to be root to do this, but okay, it doesn’t take that long to type in the password. I have to open just the right dialog to connect, of course, because if I just click the icon in my menubar I am politely told that the wireless interface does not exist. If I go through the Network item in the menu, that doesn’t seem to bother the computer and I can connect there.

So it is a little work but nothing to complain too much about, I guess. The problem is just that the network is a bit unstable, so I lose connection to it for short periods every so often. Then I have to do it all again. Forget about keeping the dialog open. For some reason it automatically closes if you leave it alone more than a few seconds. I suppose it is to be friendly.

I have spent more time trying to connect to the network now than I have spent time using the network.

Anyway, so now I rebooted into my Windows partition. Here, I am informed of the presence of the network, I connect, and whenever the network is dropped, I just right-click on the network icon in the taskbar and repair the connection.

It “just works”.

Of course, there isn’t much else that I enjoy about Windows. I find Outlook an exercise in frustration, and the lack of virtual desktops / “spaces” makes the desktop a mess, but I think my problems with this are simply that I am not as used to Windows as I am to Linux.

I’ve gotten used to Mac OS X since I bought an iMac for the office, so now I’ve decided to buy a Mac laptop as well. Maybe a Mac Air, just for the coolness factor.

For computations, Linux will probably still be my choice in the future. A Linux cluster is the right choice for number crunching. But as a desktop computer, I just can’t be bothered with it any more.

Too bad, since the eye candy on Linux is getting really cool. Better even than the Mac. Eye candy just isn’t enough, though, if it means you have to struggle with drivers and shit whenever you want to do the simplest little thing…