Programming for (non-computer) scientists
A few days ago, bbgm wrote about Programming and Science Education and about how some people advocate that students use Excel (or similar) instead of “real” programming languages. I quote:
As a computational scientist (and with both a physical and a life science background), that such arguments still happen is appalling. IMO, all scientists, especially those remotely connected to theory and/or computational science should be given the opportunity to learn some formal software engineering and computer science principles (for physical chemists, bioinformaticians, etc is should be mandatory to do some courses).
I guess I agree, but I haven’t made up my mind about the degree to which I agree.
How much programming do you really need?
You can do a lot with spreadsheets, and they are a really powerful tool once you get used to them. I have never bothered myself, but I have seen what my colleagues manage to do with them. So they may be sufficient for what you need, and then it makes sense to learn how to get the most out of them.
That being said, a spreadsheet is not a substitute for a programming language, and when you need a proper programming language, it takes a lot of hacking to do the same tasks in a spreadsheet.
If you need a complex statistical analysis of your data, you are better off using a language/environment such as R than trying to do the analysis in Excel. Sure, Excel can do a lot of statistics, but only the most common types of analysis. If you need more, you need something like R.
Depending on how much you need to manipulate your data, and how much mathematical modelling you need to implement yourself, R might not be enough. If you need to combine the output of various programs, do a few manipulations on it, and analyse the result, you probably want to consider a scripting language like Python or Perl.
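To make the "combine output from various programs" case concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the two tab-separated tables stand in for the output of two hypothetical tools, and the column names (`gene`, `length`, `reads`) and the function names are assumptions, not anything from a real program.

```python
import csv
import io
import statistics

# Hypothetical output from two different programs, inlined as strings so the
# sketch runs on its own. In practice these would be files the tools wrote.
LENGTH_TOOL_OUTPUT = "gene\tlength\ngeneA\t1200\ngeneB\t800\ngeneC\t1500\n"
COUNT_TOOL_OUTPUT = "gene\treads\ngeneA\t240\ngeneB\t80\ngeneD\t50\n"


def read_table(text):
    """Parse tab-separated text with a header row into a dict keyed by gene."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["gene"]: row for row in reader}


def reads_per_kilobase(lengths, counts):
    """Join the two tables on gene name and compute reads per 1000 bases.

    Genes that appear in only one of the tables are skipped.
    """
    result = {}
    for gene in sorted(lengths.keys() & counts.keys()):
        length = int(lengths[gene]["length"])
        reads = int(counts[gene]["reads"])
        result[gene] = reads * 1000 / length
    return result


density = reads_per_kilobase(
    read_table(LENGTH_TOOL_OUTPUT), read_table(COUNT_TOOL_OUTPUT)
)
mean_density = statistics.mean(density.values())

for gene, value in density.items():
    print(f"{gene}\t{value:.1f}")
print(f"mean\t{mean_density:.1f}")
```

The same join could be done in a spreadsheet with lookup formulas, but a script like this can be rerun unchanged whenever the tools produce new output.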
For heavy-duty scientific programming, you want MATLAB, Octave, or something similar. When you need extra speed, and your language does not already have an optimised solution, C or C++ is useful, either for a full application or for a module in your scripting or statistical language (these languages usually allow you to write extensions in C/C++ that you can then use like any other module).
How much programming you need to do in scientific work varies a lot and there is not really any reason to spend time learning something that you will never use.
Maybe a spreadsheet is all you will ever need. Just make sure that you are doing the analysis you want to do, not the one your tools limit you to, and if you need more, then spend the time learning how to program!
Learning how to program
Remember, though, that learning to program a computer takes time. It can take a lot of time before you do it well.
There is a lot more to it than learning the syntax of a programming language. You need to know the right way to solve given problems in your language of choice — the optimal way in one programming language can be very different from the optimal way in another, depending on libraries, language features, underlying philosophy of the language, etc. — and in general you need to learn how to think like a programmer.
It might require a different kind of problem-solving skill than you are used to.
Most of all, it takes practice. This can be frustrating if you pick up a language to solve a specific problem that interests you more than learning how to program does.
If you only very rarely need to program, don’t bother. Find someone to help you out. This is how we bioinformaticians get involved in various projects, so although we complain about being overworked, we don’t really mind so much. It is no different from having to consult a statistician from time to time to make sure that you are analysing your data correctly. If you do not need it that often, your time might be better spent on other tasks.
If you often need to solve programming problems, you should learn how to do it properly. You get a better feeling for the problems and for the data when you work with them yourself, and you don’t want to outsource that to other people.
If you decide you need programming in your research (and I think more and more sciences rely on IT, so maybe you will before you know it), then practice, practice, practice. Experiment with your programming language. Read discussion fora and user groups. Read other people’s code and see how they solve similar problems.
Just as it takes time to learn how to use a spreadsheet properly, if only to learn all the features it offers, it takes time to learn how to program. Usually more so, as programming languages are much more powerful. Spend the time! There is no fast way around this; there really isn’t.
June 6th, 2008 at 9:52 am
Hi Thomas,
Nice thoughts; they echo Norvig’s piece at http://norvig.com/21-days.html .
Ani
June 6th, 2008 at 12:38 pm
Indeed, Ani.
A little learning is a dangerous thing;
drink deep, or taste not the Pierian spring:
there shallow draughts intoxicate the brain,
and drinking largely sobers us again.
Alexander Pope, An Essay on Criticism