Horizon 2020

OK, I didn’t actually want to write about this, but a few tweets provoked me.

I was at my grandmother’s 85th birthday yesterday and sat across from one of my cousins. You go through the usual “what are you doing these days” and, as usual, I told some story about trying to figure out something basic. In this case: “I want to figure out how the great apes diverged into different species”.

I do want to figure that out, it wasn’t a lie, but that is beside the point.

Anyway, he asks me if it would be better to focus science on something applied like curing diseases.

I answered “no”.

He looked at me, like he expected something more, so I felt I had to elaborate…

The thing is, the longer answer is rather complex, but rather important.

Let me make a cartoon of science, and I apologise for making science seem so simple. But you can think of two kinds of science. The “figuring out how the world works” science, and the “figuring out how to make cool stuff” science. We call the first “basic science” and the second “applied science”.

Your fancy new phone was developed based on the second kind of science. Most medicine is based on the second type. Pretty much everything you touch that is science based is based on the second kind.

So it seems like the second type is pretty important.

The thing is, it is all contingent on the first kind.

You can’t do “figure out how to make cool stuff” if you haven’t “figured out how the world works” first.

Quantum mechanics was an attempt to figure out how the world works that people worked on at the beginning of the 20th century. That led to transistors. In the 21st century you can’t move about without touching a computer.

People didn’t invent quantum theory to build computers, that was just a lucky coincidence.

A lot, if not most, of the technology you have today is based on basic science. Science that was not based on any goal to develop cool gadgets. Science that was focused on figuring out how the world actually works.

It turns out that if you understand how the world works, you can work out ways of making cool gadgets. If you don’t understand how the world works, you are left with the option of steam-driven mobile phones.

Curiosity-driven research pays for itself, even if most of the discoveries have no applications whatsoever. The few important discoveries will pay for the rest, and you have no way of knowing in advance what is worth examining and what is a waste of time.

With the EU’s new Horizon 2020 we have to have companies involved. I don’t know if I read this incorrectly, but it looks to me like you have to have a working prototype in a few years, and that is just not how important science works.

You would never have a computer if you wanted to go from basic research to a working laptop in three years…

Grid workflow system

The last week, since last Saturday afternoon, I’ve worked on and off on a small utility for specifying workflows on our computer grid here at BiRC.

I’m used to using Make files to keep track of workflows and making sure that the files I’m manipulating are up to date, but now I need to run my workflows on computer grids more and more – with the size of data I work on these days it is just not feasible to run it on my desktop – and that doesn’t quite work with Make.

You can of course use Make to keep track of time stamps and such, but when running jobs you want to submit them to be run in parallel and you need them to be scheduled so you know that the dependent tasks are done before you start running the next set of jobs.

I tried to google around for something to solve that problem but couldn’t find anything – perhaps my google-fu just isn’t strong enough – but I figured it couldn’t be that hard to write it myself, so I did.

After all, it just boils down to specifying some tasks, figuring out which input files they need and which output files they produce, and then building a dependency graph between tasks. The tasks can then be submitted to the computer grid with their dependencies and the queue system takes care of the rest.
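To make that concrete, here is a minimal sketch in Python of how such a dependency graph could be built. It is not the code from my tool: the Task record, the function names and the example file names are all made up for illustration, and it assumes each output file is produced by exactly one task.

```python
from collections import namedtuple

# Hypothetical task record for illustration only; this is not the
# specification format the actual tool uses.
Task = namedtuple("Task", ["name", "inputs", "outputs"])


def build_dependencies(tasks):
    """Map each task name to the set of tasks that produce its input files."""
    producer = {}
    for task in tasks:
        for path in task.outputs:
            producer[path] = task.name  # assumes one producer per file
    return {
        task.name: {producer[path] for path in task.inputs if path in producer}
        for task in tasks
    }


def submission_order(deps):
    """Return task names so that every task comes after its dependencies."""
    order, done, in_progress = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError("cyclic dependency involving " + name)
        in_progress.add(name)
        for dep in sorted(deps[name]):
            visit(dep)
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    for name in sorted(deps):
        visit(name)
    return order


# Made-up example workflow: three tasks chained through their files.
example = [
    Task("report", ["calls.vcf", "aln.bam"], ["report.txt"]),
    Task("align", ["reads.fq"], ["aln.bam"]),
    Task("call", ["aln.bam"], ["calls.vcf"]),
]
print(submission_order(build_dependencies(example)))
```

Input files that no task produces (reads.fq above) are simply treated as files that must already exist. On a real grid you would not run the tasks in this order yourself; you would hand each job to the queue system together with the jobs it depends on.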

I programmed up the dependency graph and the specification language last weekend and the beginning of this week, and then the latter half of the week and this weekend I rewrote the scripts for my current projects to use the workflow system.

Now I have some simple workflow files for my projects and I can submit jobs, with dependencies, with a single shell command. If I need to see which commands will be run, I can run the submission as a “dry run” that shows what will be run, or I can get a list of tasks with status (showing what is up to date or what needs to be run and why), and I can even get a graph showing all the tasks using Graphviz.
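The “what needs to be run and why” part can be done with the same kind of time stamp check that Make uses. The following is only a rough sketch under that assumption; the function name and the wording of the messages are invented here and are not taken from the tool itself.

```python
import os


def reason_to_run(inputs, outputs):
    """Return a short reason why a task must run, or None if it is up to date."""
    for path in outputs:
        if not os.path.exists(path):
            return "output %s is missing" % path

    # Inputs that do not exist yet are ignored here; in a full workflow
    # they would be produced by another task that has to run first.
    existing = [path for path in inputs if os.path.exists(path)]
    if not existing or not outputs:
        return None

    newest_input = max(existing, key=os.path.getmtime)
    oldest_output = min(outputs, key=os.path.getmtime)
    if os.path.getmtime(newest_input) > os.path.getmtime(oldest_output):
        return "input %s is newer than output %s" % (newest_input, oldest_output)
    return None
```

A dry run is then just a loop over the tasks that prints the command and the reason instead of submitting anything.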

After writing a bit of documentation I will find some testers for a beta release, and I hope others will find it as useful as I am finding it right now …

You can get the code at GitHub.