Grid workflow system
Over the past week, starting last Saturday afternoon, I've been working on and off on a small utility for specifying workflows on our computer grid here at BiRC.
I'm used to using Makefiles to keep track of workflows and to make sure that the files I'm manipulating are up to date. More and more, though, I need to run my workflows on computer grids (with the size of the data I work on these days, it is just not feasible to run them on my desktop), and that doesn't quite work with Make.
You can of course use Make to keep track of time stamps and such, but when running jobs on a grid you want to submit them to run in parallel, and you need them scheduled so that the tasks a job depends on are done before that job starts.
I tried to google around for something to solve that problem but couldn't find anything - perhaps my google-fu just isn't strong enough - but I figured it couldn't be that hard to write it myself, so I did.
After all, it just boils down to specifying some tasks, figuring out which input files they need and which output files they produce, and then building a dependency graph between tasks. The tasks can then be submitted to the computer grid with their dependencies and the queue system takes care of the rest.
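To make the idea concrete, here is a minimal Python sketch of such a dependency graph. It is not the code from the utility itself: the task and file names and the functions `build_dependency_graph` and `submission_order` are made up for illustration, and it assumes that each file is produced by at most one task.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_dependency_graph(tasks):
    """Map each task to the set of tasks that produce its input files.

    `tasks` is a dict: task name -> (list of input files, list of output files).
    Assumption: every file is produced by at most one task.
    """
    # First record which task produces each output file.
    producer = {}
    for name, (_inputs, outputs) in tasks.items():
        for filename in outputs:
            producer[filename] = name

    # A task depends on whichever task produces one of its inputs.
    # Inputs that no task produces are treated as pre-existing files.
    graph = {name: set() for name in tasks}
    for name, (inputs, _outputs) in tasks.items():
        for filename in inputs:
            upstream = producer.get(filename)
            if upstream is not None and upstream != name:
                graph[name].add(upstream)
    return graph


def submission_order(graph):
    """Return the tasks in an order where dependencies come first.

    Raises graphlib.CycleError if the tasks depend on each other in a cycle.
    """
    return list(TopologicalSorter(graph).static_order())


# Hypothetical three-step workflow.
tasks = {
    "align": (["reads.fq"], ["aln.bam"]),
    "call": (["aln.bam"], ["calls.vcf"]),
    "report": (["calls.vcf", "aln.bam"], ["report.txt"]),
}
graph = build_dependency_graph(tasks)
order = submission_order(graph)
```

Walking through `order` and submitting each task together with the job ids of the tasks in `graph[task]` is then enough for a queue system that supports job dependencies to do the scheduling.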
I wrote the dependency graph and the specification language last weekend and at the beginning of this week; in the latter half of the week and this weekend I rewrote the scripts for my current projects to use the workflow system.
Now I have some simple workflow files for my projects and I can submit jobs, with dependencies, with a single shell command. If I want to see which commands would be executed, I can do the submission as a "dry run" that only prints them. I can also get a list of tasks with their status (showing what is up to date, or what needs to be run and why), and I can even get a graph of all the tasks using Graphviz.
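The status check is essentially the same time stamp comparison that Make does. A rough sketch of how it could look in Python follows; the function `task_status` and its reason strings are hypothetical and not taken from the utility, and the sketch assumes that a task with no output files should always be run.

```python
import os


def task_status(inputs, outputs):
    """Return (up_to_date, reason) for a task, comparing file time stamps.

    Hypothetical helper: a task needs to run if an output file is missing
    or if any existing input file is newer than the oldest output file.
    """
    if not outputs:
        # Assumption: without outputs there is nothing to compare against.
        return False, "no output files, always run"

    for filename in outputs:
        if not os.path.exists(filename):
            return False, f"missing output: {filename}"

    oldest_output = min(os.path.getmtime(f) for f in outputs)
    for filename in inputs:
        if os.path.exists(filename) and os.path.getmtime(filename) > oldest_output:
            return False, f"input newer than output: {filename}"

    return True, "up to date"
```

A dry run can then be little more than a loop over the tasks that prints the command for every task where `task_status` reports that it needs to run, instead of submitting it.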
After writing a bit of documentation I will find some testers for a beta release, and I hope others will find it as useful as I am finding it right now ...
You can get the code at GitHub.