Some thoughts on grid computing…
Earlier this week, the LHC Computing Grid went online. A description of the system can be found here, and blog posts about it here, here and here.
This got me thinking about grid computing for small-scale scientists like myself.
I’ve had some experience with grid computing (see an old post about it here) but mostly I have found it too much trouble to be worth the effort.
Our typical computer use
For large projects that require years of CPU time, it is well worth the effort to set up the infrastructure to run computations on grids. You really need it to get the computations done, and the overhead is very small compared to the actual computation time.
Most of my projects — and most of the projects we do at BiRC — are a bit different.
We do need the computation power, but we are usually tinkering with our programs for most of a project — since we rarely know exactly how to analyse our data until we are mostly done with it — so we cannot just distribute a fixed version of our software and then start distributing the computations.
The typical workflow is that we write a program for our analysis and run the analysis, and when we look at the output we find strange results here and there. Then we extend the software, either to extract more information from the data or to fix a bug that caused the weird results.
We then need to run the analysis again, and repeat the process.
The analysis might take anything from a few CPU days to a few CPU months (small scale for grid applications), but between analyses we spend a week or so modifying and testing our software.
We have a small cluster of Linux computers for this, and it is always in one of two states: completely overloaded or burning idle cycles.
This is the situation grid computing could fix. In theory we should be able to get CPU cycles from the grid when we need them, and sell them to the grid when we are not running computations ourselves.
In practice, our work pattern makes this difficult.
The problems with small scale grid computing
If you are changing your software all the time, you need to distribute it together with the data you analyse.
This means you either send compiled binaries with the job submissions, or you compile the software as part of the job.
The former is fine if you have a program you can compile, and you’d better link it statically, ’cause there are no guarantees about which libraries you will find on the resources that run it.
If you have a bunch of scripts, you are not so lucky.
There is no guarantee that the computer running the computations has the script interpreter, or, if it does, that it is a version that can run your script. And even if it does, what about the modules you need?
You don’t want to have to compile BioPython or SciPy on a grid machine just to run your scripts. The overhead in CPU time is going to be several percent of your actual run (at least if you parallelise your computations to a high enough degree to be worth the grid in the first place), and how can you even know that there is a compiler at the other end? You can’t, and there probably isn’t unless you are very lucky.
It is a major pain to see your jobs aborted after slowly making their way through the job queue, just because the host computer cannot even set up the environment you need for your computations.
What can we do about it?
If we want to use the grid for even smaller scale computations, at the very least we need an easier way to distribute new versions of our programs.
I have an idea for this.
Some grids, at least, are already dealing with “runtime environments” where you can specify that your job needs to run in a certain runtime environment, and the scheduler will only send your jobs to resources that can provide that environment.
This sounds like just the thing, but the catch is that it is up to the resource administrators to set up these environments and to tell the grid system that they provide them.
For something like LHC, it is probably not a problem to convince administrators to provide the right environment, but for Thomas Mailund it is.
What we need is a way for the grid users to be able to install environments on the resources!
So how about this: we introduce the concept of “runtime environment packages” that we can upload to the grid system. Each package consists of, for example, a setup script (configure ; make) and a test suite.
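To make the idea a bit more concrete, here is a minimal sketch of what such a package and its build-and-test step might look like, assuming setup steps and tests are plain commands run in the unpacked source directory and that a zero exit status means success. The names `EnvironmentPackage` and `build_and_test` are hypothetical, made up for illustration; no existing grid middleware is implied.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class EnvironmentPackage:
    """Hypothetical runtime environment package: setup steps plus a test suite."""
    name: str
    setup: list[list[str]]  # e.g. [["./configure"], ["make"]]
    tests: list[list[str]]  # every test command must exit with status 0
    requires: list[str] = field(default_factory=list)  # environments this one builds on


def build_and_test(pkg: EnvironmentPackage, workdir: str) -> bool:
    """Run all setup steps, then all tests, inside workdir.

    Returns True only if every command starts and exits with status 0;
    only then should the resource advertise pkg.name to the grid.
    """
    for cmd in pkg.setup + pkg.tests:
        try:
            result = subprocess.run(cmd, cwd=workdir, capture_output=True)
        except OSError:  # e.g. the program (./configure, make, ...) is missing
            return False
        if result.returncode != 0:
            return False
    return True


if __name__ == "__main__":
    # A package whose "test suite" checks the interpreter version and a module.
    smoke = EnvironmentPackage(
        name="python-json",
        setup=[],
        tests=[[sys.executable, "-c",
                "import sys, json; assert sys.version_info >= (3, 9)"]],
    )
    print(build_and_test(smoke, "."))
```

The test suite is what catches the problems above: a missing interpreter, the wrong version, or a module that will not import all make the package fail on that resource instead of failing your jobs later.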
When a resource is idle, it checks whether new environments are waiting in the queue, downloads them, and tries to build and test them. If it succeeds, it tells the grid system that it can run the new type of environment. The scheduler only sends jobs to resources that have the right environments, so if your environment tests are working properly, you never end up on a resource that cannot run your jobs.
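A toy model of both sides of this protocol might look like the following, assuming the build-and-test step is passed in as a callable that takes an environment name. `Resource`, `on_idle` and `eligible` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Resource:
    """A hypothetical grid resource and the environments it advertises."""
    name: str
    environments: set[str] = field(default_factory=set)

    def on_idle(self, queue: list[str], build_and_test: Callable[[str], bool]) -> None:
        """While idle, try each queued environment not already installed here.

        An environment is only advertised if its build and tests succeed.
        """
        for env in queue:
            if env not in self.environments and build_and_test(env):
                self.environments.add(env)


def eligible(resources: list[Resource], required_env: str) -> list[Resource]:
    """Scheduler side: a job only goes to resources advertising its environment."""
    return [r for r in resources if required_env in r.environments]


if __name__ == "__main__":
    a, b = Resource("a"), Resource("b")
    # Pretend "scipy" builds and passes its tests on a, but not on b.
    a.on_idle(["scipy"], lambda env: True)
    b.on_idle(["scipy"], lambda env: False)
    print([r.name for r in eligible([a, b], "scipy")])  # ['a']
```

In this sketch a resource that failed a build simply retries it at its next idle period; a real system would probably want to remember failures rather than rebuild over and over.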
We could even add environment requirements to the environment packages, so they don’t have to be self-contained. E.g. to install SciPy, you don’t want to have to install Python itself, and there is no reason for resources without Python to try to install it only to give up.
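One way a resource could honour such requirements, assuming each queued package simply lists the names of the environments it needs: only attempt a package once all of its requirements are installed, and keep sweeping the queue until a full pass installs nothing new. The function `install_pending` and its arguments are illustrative, not part of any grid system.

```python
from typing import Callable


def install_pending(
    pending: dict[str, list[str]],
    installed: set[str],
    try_install: Callable[[str], bool],
) -> set[str]:
    """Attempt queued packages whose requirements are already installed.

    pending maps a package name to the environments it requires. Packages with
    a missing requirement are never attempted. Sweeping until nothing changes
    lets a chain such as python -> numpy -> scipy install in one idle period.
    installed is updated in place; the newly installed names are returned.
    """
    newly: set[str] = set()
    failed: set[str] = set()
    progress = True
    while progress:
        progress = False
        for name, requires in pending.items():
            if name in installed or name in failed:
                continue
            if not all(req in installed for req in requires):
                continue  # requirement missing here: don't even try
            if try_install(name):
                installed.add(name)
                newly.add(name)
                progress = True
            else:
                failed.add(name)  # don't retry a failed build in this sweep
    return newly
```

A resource without Python never calls `try_install` for SciPy at all, which is exactly the "try only to give up" case the requirements are meant to avoid.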
To prevent resources from filling up with old environments, we can add a timeout period to environments, so they are deleted when they haven’t been used for a couple of days/weeks/months.
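The timeout could be as simple as recording when a job last ran in each environment and periodically dropping the ones idle for too long. A minimal sketch, assuming times are seconds since the epoch; `evict_stale` and `DAY` are hypothetical names, and the actual deletion from disk is left out.

```python
import time
from typing import Optional

DAY = 24 * 60 * 60  # seconds


def evict_stale(last_used: dict[str, float], max_idle: float,
                now: Optional[float] = None) -> list[str]:
    """Forget environments that no job has used for longer than max_idle seconds.

    last_used maps an environment name to the time a job last ran in it and is
    updated in place. Returns the evicted names (sorted), which the resource
    would then delete from disk and stop advertising to the scheduler.
    """
    if now is None:
        now = time.time()
    stale = sorted(name for name, t in last_used.items() if now - t > max_idle)
    for name in stale:
        del last_used[name]
    return stale
```

For example, with a two-week timeout, an environment last used 21 days ago is evicted while one used yesterday is kept.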
It shouldn’t be that hard to implement. I am sure I could do it, but I don’t have my own grid infrastructure to work with, so I guess I’ll have to intimidate, er, persuade someone else to do it…
October 8th, 2008 at 9:31 pm
But what if your software depends on the newest version of the Linux kernel? Or depends on a commercial or closed-source package? Or if your software runs on Windows/Mac/Whatever?
Another idea would be to create a working minimal setup including the operating system, required packages (commercial, open source, …) and data, and pack it as a virtual machine. The grid resources should then run thin (native) virtual machine hypervisors, which would fetch and execute pending VM images. For distributing and parallelizing, any existing protocol could be used, since it could just be distributed as part of the VM. The overhead of downloading a complete VM would not be large (a complete Ubuntu server fits in a 250MB VMWare image), and it would probably be faster than trying to compile an application and its dependencies from source.
October 8th, 2008 at 9:40 pm
Dependencies like those you mention would get caught by the installation script / testing of the package.
Installing virtual machines instead of packages is not a bad idea, but you would need to install lots of them to support more than one environment, and then they start to fill up the resources…
As for the compilation time: if the environments can have dependencies, then most of them won’t take that long to compile, and they don’t have to be compiled for each job, only each time a new environment needs to be installed.
October 8th, 2008 at 10:12 pm
Oh, but the VMs should not be considered environments that are kept on the server: a VM image would correspond to one job, so you would just submit a complete VM image for processing to the grid hypervisors, which would discard the image after use.
The grid should have no pre-existing environments of any kind. This way you would be sure to get exactly the setup you want.
October 8th, 2008 at 11:21 pm
In that case, sending an entire OS along with each job might work, but it is a large IO overhead for most of my jobs…
October 10th, 2008 at 2:22 pm
Configuring and packing a Grid Job as a virtual machine image is something we are currently working on in the Minimum intrusion Grid (MiG) Grid infrastructure.
As a part of that project we are looking at diff mechanisms to ensure that only changed data are actually transmitted when transferring the images.
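How MiG actually does this isn't described here, but the general idea of transmitting only changed data can be sketched with per-block hashes. This assumes the image is compared block by block at fixed offsets, which suits disk images modified in place but, unlike rsync’s rolling checksum, does not cope with inserted bytes shifting everything after them. `block_hashes` and `changed_blocks` are hypothetical names.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list[bytes]:
    """SHA-256 digest of each fixed-size block of data (the last may be short)."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old_hashes: list[bytes], new: bytes, block_size: int) -> list[int]:
    """Indices of blocks in the new image that differ from the old one.

    Only the receiver's hashes of the old image are needed, not the old image
    itself, so just these blocks (with their indices) would be transmitted.
    """
    new_hashes = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_hashes)
            if i >= len(old_hashes) or h != old_hashes[i]]
```

If the new image is shorter than the old one, the sender would also have to transmit the new length; the sketch leaves that out.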
October 10th, 2008 at 2:25 pm
How far have you gotten with this, Martin? Is it something I can try out? How much work is it to set up?
October 11th, 2008 at 12:04 pm
Well, it’s still in the student project state, so I think you will have to ask Brian :)
October 11th, 2008 at 1:21 pm
In that case, I guess it is vaporware… Of the projects I’ve suggested that Brian has put students on, the success rate is 0%, so I am not holding my breath on this one ;-)