I don’t want to get rid of the desktop computer. I like it. It is a nice interface for communicating with my computation tasks, not to mention messaging, emailing, blogging, image editing etc. It is just that I don’t want my desktop computer to run most of my computation tasks. The desktop computer should be for interacting with computations, but there is really no reason why it should also carry out those computations.
A typical situation for me is that either I have very light computational requirements (what is needed for text processing or maybe compiling a TeX document) or I need a lot of computing power (when I am running the data analysis for my research). I don’t think this is atypical for scientists; at least, it is a situation I share with most of the people at BiRC.
My desktop computers are not powerful enough to deal with my scientific computing — well, they are, but it takes ages to run on them — but they have plenty of power to spare when I am just doing “office work”.
Grid computing
The solution has been around for years and is called grid computing. Even way back when I was teaching networks and distributed systems at the computer science department, we would cover the ideas behind grid computing (back then it was really just client/server architectures, RPC and later RMI, distributed file systems etc., but the ideas that are now called grid computing were around).
What we really want is to connect all the computers on the net into one big honking system where we can get the computing power we need, when we need it, from all those machines that are idle anyway. On the rare occasions when we need a supercomputer, we want to be connected to that as well. Of course, we do not want to pay for a whole network of computers just standing around waiting for us to need them, much less for a supercomputer sitting idle waiting for us, but when we need the computing power, we want to be connected to it.
Sun tried to sell the idea with the slogan “The Network Is the Computer”. I don’t really know how well that went, but I haven’t heard the slogan in years, so it can’t have been that successful.
The grid is such a great idea, so why isn’t it widespread already? Why am I still using my personal desktop computer to run my computations?
Personal experiences
I’ve had a bit of experience with grid computing myself. While developing GeneRecon, I needed a lot of computers to test the software (the tests were time-consuming in themselves, and I needed to explore a large parameter space), so I got access to NorduGrid. It was a horrible experience. Setting up the grid to run my own software was such a hassle that it was never really worth the (limited) CPU cycles I got out of it.
Then I got access to the new Minimal intrusion Grid (MiG) developed by Brian Vinter’s group. That was an improvement over NorduGrid, and good enough to finish the GeneRecon experiments. See
Experiences with GeneRecon on MiG
T. Mailund, C.N.S. Pedersen, J. Bardino, B. Vinter, and H.H. Karlsen
Future Generation Computer Systems 2007 23 580–586. doi:10.1016/j.future.2006.09.003.
for details.
It was an improvement, but it wasn’t a great experience.
Running programs on MiG requires a lot of extra work. First, the input files must be uploaded to the grid, along with the program’s executable if I haven’t uploaded it already (and I’m ignoring the problem of figuring out where the executable can actually be executed and such). Then the job must be specified in a configuration language and submitted to the grid. While the job is executing, I have to poll it from time to time to get its status. When it is done, I have to download the output files and clean up after the job.
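To make the bookkeeping concrete, here is a minimal sketch of that upload/submit/poll/download/clean-up cycle. It does not use MiG’s real interface: `FakeGrid`, its methods and the job-description dictionary are all hypothetical, an in-memory stand-in that merely pretends to run the job after a few polls. The point is how much ceremony surrounds what is, locally, a single function call.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeGrid:
    """HYPOTHETICAL in-memory stand-in for a grid service (not MiG's API)."""
    files: dict = field(default_factory=dict)
    jobs: dict = field(default_factory=dict)
    _next_id: int = 0

    def upload(self, name, data):
        self.files[name] = data

    def submit(self, spec):
        # spec is a hypothetical job description: executable, inputs, outputs.
        self._next_id += 1
        job_id = f"job-{self._next_id}"
        self.jobs[job_id] = {"spec": spec, "polls": 0, "done": False}
        return job_id

    def status(self, job_id):
        job = self.jobs[job_id]
        job["polls"] += 1
        if job["polls"] < 3:            # pretend the job needs a few polls to finish
            return "EXECUTING"
        if not job["done"]:             # "run" the executable on each input once
            spec = job["spec"]
            program = self.files[spec["executable"]]
            for inp, out in zip(spec["inputs"], spec["outputs"]):
                self.files[out] = program(self.files[inp])
            job["done"] = True
        return "FINISHED"

    def download(self, name):
        return self.files[name]

    def cleanup(self, job_id):
        # Remove the job's input and output files; the executable stays uploaded.
        spec = self.jobs.pop(job_id)["spec"]
        for name in spec["inputs"] + spec["outputs"]:
            self.files.pop(name, None)


def run_on_grid(grid, program, inputs, poll_interval=0.0):
    # 1. Upload the executable and the input files.
    grid.upload("program", program)
    names = []
    for i, data in enumerate(inputs):
        grid.upload(f"input{i}", data)
        names.append(f"input{i}")
    # 2. Write a job description and submit it.
    spec = {"executable": "program", "inputs": names,
            "outputs": [f"output{i}" for i in range(len(inputs))]}
    job_id = grid.submit(spec)
    # 3. Poll until the job has finished.
    while grid.status(job_id) != "FINISHED":
        time.sleep(poll_interval)
    # 4. Download the outputs and 5. clean up after the job.
    results = [grid.download(name) for name in spec["outputs"]]
    grid.cleanup(job_id)
    return results


print(run_on_grid(FakeGrid(), str.upper, ["acgt", "ttga"]))  # ['ACGT', 'TTGA']
```

Compared with calling `str.upper` directly, every one of the five steps is overhead that the user has to manage by hand.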
Compare that to just running the program on my own computer.
I have used MiG for a couple of projects now, but for day-to-day work, it is just too much of a hassle.
Does it have to be so hard?
Why shouldn’t it be just as easy to run programs on the grid as on the desktop computer?
I know that if I want a distributed system with all the bells and whistles, it is a more complicated problem than writing single-machine applications, but for the cases where I just want enough computing power to fire off a few independent computations in parallel, there shouldn’t be any problem.
There is, but there shouldn’t be!
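On a single machine, the “fire off a few independent computations” case already has a simple programming model, for example Python’s `concurrent.futures`: submit calls, get futures back, collect results. The sketch below uses a local thread pool; the hope expressed here (an assumption, not an existing product) is that a grid backend could sit behind the same `Executor` interface so that user code would not change. `analyse` is a hypothetical placeholder for one analysis run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyse(parameter):
    # HYPOTHETICAL stand-in for one independent analysis run.
    return sum(i * parameter for i in range(1000))


def run_independent(func, parameters, executor):
    # Submit every job at once, then collect results as they complete.
    futures = {executor.submit(func, p): p for p in parameters}
    return {futures[f]: f.result() for f in as_completed(futures)}


with ThreadPoolExecutor(max_workers=4) as pool:
    results = run_independent(analyse, [1, 2, 3], pool)

print(results[2])  # 999000
```

Only the executor object knows where the work runs; swapping it (for a `ProcessPoolExecutor` today, or a hypothetical grid executor) leaves `run_independent` untouched.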
To access files, why should I need to up- and download? I should just mount a file system in some appropriate way, right?
To run a program on the grid, couldn’t I just distribute it to another node when loading the program?
It probably isn’t quite that easy, but by wrapping my programs in proxy executables, I should be able to achieve something very similar, at least. I’ve actually played with such a system for MiG — called MyMiG — so I know that at least something in that direction can be achieved. It just needs a bit more work (which is reasonable, since my solution took a weekend to cook up).
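The proxy idea can be sketched as a wrapper that stages inputs in, runs the real command somewhere else, and stages outputs back, so the caller sees something that behaves like a local run. This is not MyMiG: here a temporary directory plays the part of the remote node, and `proxy_run` with its parameters is a hypothetical name. A real proxy executable would be a small script installed under the program’s own name that ships files and arguments to a grid instead of a local scratch directory.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def proxy_run(command, local_dir, inputs, outputs):
    """HYPOTHETICAL proxy: run `command` in a scratch directory standing in
    for a remote node, copying `inputs` there first and `outputs` back after."""
    local_dir = Path(local_dir)
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp)                      # stand-in for remote scratch space
        for name in inputs:                     # stage in ("upload")
            shutil.copy(local_dir / name, remote / name)
        result = subprocess.run(command, cwd=remote, capture_output=True, text=True)
        if result.returncode == 0:              # stage out ("download") on success
            for name in outputs:
                shutil.copy(remote / name, local_dir / name)
        return result.returncode                # scratch dir is removed ("clean up")


with tempfile.TemporaryDirectory() as work:
    Path(work, "in.txt").write_text("acgt\n")
    script = "open('out.txt', 'w').write(open('in.txt').read().upper())"
    status = proxy_run([sys.executable, "-c", script], work, ["in.txt"], ["out.txt"])
    result_text = Path(work, "out.txt").read_text()

print(status, repr(result_text))  # 0 'ACGT\n'
```

From the caller’s side, the input and output files simply appear in the working directory as if the program had run locally.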
I realize that more complex distributed applications will need more work, but with XML-RPC and SOAP and whatnot, it shouldn’t be that much of a problem to get there.
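As an illustration of how little code a remote call needs with existing tools, here is XML-RPC from Python’s standard library (`xmlrpc.server` and `xmlrpc.client`). The “compute node” is just a server thread on localhost, and `count_gc` is a hypothetical example computation; a real deployment would also need authentication, scheduling and failure handling, none of which is shown.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def count_gc(sequence):
    # HYPOTHETICAL computation exposed by a compute node.
    return sum(1 for base in sequence.upper() if base in "GC")


# Port 0 asks the OS for a free port; logRequests=False keeps the output quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(count_gc, "count_gc")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client calls the remote function as if it were local.
with ServerProxy(f"http://127.0.0.1:{port}/") as node:
    gc = node.count_gc("ACGTGGCA")

server.shutdown()
server.server_close()
print(gc)  # 5
```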
With a proper grid setup, I could get the computing resources I need, and my desktop computer would only be needed to interact with my programs, not to run them. Actually, with a proper setup, I should be able to access my computations from any computer (desktop, laptop or even smartphone), anywhere.
Can we get there?
What will it take to get to that point? Do any systems already exist that work this way? I’ve heard Xgrid mentioned, but I do not really know anything about it. Does anyone know how it works?
Last week, Google announced that they would offer free storage of scientific data. Would it be too optimistic to think that, within a year, some company would offer free grid computation resources? It doesn’t even have to be completely free; you could imagine a setup where you provide your “screensaver” CPU cycles (like SETI@home etc.) in exchange for access to the grid for your own tasks. With an open platform for this, shouldn’t the open source community be able to build a great interface to it?