Let’s kill desktop computing

I don’t want to get rid of the desktop computer. I like it. It is a nice interface for communicating with my computation tasks, not to mention messaging, emailing, blogging, image editing etc. It is just that I don’t want my desktop computer to run most of my computation tasks. The desktop computer should be for interacting with computations, but there is really no reason why it should also carry out those computations.

A typical situation for me is that either I have very light computational requirements — what is needed for text processing or maybe compiling a TeX document — or I need a lot of computer power — when I am running my data analysis in my research. I don’t think this is atypical for scientists; at least, it is a situation I share with most of the people at BiRC.

My desktop computers are not powerful enough to deal with my scientific computing — well, they are, but it takes ages to run on them — but they have plenty of power to spare when I am just doing “office work”.

Grid computing

The solution has been around for years and is called grid computing. Even way back when I was teaching networks and distributed systems at the computer science department, we would cover the ideas behind grid computing (back then it was really just client/server architectures, RPC and later RMI, distributed file systems etc., but the ideas that are now called grid computing were around).

What we really want is to connect all the computers on the net into one big honking system where we can get the computer power we need, when we need it, from all those machines that are idle anyway. On the rare occasions where we need a supercomputer, we want to be connected to that as well. Of course, we do not want to pay for having a whole network of computers just standing around waiting for us to need them — much less having a supercomputer sitting idle waiting for us — but when we need the computer power, we want to be connected to it.

Sun tried to sell the idea with the slogan “The Network Is the Computer”. I don’t really know how well that went, but I haven’t heard the slogan for years, so it can’t have been that successful.

The grid is such a great idea, so why isn’t it widespread already? Why am I still using my personal desktop computer to run my computations?

Personal experiences

I’ve had a bit of experience with grid computing myself. While developing GeneRecon, I needed a lot of computers to test the software — pretty time-consuming in itself, and I needed to explore a large parameter space — so I got access to NorduGrid. It was a horrible experience. Setting up the grid to run my own software was such a hassle, and it was never really worth the (limited) CPU cycles I got out of it.

Then I got access to the new Minimal intrusion Grid (MiG) developed by Brian Vinter’s group. That was an improvement over NorduGrid, and good enough to finish the GeneRecon experiments. See

Experiences with GeneRecon on MiG
T. Mailund, C.N.S. Pedersen, J. Bardino, B. Vinter, and H.H. Karlsen
Future Generation Computer Systems 2007 23 580–586. doi:10.1016/j.future.2006.09.003.

for details.

It was an improvement, but it wasn’t a great experience.

Running programs on MiG requires a lot of extra work. First, the input files must be uploaded to the grid, along with the program’s executable if I haven’t uploaded it already (and I’m ignoring the problem of figuring out where the executable can actually be executed and such). Then the job must be specified in a configuration language and submitted to the grid. While the job is executing, I have to poll it from time to time to get its status. When it is done, I have to download the output files and clean up after the job.

Compare that to just running the program on my own computer.

I have used MiG for a couple of projects now, but for day-to-day work, it is just too much of a hassle.
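To make that overhead concrete, here is a minimal Python sketch that hides the manual steps of a grid job (upload, specify, submit, poll, download, clean up) behind a single function call. `FakeGrid` and its methods are hypothetical: an in-memory stand-in, not MiG’s real interface, which involves actual file transfers and a job description language. The “executable” is just a Python callable.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeGrid:
    """Hypothetical in-memory grid; NOT the MiG API, just a stand-in."""
    files: dict = field(default_factory=dict)
    jobs: dict = field(default_factory=dict)
    polls_until_done: int = 2  # pretend each job needs a couple of status polls

    def upload(self, name, data):
        self.files[name] = data

    def submit(self, spec):
        job_id = f"job-{len(self.jobs) + 1}"
        self.jobs[job_id] = {"spec": spec, "polls": 0}
        return job_id

    def status(self, job_id):
        job = self.jobs[job_id]
        job["polls"] += 1
        if job["polls"] < self.polls_until_done:
            return "EXECUTING"
        # "Run" the job: apply the uploaded executable (a callable here)
        # to the uploaded input and store the result as an output file.
        spec = job["spec"]
        program = self.files[spec["executable"]]
        self.files[spec["output"]] = program(self.files[spec["input"]])
        return "FINISHED"

    def download(self, name):
        return self.files[name]

    def cleanup(self, job_id):
        spec = self.jobs.pop(job_id)["spec"]
        for name in (spec["input"], spec["output"]):
            self.files.pop(name, None)


def run_on_grid(grid, executable, input_data, poll_interval=0.0):
    """Do every manual step of a grid job on the user's behalf."""
    grid.upload("program", executable)          # 1. upload the executable
    grid.upload("input.txt", input_data)        # 2. upload the input file
    spec = {"executable": "program", "input": "input.txt", "output": "output.txt"}
    job_id = grid.submit(spec)                  # 3. specify and submit the job
    while grid.status(job_id) != "FINISHED":    # 4. poll until it is done
        time.sleep(poll_interval)
    result = grid.download("output.txt")        # 5. download the output
    grid.cleanup(job_id)                        # 6. clean up after the job
    return result
```

With the steps hidden, `run_on_grid(FakeGrid(), str.upper, "acgt")` reads almost like a local call and returns `"ACGT"`; the uploading, polling and cleaning up still happen, they are just no longer the user’s chore.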

Does it have to be so hard?

Why shouldn’t it be just as easy to run programs on the grid as on the desktop computer?

I know that if I want a distributed system with all the bells and whistles, it is a more complicated problem than writing single-machine applications, but for the cases where I just want enough computer power to fire off a few independent computations in parallel, there shouldn’t be any problem.

There is, but there shouldn’t be!
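On a single machine, firing off a few independent computations in parallel is already this simple with Python’s `concurrent.futures`. The `analyse` function and its parameters are hypothetical placeholders for a real analysis run. The sketch takes the executor as an argument, because that is exactly the seam where, in the world wished for here, a grid-backed executor could replace the local one.

```python
from concurrent.futures import Executor


def analyse(params):
    """Hypothetical placeholder for one independent analysis run."""
    mutation_rate, recombination_rate = params
    # Stand-in for an expensive computation on one point of the parameter space.
    return mutation_rate * recombination_rate


def explore(pool: Executor, parameter_space):
    """Run analyse() on every point; the points are independent of each other.

    Any concurrent.futures.Executor works here: a local process pool today,
    and (hypothetically) a grid-backed executor with the same interface.
    """
    # map() may run the calls in any order, but yields results in input order.
    return list(pool.map(analyse, parameter_space))
```

Locally, `with ProcessPoolExecutor() as pool: explore(pool, space)` (inside the usual `if __name__ == "__main__":` guard) spreads the runs over the machine’s cores; nothing in `explore` would need to change if the pool lived on a grid instead.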

To access files, why should I need to up- and download? I should just mount a file system in some appropriate way, right?

To run a program on the grid, couldn’t I just distribute it to another node when loading the program?

It probably isn’t quite that easy, but by wrapping my programs in proxy executables, I should be able to achieve something very similar, at least. I’ve actually played with such a system for MiG — called MyMiG — so I know that at least something in that direction can be achieved. It just needs a bit more work (which is reasonable, since my solution took a weekend to cook up).
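A proxy executable of this kind could look roughly like the following Python sketch. It is not MyMiG: the `GRID_BACKEND` environment variable and `submit_to_grid` are hypothetical. The proxy takes the same command line as the real program, hands it to a backend, and relays the program’s output and exit status, so the caller cannot tell where it ran. Only the local backend (via `subprocess`) is implemented; the grid backend is left as the place where upload, submit and poll would go.

```python
import os
import subprocess
import sys


def run_locally(argv):
    """Fallback backend: run the real program on this machine."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


def submit_to_grid(argv):
    """Hypothetical grid backend: upload, submit, poll and download go here."""
    raise NotImplementedError("grid submission is not implemented in this sketch")


BACKENDS = {"local": run_locally, "grid": submit_to_grid}


def proxy(argv):
    # Pick the backend from a (hypothetical) environment variable; default local.
    backend = BACKENDS[os.environ.get("GRID_BACKEND", "local")]
    returncode, out, err = backend(argv)
    # Relay output and exit status so the proxy behaves like the real program.
    sys.stdout.write(out)
    sys.stderr.write(err)
    return returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python proxy.py REAL_PROGRAM [ARGS...]
    sys.exit(proxy(sys.argv[1:]))
```

Installed under the real program’s name, such a wrapper would let existing scripts and pipelines run unchanged while the backend decides where the work actually happens.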

I realize that more complex distributed applications will need more work, but with XML-RPC and SOAP and whatnot, it shouldn’t be that much of a problem to get there.
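As a hint of how little plumbing XML-RPC needs, here is a minimal sketch using Python’s standard `xmlrpc` modules: a job service with hypothetical `submit` and `result` methods, and a client calling them as if they were local functions. The service runs each job synchronously, listens on localhost only and has no authentication; a real grid front end would queue the work and need all of those things.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class JobService:
    """Hypothetical job service; its public methods become remote procedures."""

    def __init__(self):
        self.results = {}

    def submit(self, numbers):
        # Run the "job" immediately and remember its result under a new id.
        job_id = len(self.results) + 1
        self.results[job_id] = sum(x * x for x in numbers)
        return job_id

    def result(self, job_id):
        return self.results[job_id]


def start_server():
    # Port 0 lets the OS pick a free port; logRequests=False keeps output quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(JobService())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    server = start_server()
    host, port = server.server_address
    try:
        with ServerProxy(f"http://{host}:{port}/") as grid:
            job_id = grid.submit([1, 2, 3])   # looks like a local call
            return grid.result(job_id)
    finally:
        server.shutdown()
        server.server_close()


print(demo())  # prints 14
```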

With a proper grid setup, I could get the computer resources I need, and my desktop computer would only be needed to interact with my programs, not to run them. Actually, with a proper setup, I should be able to access my computations from any computer — desktop, laptop or even smartphone — everywhere.

Can we get there?

What will it take to get to that point? Do systems that work this way already exist? I’ve heard Xgrid mentioned, but I don’t really know anything about it. Does anyone know how it works?

Last week, Google announced that it would offer free storage of scientific data. Would it be too optimistic to think that within a year, some company will offer free grid computation resources? It doesn’t even have to be completely free: you could imagine a setup where you provide your “screensaver” CPU cycles — like SETI@home etc. — in exchange for access to the grid for your own tasks. With an open platform for this, shouldn’t the open source community be able to build a great interface to it?
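The exchange imagined here, where people donate idle cycles and submit their own jobs when they need power, boils down to one shared job queue fed by many users and drained by many volunteer machines. Below is a minimal in-process Python sketch with threads standing in for the volunteers’ screensaver clients; the user names and jobs are hypothetical, and it is not modelled on the protocol of SETI@home or any other real system.

```python
import queue
import threading


def worker(jobs, results, lock):
    """A volunteer machine: take whatever job is next, from whichever user."""
    while True:
        item = jobs.get()
        if item is None:          # sentinel: no more work for this worker
            jobs.task_done()
            return
        user, job_id, func, arg = item
        value = func(arg)
        with lock:
            results.setdefault(user, {})[job_id] = value
        jobs.task_done()


def run_shared_pool(submissions, n_workers=3):
    """submissions: list of (user, job_id, func, arg) tuples from several users."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(jobs, results, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in submissions:      # users submit whenever they have work
        jobs.put(item)
    for _ in threads:             # one sentinel per worker, queued after all jobs
        jobs.put(None)
    for t in threads:
        t.join()
    return results
```

The design point is that no volunteer is tied to one user’s project: occasional bursts of jobs from different people simply take turns on the same idle machines.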


8 Responses to “Let’s kill desktop computing”

  1. Jason Says:

    Do you think it would help to make it easier to adapt code for BOINC ({Seti|Folding|etc}@home), so that you can tap into those types of screensaver CPU cycles?

  2. Lamenting Grid access « Stajichlog Says:

    [...] Grid access 22 01 2008 Thomas makes some good points about his experiences and the still greater need for GRID computing.  I am all for people writing [...]

  3. Thomas Mailund Says:

    Jason: Yeah, that might be the way to go.

    A distributed computing setup like BOINC is a truly great way to utilise otherwise idle CPU cycles, so I am all for having more of that. For my day to day work, though, how would it work? I am showing my lack of knowledge about BOINC here, so please correct me if I am talking nonsense here :)

    I don’t have large scale data analysis projects that can keep screen savers running for months or years, but in peaks of a project I have maybe weeks of CPU jobs that I’d like to finish within days (or hours).

    No one will install a screen saver on their box that sits idle and waits for my jobs to arrive every few months. So I will not have much success with making a BOINC project for myself.

    What might work, though, is a shared BOINC project where several people submit jobs to the system. It would require that the job resources could be updated to new job types as needed (maybe through plugins or something), but if it could handle that, it would allow several people to use the screen savers as a computer resource when they need the computer power. I could submit my jobs to it every other month and they would be served once they made it through the queue, and when I do not have jobs to run, someone else surely will.

    I’ve just been discussing this very setup with a guy from the MiG group the last hour on Jabber, so I am not sure how well I am explaining myself now with the other conversation running through my head, but I hope you understand me.

    Anyway, I wasn’t so much ranting about the underlying grid as the way I interact with it. I still want to hide the grid from my computer interaction. The grid should take and process my jobs, but to me the programs should run as if on my own machine.

    A way to wrap executables or an API to interact with something like BOINC might be all that is needed, though.

    Well, that and a way to distribute new versions of my programs to the grid, ’cause in most of my research I need to modify the analysis code several times before it works the way I want it to, and I don’t discover all the bugs on the small data sets I can run in my test suites.

  4. Mailund on the Internet » Google cluster computing Says:

    [...] Yeah!  Let’s have more of that, but remember to make it easy to use for scientists.  Integrate cluster computers with the desktop! [...]

  5. Mailund on the Internet » Google cluster computing Says:

    [...] Yeah! Let’s have more of that, but remember to make it easy to use for scientists. Integrate cluster computers with the desktop! [...]

  6. Suman Says:

    At Brown University, we run a system called cohoq, which salvages spare cycles on all the department machines. AI and CompBio people use it to run their experiments. I have heard that open source alternatives exist, but I am not sure what they are.

    Also, does your work involve embarrassingly parallel computation? If so, something like Hadoop might work for you, if you have a set of commodity machines instead of a supercomputer. I have also heard that people are using Xboxes to get more computing power than commodity CPUs provide.

  7. Thomas Mailund Says:

    Most of what I do is embarrassingly parallel computation, so something like that would work, yes. It is not that far off from what I run on the grid system I have access to — MiG — where “screensaver science” is possible. MiG even has PS3 (but not Xbox) machines on the grid, but I am not running on those yet. To fully utilise them, you need to change your algorithms. I have some students working on that, though.

    In any case, what I was complaining about in this post was mainly that, as a user, working with a grid system is very different from working on your desktop computer, and that is really what keeps me from using grid resources for all my computations. There is just too much overhead for me to be bothered…

  8. Mailund on the Internet » Blog Archive » Some thoughs on grid computing… Says:

    [...] had some experience with grid computing (see an old post about it here) but mostly I have found it too much trouble to worth the [...]
