OK, I don’t really have a lot of Macs, but I’m planning on getting a Mac Pro or two at the office for some of my computations, and I wanted to figure out how Xgrid works so I can use it for them. So I tried it out on my MacBook and iMac, just as a proof of principle.
Just getting the grid up and running, I ran into a few problems, so I’m writing down how I finally managed to get it going, so I can reproduce it later.
Setting up the grid
Step one, I downloaded the Xgrid Admin tool. I had installed it earlier without getting around to playing with it, but that installation disappeared with my upgrade to Snow Leopard, so I had to install it again.
Starting the tool up, it asks for a controller. I told it to just use my iMac and gave it a password. All well and good, but so far no Agents to actually run any jobs.
Step two, I enabled Xgrid in the Sharing pane of System Preferences.
Under Configure I picked the controller I had set up, and again gave it a password. Here comes the first problem I ran into: I mistyped the password. I wanted the same one as for the controller, just to make it easier on myself, but got it wrong. The agent started up fine and didn’t complain about the password or anything, but it didn’t show up as an agent in the Admin tool.
I tried adding it there, but was told it was unavailable. I mucked about with this for a while but just couldn’t get it to work at all.
It wasn’t until I tried connecting the MacBook instead that I figured it out. There I got the password right, and it popped up in the Admin tool. So I made a wild guess that the password was the problem, retyped it in the Sharing dialogue, and now the iMac finally connected as an agent, and the Admin tool told me I had 5.46 GHz to compute with.
I’m a bit miffed that there were no authentication steps that could have told me what was going wrong, but I guess the trick is to just pick the same password for the controller and all the agents, ’cause that at least seems to work for me now.
Running a job
To submit jobs, you have to use the xgrid command.
Just running it gives you this:
$ xgrid
xgrid
usage: xgrid <options> <action> <parameters>
Any number of the following <options> may be specified:
-h[ostname] <hostname-or-IP-address>
-auth {Password | Kerberos}
-p[assword] <password>
-f[ormat] xml
A single <action> and its <parameters> must be specified:
-grid list
-grid rename -gid <grid-identifier> <new-name>
-grid add <grid-name>
-grid {delete | attributes} -gid <grid-identifier>
-job list [-gid <grid-identifier>]
-job {attributes | specification | log | wait} -id <job-identifier>
-job submit [-gid <grid-identifier>] [-si <stdin>] [-in <indir>] \
[-dids jobid[,jobid]*] [-email address] \
[-art <art-path> -artid <art-identifier] [-artequal <art-value>] \
[-artmin <art-value>] [-artmax <art-value>] <cmd> <arg1> ...
-job batch [-gid <grid-identifier>] <xml-batch-submission-file>
-job results -id <job-identifier> [-tid <task-identifier>] \
[-so <stdout>] [-se <stderr>] [-out <outdir>]
-job {stop | suspend | resume | delete | restart} -id <job-identifier>
-job run [-gid <grid-identifier>] [-si <stdin>] [-in <indir>] \
[-so <stdout>] [-se <stderr>] [-out <outdir>] [-email address] \
[-art <art-path> -artid <art-identifier] [-artequal <art-value>] \
[-artmin <art-value>] [-artmax <art-value>] <cmd> <arg1> ...
xgrid -?, or xgrid with no arguments, will print this usage message.
I don’t really know what the options mean, so I tried firing off a few, but I just kept getting the same usage output. A bit disappointing.
I guessed that the hostname option was needed. -hlocalhost just didn’t work for me, but eventually I found out that “-h localhost” would. Well, not exactly work, but at least it complained that I needed to authenticate the command:
$ xgrid -h localhost -grid list
{
error = "could not connect to localhost (Authentication failed)";
}
Adding a password with “-p password” did the trick. Again, you do need the space between -p and the password.
$ xgrid -h localhost -p password -grid list
{
gridList = (
0
);
}
I don’t know what the output means here, but at least I was making progress.
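Since every command needs the same “-h localhost -p password” prefix, with the spaces, a small shell function saves some typing. This is just a sketch: the names xg, XG_HOST and XG_PASS are ones I made up, and the defaults are the values used in the examples here.

```shell
# Hypothetical settings; replace with your own controller and password.
XG_HOST="${XG_HOST:-localhost}"
XG_PASS="${XG_PASS:-password}"

# xg: run xgrid with host and password passed as separate arguments,
# since "-hlocalhost" (no space) was not accepted when I tried it.
xg() {
    xgrid -h "$XG_HOST" -p "$XG_PASS" "$@"
}
```

With that in place, “xg -grid list” should be the same as the full command above.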
Asking for a job list (I’m guessing here) gave me an empty list:
$ xgrid -h localhost -p password -job list
{
jobList = (
);
}
which I expected since I haven’t submitted any jobs, so I tried sending a simple “ls” command.
$ xgrid -h localhost -p password -job submit ls
{
jobIdentifier = 0;
}
In the Xgrid Admin tool I saw that the job had failed, so I figured it could be a path thing and gave it the full path of the job
$ xgrid -h localhost -p password -job submit /bin/ls
{
jobIdentifier = 1;
}
and that seemed to do the trick.
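To use the job identifier in later commands it has to be picked out of that output. Here is a minimal sketch that does it with sed, assuming the output always has a “jobIdentifier = N;” line like the ones above; the function name job_id is my own.

```shell
# job_id: read "-job submit" output on stdin and print the number
# found on the "jobIdentifier = N;" line (prints nothing if there is none).
job_id() {
    sed -n 's/^[[:space:]]*jobIdentifier = \([0-9][0-9]*\);.*$/\1/p'
}
```

It could be used as: id=$(xgrid -h localhost -p password -job submit /bin/ls | job_id)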
Asking for a job list with xgrid shows me two jobs:
$ xgrid -h localhost -p password -job list
{
jobList = (
0,
1
);
}
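If you want to loop over the jobs in a script, the list can be flattened to one identifier per line. This is a sketch that assumes the identifiers always sit on lines of their own, as in the output above; job_ids is a name I picked.

```shell
# job_ids: read "-job list" output on stdin and print one job id per line.
# Only lines consisting of a number, optionally followed by a comma, match.
job_ids() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\),\{0,1\}[[:space:]]*$/\1/p'
}
```

For example: xgrid -h localhost -p password -job list | job_ids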
I figured that -job results should give me the states of the jobs, like I could see them in the Admin tool, but I didn’t get any output when I ran that command, so I don’t know how that is supposed to work.
I can delete a job, though:
$ xgrid -h localhost -p password -job delete -id 0
{
}
but I still haven’t figured out how to get the status or output of the job from xgrid.
I guess it is time to stop experimenting and read the manual…
Update: OK, I did just one more experiment. If I run a program that is guaranteed to give some output, I do get that output when I ask for the result. I tried just running xgrid and I got the help text. I guess the ls command I tried before was run in an empty directory, and that is why it didn’t produce any output.
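Putting the pieces together, submitting a job and then fetching its output could look roughly like this. It is only a sketch: run_and_fetch is a name I made up, and I am assuming that the “-job wait” action from the usage message blocks until the job has finished.

```shell
# run_and_fetch: submit a command, wait for it, then print its output.
# Uses the same "-h localhost -p password" settings as the examples above.
run_and_fetch() {
    # Pull the number out of the "jobIdentifier = N;" line.
    id=$(xgrid -h localhost -p password -job submit "$@" |
         sed -n 's/.*jobIdentifier = \([0-9][0-9]*\);.*/\1/p')
    if [ -z "$id" ]; then
        echo "submit failed" >&2
        return 1
    fi
    # Assumption: "-job wait" returns once the job is done.
    xgrid -h localhost -p password -job wait -id "$id" >/dev/null
    xgrid -h localhost -p password -job results -id "$id"
}
```

For example: run_and_fetch /bin/ls /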
I still haven’t found a nice way to get the status, but the -job attributes command at least gives me a lot of info about the job including the jobStatus.
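Until I find something nicer, the status can be filtered out of the attributes output. This sketch assumes the attributes are printed in the same “key = value;” style as the other outputs in this post; job_status is my own name for the helper.

```shell
# job_status: read "-job attributes" output on stdin and print the value
# of the first "jobStatus = ...;" line.
job_status() {
    sed -n 's/^[[:space:]]*jobStatus = \(.*\);[[:space:]]*$/\1/p' | head -n 1
}
```

For example: xgrid -h localhost -p password -job attributes -id 1 | job_status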
I still have some experimenting and reading to do before I get the grid up and running on some of the computations I am actually interested in, but I am optimistic now at least.