Boat race

Today, Aarhus University holds its traditional boat race at our university park.

This is a rowing competition where student groups from all over the university compete.  It isn’t quite like the traditional Oxford/Cambridge boat race.  For one thing, the race is across a lake that is only a few meters wide (but you have to cross it more than once). Another difference is that only inflatable boats are allowed in the race.

Well, with one exception.  The group that I belong to (although no longer a student, I am still a proud member), TAAGEKAMMERET, can compete in any home-made boat we want as long as we promise not to win.  So far we have kept that promise and I am sure we will keep up this proud tradition.

Another thing that differs a bit is that it is not just a rowing competition, but also a drinking competition.

Each time you reach the other shore, you have to drink a beer before you can get back in the boat for the next lap.

Beer sales start at noon, the race runs from two to five, and after that it breaks into different parties at the various departments around the university.

Tomorrow is a holiday here, which is probably good since those parties tend to last all night.

Textile plots of LD

There’s a paper that came out yesterday in PLoS ONE on visualising LD structure:

The Textile Plot: A New Linkage Disequilibrium Display of Multiple-Single Nucleotide Polymorphism Genotype Data

Kumasaka, Nakamura and Kamatani

Linkage disequilibrium (LD) is a major concern in many genetic studies because of the markedly increased density of SNP (Single Nucleotide Polymorphism) genotype markers. This dramatic increase in the number of SNPs may cause problems in statistical analyses, such as by introducing multiple comparisons in hypothesis testing and colinearity in logistic regression models, because of the presence of complex LD structures. Inferences must be made about the underlying genetic variation through the LD structure before applying statistical models to the data. Therefore, we introduced the textile plot to provide a visualization of LD to improve the analysis of the genetic variation present in multiple-SNP genotype data. The plot can accentuate LD by displaying specific geometrical shapes, and allowing for the underlying haplotype structure to be inferred without any haplotype-phasing algorithms. Application of this technique to simulated and real data sets illustrated the potential usefulness of the textile plot as an aid to the interpretation of LD in multiple-SNP genotype data. The initial results of LD mapping and haplotype analyses of disease genes are encouraging, indicating that the textile plot may be useful in disease association studies.

An example of this new kind of plot looks like this:

At a quick glance it looks like it is displaying haplotype blocks, like you can get in HaploView (although with nicer graphics).

It isn’t quite that, though.

The textile plot is showing LD between genotypes and not haplotype blocks, so you always have three “blocks” per column, and so you don’t know the phase of the genotypes you are looking at.

The plot simply visualises the genotype LD structure, and I am sure that with a bit of practice these plots can be used to explore that.
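To make "LD between genotypes" a bit more concrete, here is a minimal sketch of one common unphased measure: the squared correlation between two SNPs with genotypes coded as 0, 1 or 2 copies of an allele. This is not how the textile plot itself is computed; the function name and the toy genotype vectors are my own made-up illustration.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2.

    Works directly on unphased genotypes, so no haplotype phasing is needed.
    """
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and equally long")
    n = len(g1)
    mean1 = sum(g1) / n
    mean2 = sum(g2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(g1, g2))
    var1 = sum((a - mean1) ** 2 for a in g1)
    var2 = sum((b - mean2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var1 * var2)


# Hypothetical genotypes for six individuals at three SNPs.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 2, 1, 0, 2]  # identical to snp_a: complete genotype LD
snp_c = [2, 0, 1, 1, 2, 0]  # only partly correlated with snp_a

print(genotype_r2(snp_a, snp_b))
print(genotype_r2(snp_a, snp_c))
```

Values near 1 mean the genotype at one SNP predicts the genotype at the other well; values near 0 mean it tells you very little.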

I don’t have that practice, though, so I find them a bit hard to interpret.  They are beautiful, though.

Configuring Xgrid … again!

The replacement for my broken office machine came this morning.  I got a nice Mac Pro this time, to get some more computing power to add to our Xgrid.

It’s a rather nice machine, but the screen, although 24″ like the iMac’s, seems a bit small.  Probably just because there is not much of a border around it.  To compensate I connected two screens… which reminds me that I need to go and get a new converter for the display port before I need one for connecting my MacBook to a projector…

It was also a rather nice surprise when iStat Menus showed 16 cores instead of the previous two.

There are actually only 8 cores (it is two quad-core CPUs), but with hyper-threading each core shows up as two, so that is what it looks like.
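The arithmetic behind the 8 versus 16 can be sketched like this. The helper function is just my own illustration, and two hardware threads per core is an assumption that matches the numbers above.

```python
import os


def logical_cores(sockets, cores_per_socket, threads_per_core):
    """Number of cores the operating system reports for a given layout."""
    return sockets * cores_per_socket * threads_per_core


# Two quad-core CPUs, counted without and with hyper-threading
# (assuming two hardware threads per core).
physical = logical_cores(sockets=2, cores_per_socket=4, threads_per_core=1)
logical = logical_cores(sockets=2, cores_per_socket=4, threads_per_core=2)
print(physical, logical)

# os.cpu_count() reports logical CPUs, so on a hyper-threaded machine
# it gives the larger of the two numbers.
print(os.cpu_count())
```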

So far so good.

I configured it by extracting everything from my Time Machine backup from the crashed iMac.  That turned out to be a mistake, though.

When I tried to configure it for Xgrid – the reason why I got a Mac Pro rather than another iMac – I ran into trouble.

I need this machine to run a controller (because my iMac ran as the controller for our grid earlier, and the grid had been down since it was smashed), but I just couldn’t start the controller daemon!  It flatly refused to read the database file (/var/xgrid/controller/datastore.db).  I was under the impression that if I deleted this file it would just create a new one, but no such luck for me.  There was absolutely nothing I could do to get it to accept this file (or the absence of it) in the hours I worked with this…

I gave up late afternoon and decided to just reinstall everything from scratch, so I reformatted the disk and installed again.  This time I extracted Applications and Users from Time Machine only (which is all I need anyway), and finally I could start the Xgrid controller.

Now I was ready for the next problem.  Configuring the controller.

I don’t remember exactly how I managed to do this the last time, but I seem to recall that I could do it with Xgrid Admin, so I downloaded that.  I couldn’t set up the authentication that way this time around, though.

As a side note, configuring agents – the machines that can run jobs on the grid – is pretty easy.  It is all built in, and you just go to Sharing > Xgrid, pick a controller and set a password.

There is nothing similar for the controller.  There might be for the Server OS, but I couldn’t find anything on my machine.

For telling the controller which password to use, I found this blog post.  Basically, you need to copy the password file you created when you configured the agent over to the controller.

That just wasn’t enough.

I still needed to tell the controller to actually use password authentication rather than any other option.  Googling for an hour or more finally led me to the file /Library/Preferences/ for configuring the controller.  Now I just needed to figure out how to tell it to use password authentication.

In the corresponding file for agents, /Library/Preferences/, there’s the field


so I tried setting the same in the controller configuration.  That didn’t work, so I tried


and that did the trick.

Finally, the controller was up and running.

As an agent, my machine only provided four cores to the grid, but I knew what to do about that, so I updated the agent configuration to provide 16 cores (there are really only 8, but with hyperthreading that should probably be considered 16).

As soon as I get the other agents configured with a new controller (the new machine has a different IP address than the old one), our grid should be back up and running.

All in all I wasted an entire day getting this up and running, but without the grid there really isn’t that much of my current data analysis I can get done, so it had to be done.

On non-genic disease polymorphism

Daniel MacArthur discusses genome-wide association studies, which have so far mainly found disease-associated polymorphisms outside of genes.

The claim in question is that the tendency of GWAS to find disease associations outside of protein-coding genes is somehow a problem; but, as p-ter notes, there’s perfectly plausible reasons for disease risk variants to be found in non-coding regions.

Indeed, I think most of us working in genomics have seen the proliferation of non-coding hits in GWAS studies as a positive, in that it seems to be teaching us something new and unexpected about the underlying biology of human variation.

There is a problem with polymorphisms outside of genes.  We generally have no idea how they functionally affect us to increase or decrease the disease risk.  If we have no idea what a given polymorphism does functionally, the mechanism is harder to work out; we don’t really know where to start.

As far as I can see, though, that is the only problem.

If the polymorphism is statistically significantly associated with the disease, and we can replicate this in independent data, then that is what the data is saying.  It might be inconvenient, but tough luck!  No one promised us that this would be easy.

Quoting from Gene Expression:

Their answer to this rhetorical question is that common SNPs (used on current genotyping platforms) are generally nonfunctional. The alternative, the evidence for which I’ll present here, is that our ability to predict functional SNPs is poor. In the phrase “no known function”, the emphasis should be on the word “known”.

GWA studies have been a great success in locating polymorphisms associated with disease that we can actually replicate.

Sure, we are working with very large data sets here, and false positives are a major problem (see e.g. here and here), but this is a problem we can handle.
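To give a feel for the scale of the false positive problem, here is a small sketch using the Bonferroni correction, which is one standard (and conservative) way of handling it, not necessarily the approach discussed in the posts linked above. The one million SNPs is a round, hypothetical number, and the expected count assumes that no SNP is truly associated.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of false positives if every null hypothesis is true."""
    return num_tests * alpha


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    if num_tests <= 0:
        raise ValueError("num_tests must be positive")
    return alpha / num_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Indices of the p-values that survive the Bonferroni correction."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    return [i for i, p in enumerate(p_values) if p < threshold]


num_snps = 1_000_000  # hypothetical number of SNPs tested
print(expected_false_positives(num_snps, 0.05))
print(bonferroni_threshold(num_snps))

# Toy example with four made-up p-values: the threshold is 0.05 / 4 = 0.0125.
print(significant_after_bonferroni([0.001, 0.04, 0.2, 0.0005]))
```

Testing every SNP at the usual 0.05 level would give tens of thousands of spurious hits, which is why a much stricter per-SNP threshold and replication in independent data are needed.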

And sure, GWA lets us find only the CD/CV type of disease associations and not all diseases will follow this pattern, but with the success of GWA studies so far, I think it is fair to say that there is enough to be found here to make it worthwhile!