Virtual PSB 2009

The Pacific Symposium on Biocomputing is a great conference.  I’ve attended it twice, in 2005 and 2006, giving tutorials on association mapping.

Unfortunately, it is in Hawaii.

Unfortunately because it is pretty expensive for me to get there, not because of Hawaii :)

The 2009 meeting is running this week, ending tomorrow, and I have only followed it remotely, as suggested by nsaunders. It is not quite the same thing, though.

Still, since the proceedings are online, you can follow it a bit.

Personally, I’ve just finished reading this paper:

TreeQA: Quantitative Genome Wide Association Mapping Using Local Perfect Phylogeny Trees
Feng Pan, Leonard McMillan, Fernando Pardo-Manuel de Villena, David Threadgill, and Wei Wang; Pacific Symposium on Biocomputing 14:415-426 (2009)

I also reviewed the paper when it was submitted, so I knew it already, but it was interesting to read it again. It is a Blossoc-like approach, similar in its basics to a paper we have in Genetics:

Local Phylogeny Mapping of Quantitative Traits: Higher Accuracy and Better Ranking Than Single Marker Association in Genomewide Scans
Søren Besenbacher, Thomas Mailund and Mikkel H. Schierup

Genetics, published ahead of print December 8, 2008.
doi:10.1534/genetics.108.092643

I’ve downloaded a few more papers to read for tomorrow.

Post score: 8-6 = 2

Supplemental material

In this post, John Hawks complains about important information being left out of papers and hidden away in supplemental material.

I mention it because Asger Hobolth and I were discussing exactly this yesterday.

He writes:

“In the olden days, ten years ago, I would simply put the two papers side by side and find the discrepancies. But nooooo, we can’t do that any more. Now, all the relevant parameters from one of the papers (you guessed it, the one published by the Nature Publishing Group) are hidden away in a supplement.

“You’d think that might not be so bad, since I have the supplement. But I have to keep tracking the cross references to the paper to find out where the methods apply. It’s a pain in the neck. Nobody else ever seems to complain. But that’s because they simply don’t read the papers! AAARGGGH!”

Trust me, we complain!

Asger has just spent the last week trying to reproduce a result from a paper, only to find out that a lot of crucial information was left out of both the paper and the supplemental material (which contained the data, but not the filtering that was applied to it). He had to get that information from one of the authors.

Personally, I spent a couple of weeks in December trying to reconstruct a method hidden deep in the supplemental material of a Nature paper (on a project closely related to the rest of John’s post, by the way), but I never managed to reproduce the results from the paper.  I got close, but never quite there.

It might be taking it a bit too far to say that people don’t complain because they don’t read the papers, but I think very few people read the supplemental material, at least not in any detail.

I know that I only read the supplemental material for a small fraction of the papers I read: only those where I want to reconstruct a method, or where I don’t quite believe the results and want to see how the data actually support them.

Sadly, very often the supplemental information doesn’t help much there either.

Is the supplemental material even reviewed?

Post score: 8-4 = 4

Old code + old data = death by a thousand cuts

Again today I am struggling with data files I worked on a year ago.

My current library and its API don’t like the old files, so I’ve tried checking out old code from my source code repository.

In theory that should work. In practice, at the time I was doing this analysis I had several versions of my libraries installed (we were in the middle of developing the new version of SNPFile), and I don’t know which versions of the tools, linked against which versions of the libraries, I was using…

I really need more data discipline!
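
What would have saved me here is not more willpower but a tiny bit of automation: have every analysis stamp its output with the exact code versions that produced it. Below is a minimal sketch of the idea in Python; the output file name is made up, it assumes the script runs inside a git checkout, and in practice you would also record the version of each library (SNPFile included) that the tools were linked against.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone

def provenance():
    """Collect enough information to rerun this analysis later."""
    # Assumes the analysis script lives in a git checkout.
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return {
        "command": sys.argv,          # how the script was invoked
        "python": sys.version,        # interpreter version
        "git_commit": commit,         # exact code revision
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# Write the provenance next to the result file, so it cannot get
# separated from the data it describes.
with open("result.provenance.json", "w") as out:
    json.dump(provenance(), out, indent=2)
```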

On a different project we are desperately looking for the log files from a data filtering script.  We need some information that we probably should have stored with the data, but we didn’t think of it at the time, so we didn’t.  From the filtering script we can see that the information can be reconstructed from its log files, but we cannot find those logs.

Chances are they are not backed up either: they were probably stored with the primary data, which we keep on separate drives (there are gigs and gigs of it), and those drives are not backed up, since the primary data is backed up elsewhere.

We really need more data discipline!
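
For the filtering script, that discipline could be as simple as writing the parameters and counts into a small metadata file right next to the filtered data, so the information travels, and gets backed up, with the data itself instead of living only in a log somewhere else. A rough sketch in Python; the file names, thresholds, and toy filter are all made up, not the actual script.

```python
import json

# Hypothetical thresholds; stand-ins for whatever the real filtering used.
params = {"min_call_rate": 0.95, "min_maf": 0.01}

def passes_filters(fields, params):
    """Toy filter: expects whitespace-separated 'snp_id call_rate maf' lines."""
    call_rate, maf = float(fields[1]), float(fields[2])
    return call_rate >= params["min_call_rate"] and maf >= params["min_maf"]

kept = dropped = 0
with open("snps.raw.txt") as infile, open("snps.filtered.txt", "w") as outfile:
    for line in infile:
        if passes_filters(line.split(), params):
            outfile.write(line)
            kept += 1
        else:
            dropped += 1

# Store the parameters and counts *with* the filtered data, not only in a
# log file that lives somewhere else and may never be backed up.
with open("snps.filtered.meta.json", "w") as meta:
    json.dump({"params": params, "kept": kept, "dropped": dropped}, meta, indent=2)
```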

Post score: 8-3 = 5