Post-exam evaluation of genome analysis

We have just completed the exams for genome analysis. Half the students decided not to show up, but of those who did show up, the vast majority got top grades. This is telling me that we did something wrong with the course.

If only half the people show up, and no one manages to get an average grade, it tells me that we have made the class too hard. Since the average grades are missing, my bet is that the people who would normally get them are the ones who decided to give up altogether.

I haven’t been too happy with this class myself. We didn’t structure it that well and we probably included too much in each particular topic.

I hope we can do it better next time.

BiRC Blog

About a month ago I suggested that we start a blog at the BiRC homepage to show the outside world a bit more of the activities going on at BiRC. I don’t think we do enough of that at our current web pages. The suggestion got some mixed responses, but mainly it was just ignored, so I went ahead and added the blog just to try it out. There is not much work involved in setting it up. Skeletonz already has a plugin for it.

Throughout December, the blog ran at our pages, but could only be accessed internally at BiRC. This defeats the purpose, of course, but it served as an experiment for seeing if there was enough to blog about to make it worth doing at all.

The blog hasn’t exactly been flooded with posts, but there is at least some activity, so now I’ve made it public here.

What I have in mind for the blog is just reporting new papers, interesting seminars, releases of software and such. Stuff that ought to be reported somewhere, but that doesn’t deserve being shown on the announcement list on the front page.

The blog is just a small part of an update of the entire set of web pages. Generally, I don’t think the pages show enough of what is really going on at BiRC and I’d like to change that. This is very hard to do, of course, since everyone has a different opinion about how the pages should be. Last summer the entire BiRC group had a long discussion about the pages, but nothing came of it. Now my hope is that by actually making prototypes of what I have in mind, we can have a more productive discussion about it. Discussing abstract page changes gets us nowhere, but maybe discussing concrete suggestions will.

I’ve made a couple of other updates to the pages, but they are still only available internally at BiRC. We will all discuss them at a meeting in early February. I’ll blog more about it by then.


I’ve been using Todoist for the last couple of weeks.

I got into it a bit late. At first, I didn’t much see the point of having an online TODO list, but as my work is changing more and more into managing projects rather than doing them myself, keeping track of things to do becomes more and more important, and since I’m working on several different computers at different locations now, having the todo list online is really nice.

Anyway, knowing Todoist from Amir, I tried it out, and within days I was addicted. It is a really nice web application, with a simple but powerful interface. Very “Google-like”, actually. The project grouping and powerful query language make it very easy to manage the list, and there is practically no overhead in using it, compared to the “pen and paper” approach I was used to.

On Mac there is even a dashboard widget, so there Todoist integrates wonderfully with my desktop. When I get around to it ™ I’d love to program something similar for GNOME so I have the same integration with my Linux desktop.

I wrote to Amir today to give my praise for his application. He answered by making me a premium member for life. Wonderful! I guess flattery can get you everywhere…

Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes

Today, while preparing for a thesis meeting with Ricky, I read Gerton’s paper

Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes

G. A. Lunter

Bioinformatics 2007; DOI: 10.1093/bioinformatics/btm185


Motivation: The two mutation processes that have the largest impact on genome evolution at small scales are substitutions, and sequence insertions and deletions (indels). While the former have been studied extensively, indels have received less attention, and in particular, the problem of inferring indel rates between pairs of divergent sequence remains unsolved. Here, I describe a novel and accurate method for estimating neutral indel rates between divergent pairs of genomes.

Results: Simulations suggest that new method for estimating indel rates is accurate to within 2%, at divergences corresponding to that of human and mouse. Applying the method to these species, I show that indel rates are up to twice higher than is apparent from alignments, and depend strongly on the local G + C content. These results indicate that at these evolutionary distances, the contribution of indels to sequence divergence is much larger than hitherto appreciated. In particular, the ratio of substitution to indel rates between human and mouse appears to be around gamma = 8, rather than the currently accepted value of about gamma = 14.

I knew the results before, from discussions with Gerton, but this is the first time I’ve actually read the paper. It concerns the biases in placing gaps in alignment algorithms (whether probabilistic or parsimony based) and how these tend to underestimate the number of indels in the true alignment, and thus the indel rate.

Gap errors

The problem with gaps is that it is almost always better to have a few extra substitutions than a few extra gaps, since indels are less frequent and so their occurrence is less likely. When maximising the likelihood of the alignment, we therefore tend to remove gaps that should be there (even unlikely events do occur from time to time) and instead add substitutions that should not be there.
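As a rough illustration of this trade-off, here is a minimal sketch in Python. It is not the model from the paper: it assumes a toy scoring scheme in which every aligned column matches with probability `identity`, mismatches are spread evenly over the three other bases, each indel costs a fixed probability `gap_prob`, and misaligned bases match by chance a quarter of the time. The function names and parameter values are made up for the example. It compares the expected log-likelihood of a true alignment with two single-base indels flanking `k` homologous columns against the gapless alternative, in which those columns are shifted by one position.

```python
import math


def expected_loglik_gapped(k, identity, gap_prob):
    """Expected log-likelihood of the true alignment: two indels around k homologous columns."""
    per_column = (identity * math.log(identity)
                  + (1 - identity) * math.log((1 - identity) / 3))
    return 2 * math.log(gap_prob) + k * per_column


def expected_loglik_gapless(k, identity):
    """Expected log-likelihood when both indels are dropped.

    The k + 1 resulting columns pair non-homologous bases, which are assumed
    to match by chance with probability 1/4 (uniform base composition).
    """
    per_column = (0.25 * math.log(identity)
                  + 0.75 * math.log((1 - identity) / 3))
    return (k + 1) * per_column


def prefers_gapless(k, identity, gap_prob):
    """True if the (wrong) gapless alignment scores higher than the true one."""
    return expected_loglik_gapless(k, identity) > expected_loglik_gapped(k, identity, gap_prob)


# Hypothetical parameter values, chosen only to show the two regimes.
for identity in (0.95, 0.8, 0.65, 0.5):
    print(identity, prefers_gapless(k=5, identity=identity, gap_prob=0.01))
```

With these made-up numbers the true, gapped alignment wins when the sequences are nearly identical, while at 50% identity the gapless alignment scores higher and the two indels would be lost.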

Unbiased estimator

Using statistical alignment and posterior decoding Gerton derives another estimator for the indel rate and shows that this essentially removes the bias. The essential idea is that when the alignment is derived through the statistical alignment algorithm, areas where gaps are misplaced will have a lower posterior certainty. The optimal alignment that is derived is not significantly more likely than several others, so the posterior probability of that exact alignment is less than it would be if placement of the gaps was more certain.
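The point about posterior certainty can be sketched with a small calculation. This is only an illustration under a strong simplifying assumption: that a handful of candidate alignments, with hypothetical log-likelihoods, are the only alternatives. The actual statistical alignment sums over all alignments rather than enumerating a few. The posterior of the best candidate is its likelihood divided by the total likelihood of all candidates.

```python
import math


def posterior_of_best(logliks):
    """Posterior probability of the most likely candidate alignment.

    Assumes the given candidates are the only alternatives and have equal
    prior weight. Subtracting the maximum keeps the exponentials stable.
    """
    best = max(logliks)
    total = sum(math.exp(ll - best) for ll in logliks)
    return 1.0 / total


# Hypothetical log-likelihoods: gap placement is ambiguous in the first
# case and unambiguous in the second.
ambiguous = [-10.0, -10.1, -10.2]
clear_cut = [-10.0, -20.0, -25.0]

print(posterior_of_best(ambiguous))
print(posterior_of_best(clear_cut))
```

When several placements of a gap are almost equally likely, the best alignment gets only a modest share of the posterior mass, and this is the signal that the gap placement there should not be trusted.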

The new estimator is the red line on the plot on the right. The blue is what you would get if you just trusted the most likely alignment. The green line you get by fitting the neutral indel model from his earlier paper, “Genome-wide identification of human functional DNA using a neutral indel model” (Lunter, Ponting and Hein, PLoS Computational Biology 2006).

The reason the bias only shows when the substitution rate is rather high is, of course, that you are more likely to mistake non-homologous sequence for homologous when misplacing a gap if the true alignment has a low sequence identity (a high substitution rate) than if it has a high sequence identity (a low substitution rate).

The citation, for Research Blogging:
Lunter, G. (2007). Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes. Bioinformatics, 23(13), i289-i296. DOI: 10.1093/bioinformatics/btm185


Yesterday I got my new iMac. This is my first Mac, and so far I am pretty impressed. Mac OS X Leopard is really cool. Of course, so far I have only played with the machine, so I don’t know how it is for actual work, but I am looking forward to trying that out.

I decided to get a Mac partly because Storm speaks so highly of them, and partly because I want to port my software to OS X. I already have a Windows machine, and I haven’t ported my software to that, something that would be a lot more useful, so I am not sure how real that argument is. I don’t know how to develop software on Windows. Since OS X is essentially a UNIX system I think I should be able to develop my software there, though. At BiRC there are a few people working with Macs who can help me, anyway.