I’ve been using Todoist for the last couple of weeks.

I got into it a bit late. At first, I didn’t much see the point of having an online TODO list, but as my work is changing more and more into managing projects rather than doing them myself, keeping track of things to do becomes more and more important, and since I’m working on several different computers at different locations now, having the todo list online is really nice.

Anyway, knowing Todoist from Amir, I tried it out, and within days I was addicted. It is a really nice web application. Simple but powerful interface. Very “Google like” actually. The project grouping and powerful query language make it very easy to manage the list, and there is practically no overhead in using it, compared to the “pen and paper” approach I was used to.

On the Mac there is even a dashboard widget, so there Todoist integrates wonderfully with my desktop. When I get around to it ™ I’d love to program something similar for GNOME so I have the same integration with my Linux desktop.

I wrote to Amir today to praise his application. He answered by making me a premium member for life. Wonderful! I guess flattery can get you everywhere…

Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes

Today, while preparing for a thesis meeting with Ricky, I read Gerton’s paper

Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes

G. A. Lunter

Bioinformatics 2007; DOI: 10.1093/bioinformatics/btm185


Motivation: The two mutation processes that have the largest impact on genome evolution at small scales are substitutions, and sequence insertions and deletions (indels). While the former have been studied extensively, indels have received less attention, and in particular, the problem of inferring indel rates between pairs of divergent sequence remains unsolved. Here, I describe a novel and accurate method for estimating neutral indel rates between divergent pairs of genomes.

Results: Simulations suggest that new method for estimating indel rates is accurate to within 2%, at divergences corresponding to that of human and mouse. Applying the method to these species, I show that indel rates are up to twice higher than is apparent from alignments, and depend strongly on the local G + C content. These results indicate that at these evolutionary distances, the contribution of indels to sequence divergence is much larger than hitherto appreciated. In particular, the ratio of substitution to indel rates between human and mouse appears to be around gamma = 8, rather than the currently accepted value of about gamma = 14.

I knew the results before, from discussions with Gerton, but this is the first time I’ve actually read the paper. It concerns the biases in placing gaps in alignment algorithms (whether probabilistic or parsimony based) and how these tend to make us underestimate the number of indels in the true alignment, and thus the indel rate.

Gap errors

The problem with gaps is that it is almost always better to have a few extra substitutions than a few extra gaps, since indels are less frequent and so their occurrence is less likely. When maximising the likelihood of the alignment, we therefore tend to remove gaps that should be there (even unlikely events do occur from time to time) and instead add substitutions that should not be there.
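To make the trade-off concrete, here is a minimal sketch in Python. It is a toy column-by-column score, not the model from the paper: the sequences, the per-column substitution probability `p_sub` and the per-gap-column probability `p_gap` are all made up for illustration. It scores the same two sequences once with the two gaps of a hypothetical true history and once with no gaps at all.

```python
import math


def alignment_log_likelihood(row_a, row_b, p_sub, p_gap):
    """Toy log-likelihood of a pairwise alignment, scored one column at a time.

    p_sub and p_gap are illustrative parameters, not estimates from any data.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    total = 0.0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total += math.log(p_gap)        # gap column, i.e. an indel
        elif a == b:
            total += math.log(1.0 - p_sub)  # matching column
        else:
            total += math.log(p_sub)        # substitution
    return total


# Hypothetical true history: one deleted base and one inserted base.
GAPPED = ("ACGTTG-CA", "A-GTTGACA")
# The same two sequences aligned without gaps: 4 matches, 4 mismatches.
GAPLESS = ("ACGTTGCA", "AGTTGACA")


def preferred(p_sub, p_gap=0.01):
    """Return which of the two alignments has the higher toy likelihood."""
    gapped = alignment_log_likelihood(*GAPPED, p_sub, p_gap)
    gapless = alignment_log_likelihood(*GAPLESS, p_sub, p_gap)
    return "gapped" if gapped > gapless else "gapless"


if __name__ == "__main__":
    for p_sub in (0.02, 0.3):
        print(f"p_sub={p_sub}: maximum likelihood prefers the {preferred(p_sub)} alignment")
```

With these made-up numbers the gapped alignment wins when substitutions are rare, but once substitutions are common the gapless alignment scores higher even though it is wrong, and the two indels disappear from the count.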

Unbiased estimator

Using statistical alignment and posterior decoding Gerton derives another estimator for the indel rate and shows that this essentially removes the bias. The essential idea is that when the alignment is derived through the statistical alignment algorithm, areas where gaps are misplaced will have a lower posterior certainty. The optimal alignment that is derived is not significantly more likely than several others, so the posterior probability of that exact alignment is less than it would be if placement of the gaps was more certain.
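The point about posterior certainty can be illustrated with a small sketch. This is not Gerton's algorithm: it assumes a short, hand-picked list of candidate alignments with made-up log-likelihoods and a uniform prior over them, whereas posterior decoding works with the full set of alignments. It only shows how the best alignment's posterior drops when its competitors are nearly as likely.

```python
import math


def posterior_probabilities(log_likelihoods):
    """Normalise candidate log-likelihoods into posteriors (uniform prior assumed)."""
    top = max(log_likelihoods)
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    weights = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up log-likelihoods for three candidate alignments of the same region.
AMBIGUOUS = [-10.0, -10.1, -10.2]  # gap placement is uncertain
CLEAR = [-10.0, -15.0, -16.0]      # one alignment clearly dominates

if __name__ == "__main__":
    for name, candidates in (("ambiguous", AMBIGUOUS), ("clear", CLEAR)):
        best = max(posterior_probabilities(candidates))
        print(f"{name}: posterior of the best alignment = {best:.3f}")
```

In the ambiguous case the best alignment ends up with well under half of the posterior mass, which is the kind of signal the estimator can use instead of trusting the single most likely alignment.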

The new estimator is the red line on the plot on the right. The blue is what you would get if you just trusted the most likely alignment. The green line you get by fitting the neutral indel model from his earlier paper, Genome-Wide Identification of Human Functional DNA Using a Neutral Indel Model (Lunter, Ponting and Hein, PLoS Computational Biology, 2006).

The reason the bias only shows when the substitution rate is rather high is, of course, that a misplaced gap aligns non-homologous sequence, and you are more likely to mistake that for homologous sequence when the true alignment has a low sequence identity than when it has a high sequence identity, i.e. when you have a low substitution rate.

The citation, for Research Blogging:
Lunter, G. (2007). Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes. Bioinformatics, 23(13), i289-i296. DOI: 10.1093/bioinformatics/btm185


Yesterday I got my new iMac. This is my first Mac, and so far I am pretty impressed. Mac OS X Leopard is really cool. Of course, so far I have only played with the machine, so I don’t know how it is for actual work, but I am looking forward to trying that out.

I decided to get a Mac partly because Storm speaks so highly of them, and partly because I want to port my software to OS X. I already have a Windows machine, and I haven’t ported my software to that, something that would be a lot more useful, so I am not sure how real that argument is. I don’t know how to develop software on Windows. Since OS X is essentially a UNIX system I think I should be able to develop my software there, though. At BiRC there are a few people working with Macs who can help me, anyway.

A Peer Review How-To

Robert S. Zucker of UC Berkeley wrote an excellent letter on the reviewing process. It is worth a read if you, like me, review a lot of papers.

The second mistake often made by reviewers is failing to consider all of the journal’s goals and requirements, including standards and guidelines stated in the editorial policy and gleaned from its articles. Do not reject a manuscript simply because its ideas are not original, if it offers the first strong evidence for an old but important idea. Do not reject a paper with a brilliant new idea simply because the evidence was not as comprehensive as could be imagined. Do not reject a paper simply because it is not of the highest significance, if it is beautifully executed and offers fresh ideas with strong evidence. Seek a balance among criteria in making a recommendation.

Mathematical modeling in systems biology

The next term is approaching, and this term I teach the course Mathematical Modeling in Systems Biology.

From the course description:

Biological systems such as cells, regulatory gene networks and protein interaction complexes cannot be understood from reflections on the individual components (genes, mRNA, proteins etc) alone, but must be understood through considerations involving all components at the same time. Naturally, that places heavy demands on the way we perceive the system. Systems biology is concerned with modelling the dynamics of biological systems at a “systems level”, i.e. by considering the interactions of all the components of a system rather than the isolated properties of the components. This course will present mathematical techniques for modelling dynamic systems in this context, with the main focus on stochastic modelling and computer simulation techniques for analysing dynamical systems.

After this course the participants will have insight into techniques for modelling the dynamics of biological systems, including the distinction between a system and its components, and how the performance of the system depends on more than the individual components alone. The method of work at the course will also train the participants to plan and complete projects and to present and communicate professional problems.

For the course homepage I’m using BiRC’s skeletonz system, ’cause I’m fed up with the university’s AULA system.