A Peer Review How-To

Robert S. Zucker of UC Berkeley wrote an excellent letter on the reviewing process. It is worth a read if you, like me, review a lot of papers.

The second mistake often made by reviewers is failing to consider all of the journal’s goals and requirements, including standards and guidelines stated in the editorial policy and gleaned from its articles. Do not reject a manuscript simply because its ideas are not original, if it offers the first strong evidence for an old but important idea. Do not reject a paper with a brilliant new idea simply because the evidence was not as comprehensive as could be imagined. Do not reject a paper simply because it is not of the highest significance, if it is beautifully executed and offers fresh ideas with strong evidence. Seek a balance among criteria in making a recommendation.

Mathematical modeling in systems biology

The next term is approaching, and this term I teach the course Mathematical Modeling in Systems Biology.

From the course description:

Biological systems such as cells, regulatory gene networks and protein interaction complexes cannot be understood from reflections on the individual components (genes, mRNA, proteins etc) alone, but must be understood through considerations involving all components at the same time. Naturally, that places heavy demands on the way we perceive the system. Systems biology is concerned with modelling the dynamics of biological systems at a “systems level”, i.e. by considering the interactions of all the components of a system rather than the isolated properties of the components. This course will present mathematical techniques for modelling dynamic systems in this context, with the main focus on stochastic modelling and computer simulation techniques for analysing dynamical systems.

After this course the participants will have insight into techniques for modelling the dynamics of biological systems, including the distinction between a system and its components, and how the performance of the system depends on more than the individual components alone. The method of work at the course will also train the participants to plan and complete projects and to present and communicate professional problems.

For the course homepage I’m using BiRC’s skeletonz system, ’cause I’m fed up with the university’s AULA system.

Software roadmap

I’ve put a roadmap for our association mapping software up on my BiRC homepage. It is a bit of a mix of my old homepage design and some php to synchronize it with our bug database. I really don’t know php, so I’m not sure it is an ideal design. It is only php because we use Mantis for our bug database. I really don’t want to write my own bug database, so that is the way it is.

On Recombination Induced Multiple and Simultaneous Coalescent Events


We just published a new paper. The paper concerns a problem that Jotun’s been working on since he and Carsten Wiuf published some results on the distribution of ancestral material of a present day sample back in time. Jo Davies worked on it as a summer student project years back, and last year we returned to the problem when Frank Simancik did a summer student project.

On Recombination Induced Multiple and Simultaneous Coalescent Events

J. Davies, F. Simancik, R. Lyngsø, T. Mailund, and J. Hein

Genetics 177: 2151–2160 (2007). doi:10.1534/genetics.107.071126

Abstract: Coalescent Theory is almost ubiquitous in contemporary molecular population genetics. Inherent in most applications is a continuous time approximation that assumes sample size is small relative to the actual population size. This assumption in effect precludes simultaneous and multiple coalescent events, which can constitute an arbitrarily large component when sample size is sufficiently large. In most situations this is justifiably ignored as a large sample size will only have few ancestors a couple of generations back and then the assumption is valid. However, in tracing the evolutionary history of large chromosomal segments, a large recombination rate will consistently keep the number of ancestors large such that multiple and simultaneous coalescent events cannot be ignored. This can create a major disparity between discrete time and continuous time models and we here show its importance illustrated with parameters typical of the human genome. The presence of gene conversion only aggravates its importance. This could seriously undermine the application of coalescent theory to complete genomes. However, it can be shown that multiple and simultaneous coalescent events influence global quantities, such as total number of ancestors, but have negligible effect on local quantities, such as linkage disequilibrium or similarities of close local trees. Reassuringly the majority of applications of coalescent models with recombination are based on local quantities for purposes such as association mapping.

What is the problem?

If you sample DNA from present day individuals and then trace its history back in time, you will see coalescent events and recombination events. Coalescent events occur when two lines, as we trace them back in time, join (or coalesce). Considered forward in time, this corresponds to a cell dividing and eventually producing two siblings who are both ancestors of individuals in our present day sample. Recombination events occur when a single line, moving back in time, splits into two. Considered forward in time, this corresponds to two lines combining in a chromosomal recombination.


This process can be modelled mathematically, and the theory for this is called coalescent theory. A nice introduction can be found in the book by Jotun, Mikkel and Carsten: Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory ISBN 978-0198529965.

The mathematical process is, of course, an approximation to the real process. The real process is probably too complex to model mathematically, and even if it were possible to model it, the mathematics would be too complex to give any insight into the process in any case.

However, one approximation made in the mathematical model is potentially problematic. The model assumes that coalescent events and recombination events occur so rarely, on the time scale considered, that two events essentially never occur at the same time.

For a small sample in a large population, this assumption is justified. The probability of multiple events at the same time is essentially zero. When the sample size is on the same order as the population size, however, the assumption is no longer valid.
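To get a feeling for when the assumption fails, here is a rough sketch of the calculation for a single generation. It assumes a haploid Wright-Fisher population in which each line picks its parent uniformly at random and ignores recombination; the function name and the sample and population sizes are made up for illustration and are not taken from the paper.

```python
from math import comb


def event_probabilities(n, N):
    """One generation back in a haploid Wright-Fisher population of size N.

    Each of the n sampled lines picks its parent uniformly and independently.
    Returns (p_none, p_single, p_multiple).
    """
    # All n lines pick distinct parents: no coalescent event.
    p_none = 1.0
    for i in range(n):
        p_none *= max(N - i, 0) / N

    # Exactly one pair shares a parent, and the resulting n - 1 parents
    # are all distinct: a single pairwise coalescent event.
    p_single = comb(n, 2) / N
    for i in range(n - 1):
        p_single *= max(N - i, 0) / N

    # Everything else: several pairs merging at once, or three or more
    # lines merging into the same parent.
    p_multiple = max(0.0, 1.0 - p_none - p_single)
    return p_none, p_single, p_multiple


# Illustrative values only: a small and a large sample from the same population.
for n, N in [(10, 10_000), (5_000, 10_000)]:
    p_none, p_single, p_multiple = event_probabilities(n, N)
    print(f"n={n:5d} N={N}: none={p_none:.3g} "
          f"single={p_single:.3g} multiple={p_multiple:.3g}")
```

For the small sample the third probability is negligible, which is what the continuous time approximation relies on. For the large sample nearly every generation contains several events at once.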

This hasn’t really been a major issue, since even for large samples, the time it takes for a large sample to coalesce into a small sample is very short compared to the time it takes for the entire process to run.

That is, if we ignore recombination events!

Recombination events produce new lines, as we move back in time, just as coalescent events remove lines. If we consider ancestral material, which is the DNA we sampled at present day, the coalescent events will eventually win and reduce the material such that each nucleotide is only found in a single line. If we also consider non-ancestral material, DNA that belonged to an ancestor of our sample but that did not get passed on to the present day sample, then we reach an equilibrium between coalescent events and recombination events that keeps several lines moving back in time.

It is this situation that Carsten and Jotun considered in their paper

The ancestry of a sample of sequences subject to recombination

C. Wiuf and J. Hein

Genetics 151: 1217-1228 (1999).

and as it turns out, it is possible for the number of lines to remain large, compared to the population size, if only the recombination rate is sufficiently high. In fact, the number of lines can be larger than the population size!
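A back-of-the-envelope version of this can be sketched as follows. I assume the usual continuous time formulation, where with k lines the next event is a recombination at rate ρk/2 or a coalescence at rate k(k-1)/2 (time in units of 2N generations, ρ = 4Nr). Detailed balance for that birth-death process gives a zero-truncated Poisson distribution with parameter ρ for the number of lines at equilibrium. The parameter values below are illustrative guesses for a human-like chromosome, not numbers from either paper.

```python
import math


def equilibrium_mean_lineages(rho):
    """Mean of a zero-truncated Poisson(rho) distribution, rho > 0.

    This is the stationary mean of a birth-death process with birth rate
    rho * k / 2 and death rate k * (k - 1) / 2 (assumed formulation).
    """
    # rho / (1 - exp(-rho)), written with expm1 for accuracy at small rho.
    return rho / -math.expm1(-rho)


# Illustrative assumptions: diploid population size and roughly one
# crossover per chromosome per generation.
N = 10_000
r = 1.0
rho = 4 * N * r

print("chromosomes in the population:", 2 * N)
print("expected lines at equilibrium:", equilibrium_mean_lineages(rho))
```

With these numbers the continuous time model keeps about ρ lines in the air, which is twice the number of chromosomes that actually exist in the population.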

This sounds like a major problem with the theory, but it isn’t really. It is just applying the theory to a part of parameter space where essential assumptions are no longer valid.

If we consider single genes, as the theory intended, the recombination rate is low and there is no problem with the theory. If we start considering entire chromosomes, however, we enter the parameter space where the theory breaks down!

What is the result?

We considered this problem and simulated the process both when allowing multiple events (using a simpler, but computationally slower, method) and when assuming that they do not occur.

[Figure: LD table]

[Figure: Number of lineages back in time]

The model that allows multiple events changes the equilibrium behaviour of the system. The number of lines, as we trace them back in time, changes, and we no longer end up in the strange situation of having more lines than individuals in the population.
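The capping effect is easy to see in a toy generation-by-generation simulation. This is only a sketch of the idea and not the simulator used for the paper: it tracks nothing but the number of lines, lets each line recombine with probability r per generation (in which case it has two parental chromosomes instead of one), and draws parents uniformly among the chromosomes in the population. All names and parameter values are invented for the example.

```python
import random


def simulate_lineage_counts(n_sample, n_chrom, r, generations, seed=0):
    """Number of lines in each generation back in time (toy model).

    n_chrom is the number of chromosomes in the population and r the
    per-line, per-generation recombination probability.
    """
    rng = random.Random(seed)
    k = n_sample
    counts = [k]
    for _ in range(generations):
        parents = set()
        for _ in range(k):
            # Every line has one parental chromosome ...
            parents.add(rng.randrange(n_chrom))
            # ... and a second one if it recombined in this generation.
            if rng.random() < r:
                parents.add(rng.randrange(n_chrom))
        # Lines that chose the same chromosome have coalesced, however
        # many of them there were, so simultaneous events come for free.
        k = len(parents)
        counts.append(k)
    return counts


counts = simulate_lineage_counts(n_sample=20, n_chrom=200, r=1.0, generations=200)
print("largest number of lines seen:", max(counts))
```

Since the lines in any generation are a set of distinct chromosomes, the count can never exceed n_chrom, no matter how large r is. In the continuous time model the same parameters would correspond to ρ = 2 · n_chrom · r = 400 lines at equilibrium.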

Local properties, however, such as the phylogenies at individual nucleotides and the linkage disequilibrium (statistical relatedness of nucleotides), are not affected by allowing multiple events. This is the good news. It means that the models we have used when developing association mapping tools are just as valid as they have always been.

Happy holidays everyone

I just got back from Christmas celebrations with my family. I didn’t bring my laptop this year, and it was great getting away from work for a couple of days. Sure, I brought a few books, but reading up on numerical methods for ODEs can hardly be called real work — it belongs in the relaxation category.

In any case, it is very different from the last two Christmases, where I’ve had to prepare tutorials for PSB. Going to Hawaii just after New Year is great and all, but it sort of ruins the holiday that I have to work through it.

Anyway, now I am back in Aarhus and will head off to the office in a little while. I’ll only work a few hours, though, and not too seriously. I have a few pet projects that I haven’t had time to look at before now. The days between Christmas and New Year’s Eve are perfect for those.