Alignment bias in genomics

I have previously written a bit about how optimal alignment algorithms introduce an alignment bias and even done some work on it myself (currently submitted for publication, so I cannot link to it yet). Today I saw a paper in the current issue of Science addressing the same problem.

A summary can be found in

Lining Up to Avoid Bias

Antonis Rokas

Science Vol. 319. no. 5862, pp. 416 – 417

and the full paper (probably requires a subscription) is

Alignment Uncertainty and Genomic Analysis

Karen M. Wong, Marc A. Suchard, and John P. Huelsenbeck

Science Vol. 319. no. 5862, pp. 473 – 476

The problem with alignments

I’ve already described the problem in the previous post, where I used the examples from Gerton Lunter’s paper

Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes

G. A. Lunter

Bioinformatics 2007; DOI: 10.1093/bioinformatics/btm185

although there the focus was on the problems with indels. Of course, without indels there simply isn’t any problem with alignment, so that is not as unreasonable as it might sound.

Essentially, the problem is that we use algorithms to infer optimal alignments and then treat these alignments as absolute truth, ignoring the uncertainty in the inference.

Wong et al. compare seven different alignment algorithms, run typical evolutionary analyses (inferring phylogenies and detecting selection) on the inferred alignments, and find that the results vary widely depending on which alignment method was used.

The solution proposed in Wong et al. is the same as the one Gerton proposes: statistical alignment methods. Quoting Wong et al.:

The problem of alignment uncertainty in genomic studies, identified here, is not a problem of sloppy analysis. Many comparative genomics studies are carefully performed and reasonable in design. However, even carefully designed and carried out analyses can suffer from these types of problems because the methods used in the analysis of the genomic data do not properly accommodate alignment uncertainty in the first place.

In a comparative genomics study, we advocate that alignment be treated as a random variable, and inferences of parameters of interest to the genomicist, such as the amount of nonsynonymous divergence or the phylogeny, consider the different possible alignments in proportion to their probability.

Of course, this is what the statistical alignment people in Oxford have been trying for years and it is not quite as easy as it sounds.
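
To make "alignment as a random variable" a bit more concrete, here is a toy sketch in Python. It is not the method of Wong et al. or of the statistical alignment literature. It simply enumerates every global alignment of two very short sequences, weights each one by exp(score) under a made-up scoring scheme (match +1, mismatch -1, gap -2, all my assumptions), and compares the mismatch count in the single best alignment with the weighted average over all alignments. The function names are hypothetical, and brute-force enumeration is only feasible for sequences a handful of characters long, which is part of why the real thing is hard.

```python
import math


def alignments(a, b):
    """Yield every global alignment of a and b as a list of column pairs."""
    if not a and not b:
        yield []
        return
    if a and b:  # align a[0] with b[0]
        for rest in alignments(a[1:], b[1:]):
            yield [(a[0], b[0])] + rest
    if a:  # a[0] against a gap
        for rest in alignments(a[1:], b):
            yield [(a[0], "-")] + rest
    if b:  # gap against b[0]
        for rest in alignments(a, b[1:]):
            yield [("-", b[0])] + rest


def score(aln, match=1.0, mismatch=-1.0, gap=-2.0):
    """Score an alignment column by column (toy scoring scheme, assumed)."""
    total = 0.0
    for x, y in aln:
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def mismatches(aln):
    """Count columns where two different residues are aligned."""
    return sum(1 for x, y in aln if "-" not in (x, y) and x != y)


def summarise(a, b):
    """Return (mismatches in best alignment, expected mismatches, P(best))."""
    alns = list(alignments(a, b))
    weights = [math.exp(score(aln)) for aln in alns]
    z = sum(weights)
    probs = [w / z for w in weights]
    best = max(alns, key=score)
    expected = sum(p * mismatches(aln) for p, aln in zip(probs, alns))
    return mismatches(best), expected, max(probs)


best_mm, expected_mm, p_best = summarise("ACGT", "ACGT")
print("mismatches in best alignment:", best_mm)
print("expected mismatches over all alignments:", expected_mm)
print("probability of the best alignment:", p_best)
```

Even for two identical sequences, the weighted average is not exactly zero, because the shifted alignments carry some weight. Reporting only the optimal alignment throws that information away.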

Citations, for Research Blogging:

Rokas, A. (2008). GENOMICS: Lining Up to Avoid Bias. Science, 319(5862), 416-417. DOI: 10.1126/science.1153156

Wong, K.M., Suchard, M.A., Huelsenbeck, J.P. (2008). Alignment Uncertainty and Genomic Analysis. Science, 319(5862), 473-476. DOI: 10.1126/science.1151532

Petri Nets and Systems Biology

I did my PhD in the Coloured Petri Nets group here in Aarhus, but since I finished it and changed my research field to bioinformatics I haven’t touched Petri nets. Now that I’m starting to get interested in systems biology, I seem to run into them again and again.

A lot of people seem interested in modelling biological systems in various types of Petri nets. I sort of see why. Petri nets have been used in modelling a wide variety of dynamic systems, so why not apply them to biological systems as well?

The papers I’ve read have left me a bit disappointed, though.

Most of the papers I’ve read seem to just add extensions to Petri nets for the sake of adding the extensions (or as an excuse to get a paper published, you pick). I won’t blame Petri nets or systems biology for this, though. I’ve seen this in every single formalism I’ve read up on. It is a kind of feature creep that we computer scientists just cannot seem to avoid. Whenever we see an ever so tiny potential problem with a computer language, we immediately find a way to fix it, and rarely do we worry whether it is worth the trouble to fix or whether what we are fixing is really that much of a problem in the first place. For some reason, we just cannot keep things simple.

Anyway, I’m going to ignore this particular problem in this post and instead ask, what do Petri nets add to systems biology?

What do Petri nets add to systems biology?

Most papers I’ve read seem to just use Petri nets as a front-end for some other formalism. Some use Petri nets as a graphical way of specifying differential equations; others use (stochastic) Petri nets just as a front-end for Gillespie simulations.
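
To show what I mean by "front-end", here is a minimal sketch in Python of running a Gillespie simulation directly off a stochastic Petri net. The net is given as pre and post arc-weight matrices plus one rate constant per transition. The dimerisation example, the rate constants and the function names are all my own assumptions, not taken from any of the papers, and the hazards assume simple mass-action kinetics.

```python
import math
import random


def hazards(pre, rates, marking):
    """Mass-action hazard of each transition: rate * prod_p C(tokens_p, weight_p).

    math.comb returns 0 when a place holds fewer tokens than the arc weight,
    so a disabled transition automatically gets hazard 0.
    """
    return [
        rate * math.prod(math.comb(m, w) for m, w in zip(marking, row))
        for rate, row in zip(rates, pre)
    ]


def gillespie(pre, post, rates, marking, t_end, rng):
    """Simulate the net until t_end; return a list of (time, marking) pairs."""
    t = 0.0
    marking = list(marking)
    trajectory = [(t, tuple(marking))]
    while True:
        h = hazards(pre, rates, marking)
        total = sum(h)
        if total == 0.0:  # no transition is enabled
            break
        t += rng.expovariate(total)  # waiting time to the next firing
        if t > t_end:
            break
        j = rng.choices(range(len(h)), weights=h)[0]  # which transition fires
        marking = [m - pre[j][p] + post[j][p] for p, m in enumerate(marking)]
        trajectory.append((t, tuple(marking)))
    return trajectory


# Hypothetical example net. Places: (P, P2).
# Transitions: dimerise (2 P -> P2) and dissociate (P2 -> 2 P).
PRE = [[2, 0], [0, 1]]
POST = [[0, 1], [2, 0]]
RATES = [0.01, 0.1]  # made-up rate constants

trajectory = gillespie(PRE, POST, RATES, [100, 0], t_end=10.0, rng=random.Random(1))
print("number of firings:", len(trajectory) - 1)
print("final state:", trajectory[-1])
```

Notice that nothing here uses the net as anything more than a pair of matrices, which is rather the point: a plain list of reactions would have served equally well.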

If Petri nets are just used as a front-end for something else, then is that really the way to go? Sure, it is probably easier to get a feeling for a system by looking at a network than by looking at a set of coupled differential equations, but the lack of compositionality in Petri nets does mean that a lot of systems end up as “spaghetti networks”, so perhaps process algebra would be a better approach here? The same goes for setting up stochastic simulations.

Don’t get me wrong, I do like Petri nets. I especially like their graphical representation. I am just a bit disappointed that that is all they seem to bring to the table.

So far, the only paper I’ve seen that actually uses “good old” Petri net theory (p- and t-invariants, in this case) is the paper I read today (and incidentally the paper that got me thinking about all of this):

Petri net-based method for the analysis of the dynamics of signal propagation in signaling pathways

Simon Hardy and Pierre N. Robillard

Bioinformatics Advance Access published online on November 22, 2007

and even that paper seems to me to basically be modelling using differential equations. I might be wrong here, though; I haven’t read it that thoroughly yet. They do extract some signalling information from simulations, and I didn’t quite get to what degree they need the net structure (as opposed to just the set of ODEs) to extract that.
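
For readers who haven’t met invariants before, here is a small Python sketch of what they are. It is not taken from the Hardy and Robillard paper; the enzyme net and the function names are my own assumptions. With incidence matrix N (places by transitions), a p-invariant is a weight vector y over places with yᵀN = 0, so the weighted token count never changes, and a t-invariant is a firing-count vector x with Nx = 0, so firing those transitions brings the net back to the marking it started from.

```python
# Hypothetical enzyme net. Places: E, S, ES, P.
# Transitions: bind (E + S -> ES), unbind (ES -> E + S), catalyse (ES -> E + P).
PRE = [[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
POST = [[0, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1]]


def incidence(pre, post):
    """Return N with N[p][t] = post[t][p] - pre[t][p] (places by transitions)."""
    n_places = len(pre[0])
    n_trans = len(pre)
    return [[post[t][p] - pre[t][p] for t in range(n_trans)] for p in range(n_places)]


def is_p_invariant(y, n):
    """True if y^T N = 0, i.e. the y-weighted token sum is conserved."""
    return all(
        sum(y[p] * n[p][t] for p in range(len(n))) == 0 for t in range(len(n[0]))
    )


def is_t_invariant(x, n):
    """True if N x = 0, i.e. firing each transition x[t] times changes nothing."""
    return all(
        sum(n[p][t] * x[t] for t in range(len(x))) == 0 for p in range(len(n))
    )


N = incidence(PRE, POST)
print(is_p_invariant([1, 0, 1, 0], N))  # total enzyme, E + ES
print(is_p_invariant([0, 1, 1, 1], N))  # total substrate, S + ES + P
print(is_t_invariant([1, 1, 0], N))     # bind followed by unbind
```

The two p-invariants are the familiar conservation laws for enzyme and substrate, and they follow from the net structure alone, without any rate constants or simulation.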

Am I reading the wrong papers, or just missing the point here? If you know of any papers I really ought to read to get the point of using Petri nets in systems biology, then please let me know!

Practising what I preach?

Now, after reading through all this, it might surprise you that I will use stochastic Petri nets in the systems biology class I teach with Carsten Wiuf this term.

It is not so much because of the nets, though. We want to use stochastic processes in the class and compare them with differential equation modelling, to contrast stochastic models with deterministic (“large number of molecules”) models. The textbook we use

Stochastic modelling for systems biology

Darren J. Wilkinson

Chapman & Hall/CRC, 2006.

uses stochastic Petri nets, and that made the choice for us.

But is it the right choice? Would I actually use Petri nets myself if I had to model a biological system?

Honestly, I do not know. I am very familiar with nets from my PhD work, but not in the context of systems biology. I wouldn’t know the right tools to use. I could easily end up programming simulators or numerical analysis methods myself, and then I am not sure I would gain much from starting out with nets.

I guess I really need to read up on Petri nets in systems biology… but where should I start?


According to this, geologists have suggested a new geological epoch (from about 200 years ago to the present): the epoch where humans have caused major changes to Earth’s geology. I knew we had a major effect on the biosphere, but are we having a major impact on the geology? Maybe we are…

Earth has been subject to the same kinds of physical forces–wind, waves, sunlight–throughout the planet’s existence. But life has been much more varied in its impact. The appearance of oxygen-producing photogenesis, the rise of land plants, and many other evolutionary events have shaped the planet in dramatic ways. And now–humans. In the past 200 years, ever since the human population reached 1 billion, the use of fossil fuels, the growth of metropolises, and other influences have begun to affect the stratigraphic process, altering the physical and chemical nature of ocean sediments, ice cores, and surface deposits.

Go figure.