I just got back from Christmas celebrations with my family. I didn’t bring my laptop this year, and it was great getting away from work for a couple of days. Sure, I brought a few books, but reading up on numerical methods for ODEs can hardly be called real work — it belongs in the relaxation category.
In any case, it is very different from the last two Christmases, where I’ve had to prepare tutorials for PSB. Going to Hawaii just after New Year is great and all, but it sort of ruins the holiday that I have to work through it.
Anyway, now I am back in Aarhus and will head off to the office in a little while. I’ll only work a few hours, though, and not too seriously. I have a few pet projects that I haven’t had time to look at before now. The days between Christmas and New Year’s Eve are perfect for those.
Then, I started thinking about my own introduction to statistics. I had the mandatory classes on probability and statistics while doing my comp. science degree and pretty much hated the stats part of it (less so the probability, ’cause I have a soft spot for pure math). It wasn’t until I really needed to know stats for my own work that I started getting into it, and then I found that it was actually pretty easy. I guess most things get a lot easier once you are motivated to learn them…
I found these interesting not least because he refers to a paper that we published earlier this year:
Hobolth A, Christensen OF, Mailund T, Schierup MH. 2007. Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genet 3:e7. doi:10.1371/journal.pgen.0030007
That paper was mainly on a new statistical method for analysing speciation, a method that combined comparative genomics with population genetics through a model that joined hidden Markov models with coalescence theory. Of course, that is not really what caught people’s attention. What we did in the paper was to apply our new method to data from human, gorilla, chimp and orangutan, and one result that came out of that was a very recent split between human and chimp; a split only 4.1 million years old.
We get a very recent speciation split between human and chimp exactly because of the combined population genetics and genomics. If we only look at the genomic sequences, the distance between these will necessarily be larger than the distance between the species: two copies of a piece of DNA that now sit in individuals from separate species trace back to a common ancestor that lived some time before the species split. Our method is able to estimate the speciation split from the genome split.
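To put some (made-up) numbers on that argument, here is a minimal back-of-the-envelope sketch. It is not the coalescent HMM from the paper. It only uses the standard coalescent result that two lineages in a diploid population of effective size Ne take 2·Ne generations on average to find their common ancestor. The function names and all the numbers are hypothetical, picked to be round, and are not estimates from our data.

```python
def expected_coalescence_years(ancestral_ne, generation_years):
    """Expected time, in years, for two lineages to coalesce in the
    ancestral population: 2 * Ne generations (standard coalescent)."""
    return 2 * ancestral_ne * generation_years


def speciation_time_years(genome_split_years, ancestral_ne, generation_years):
    """Species split = average genome split minus the expected time the
    two lineages spent in the ancestral population before coalescing."""
    return genome_split_years - expected_coalescence_years(
        ancestral_ne, generation_years
    )


# Illustrative values only (assumptions, not results from the paper).
genome_split = 6_000_000   # average sequence divergence time, in years
ancestral_ne = 50_000      # effective size of the ancestral population
generation = 20            # years per generation

print(speciation_time_years(genome_split, ancestral_ne, generation))
```

With these toy numbers the genomes split two million years before the species do, which is the kind of gap you miss if you only look at the sequences.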
I’m not sure how well I am explaining this here. I gave a (not too technical) talk in the computer science department some months ago; maybe that explains it better:
(sorry about the quality of the slides here, it looks like slideshare messed up the fonts)
A few other studies of genomic data before our own also reported more recent speciation times of human and chimp than previously believed (moving the time from about 6-8 million years ago down to maybe 4-5 million years ago), so a recent divergence between human and chimp might not be too far-fetched after all. Still, I think our estimate is a bit too recent.
This is also what John Hawks writes.
Why do we get such a recent divergence, then?
It is hard to say. The 4.1 million years is what comes out of applying our method to the (admittedly small) data set we had. It is a very new method, however. There is a lot we do not take into account in it, and there might be biases in it we haven’t fully understood yet.
We are currently working on improving the method, and once we get more data (the orangutan has already been sequenced and is now being assembled, and the gorilla genome is in the process of being sequenced) we will redo our analysis. It will be interesting to see how that turns out.
Tomorrow I’m teaching string algorithms covering approximate pattern matching and the Wu-Manber algorithm.
I’m actually also teaching genome analysis but Mikkel is giving the lecture tomorrow, so I don’t have to worry about that.
The good thing about string algorithms is that I have taught it several times before, so there is very little preparation time. I probably ought to spend some more time on it this time, ’cause I don’t find approximate pattern matching particularly interesting in this class (it is more interesting in algorithms in bioinformatics, the class that Storm is teaching). I wanted to replace it with something else this year, but didn’t find the time.
Just for the fun of it, I’ve started using Slideshare to publish my presentations. I also put the slides on the course homepage, of course, but with Slideshare I can put the presentations directly on the web like this:
Now isn’t that cool?
Whether the slides make sense without someone presenting them, I don’t know. In some sense I hope not, because then I am really wasting my time giving the lectures…