Some places, not everywhere…

Just to avoid confusion, if you read this, it doesn’t imply this.

We are suggesting that humans and orangutans are more closely related to each other than either is to chimpanzees in ~0.5% of the genome (and that chimpanzees and orangutans are more closely related to each other than either is to humans in another ~0.5%). That has to do with incomplete lineage sorting, and does not, in any way, imply that we as a species are more closely related to orangutans than to chimpanzees.

Oh, and it doesn’t really mean that the orangutan is our closest living relative in ~0.5% of the genome either. It is just closer than the chimpanzee, but the gorilla could be more closely related to us in those positions, so…

More orangutan news

Some more news coverage of the orangutan genome paper:

The paper mentioned at the bottom of the last one is this one, and there’s a press release for that one as well:

The orangutan genome is out (probably)

I was just told over email that the orangutan genome paper is out at Nature. Right now, I cannot connect to Nature, though, so I cannot really tell.

Anyway, I found posts about it here and here.

We’ve been involved in the analysis here in Aarhus, applying our CoalHMM methods, and we will have two companion papers out. The first in Genome Research – any minute now, really, it is supposed to come out today – and the second in PLoS Genetics – not sure when, it is in the pipeline but I haven’t received a release date yet.

We’ve received a lot of questions about the Genome Research paper over the last couple of days, and I’m busy answering emails right now, but I’ll be back to comment on it here as soon as I have the time.

Update: Ah, Nature is up again, and you can start reading here.

Call for help: Teaching statistics for Machine Learning

On Monday I start teaching my Machine Learning course again. I’m looking at the material for the first week right now, and I want to change it from last year.

Typically, my students will have had classes on mathematical modeling, a bit of probability theory and a bit of statistics, but experience tells me that they only have a very superficial knowledge of these topics. They don’t need much more for this class, but I still want to get some key points across regarding the statistics that we will be using, and I don’t think I have managed that well the last few years.

I don’t want to focus on modeling so much, and I certainly don’t want to discuss experiment design since the data we look at generally is just collected data that we need to make some kind of sense of, not collected to decide one theory against another.

It really is about a few points: Given the data and some generic model, say a neural network, why do we estimate the parameters in the way we do? What can we say about the accuracy of predictions? That kind of stuff.

I usually go a little bit into Bayesian statistics for model selection, but most of what they see in the class are different generic models that they estimate parameters for through maximum likelihood.

The thing is, while they generally remember how they estimate the parameters in different models when we get to the exam, they focus on the details of a particular model and rarely remember that they are essentially doing the same thing for all the models: maximizing a likelihood in a probabilistic model.
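To make that point concrete, here is a minimal sketch (not from the course material; the models, data and function names are made up for illustration). It fits two unrelated one-parameter models, a coin and an exponential waiting time, with the exact same routine: write down the log-likelihood, then maximize it. The generic optimizer is a plain golden-section search, chosen only because it needs nothing beyond the standard library.

```python
import math


def maximize(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


def bernoulli_loglik(p, data):
    """Log-likelihood of 0/1 outcomes under a coin with P(1) = p."""
    ones = sum(data)
    zeros = len(data) - ones
    return ones * math.log(p) + zeros * math.log(1 - p)


def exponential_loglik(rate, data):
    """Log-likelihood of positive waiting times under an exponential model."""
    return len(data) * math.log(rate) - rate * sum(data)


def fit(loglik, data, lo, hi):
    """Same recipe for every model: maximize the log-likelihood over the parameter."""
    return maximize(lambda theta: loglik(theta, data), lo, hi)


# Made-up data, purely for illustration.
coin_flips = [1, 0, 1, 1, 0, 1, 1, 0]
waiting_times = [0.5, 1.2, 0.3, 2.0, 0.8]

p_hat = fit(bernoulli_loglik, coin_flips, 1e-6, 1 - 1e-6)
rate_hat = fit(exponential_loglik, waiting_times, 1e-6, 100.0)

# The numerical estimates should agree with the textbook closed forms.
print(p_hat, sum(coin_flips) / len(coin_flips))
print(rate_hat, len(waiting_times) / sum(waiting_times))
```

Only the log-likelihood function changes between the two fits; `fit` itself never needs to know which model it is working on. The model-specific formulas students tend to memorize are just the closed-form answers to this one optimization problem.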

The first couple of years I taught this class, I definitely focused too much on the mathematical details: going through derivations, explaining how you get various posteriors from conjugate priors, and such. Major fail.

I tried changing that last year, focusing more on examples, but it didn’t help much once we got to the exam.

Do any of you have experience with teaching core statistics concepts, preferably with some good examples? Care to share?

If you don’t teach this stuff, but have had classes like it, what worked for you as a student and what definitely didn’t work?