05 Jan

Admixture thoughts

Ok, this is just admitting that I have been rather stupid in my way of thinking until recently. I am sharing this so you don’t make the mistakes that I have been making.

For a little while I was working on a coalescent model for admixture where I was thinking in terms of tracing lineages back in an admixed population until those lineages moved into source populations where they could then coalesce with lineages there.

In my mind, the scenario looked like this


It is a good model. Easy to model and easy to reason about. It is just very likely to be wrong.

If you have three populations, where one is admixed between two of them, how likely is it that the admixed population directly obtained genes from those two? Not bloody likely is what it is.

Much more likely is a scenario like this


The “source” populations are not directly the source populations; they are merely related to the source populations. And they are not necessarily equally related to them.

If you trace back the lineages from population C to the admixture time, you won’t be able to coalesce with lineages from A or B at that time. You can coalesce further back in time, when the admixed populations merge with population A and B — and that can happen a lot further back in time and at very different time points for A and B.

It isn’t that much harder to model. You need to model that lineages from population C at some point get separated into two different populations that cannot coalesce with each other, and then have to wait a bit further before they can coalesce with lineages from A or B, but it isn’t that hard to model.
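As a rough sketch of that bookkeeping (not a full coalescent simulator): going backwards in time, each lineage from C is routed to the A-side or the B-side ancestral population at the admixture time, and can only meet sampled A or B lineages once that side joins A or B. The function names, the proportion `alpha` and all the times below are made up for illustration, and coalescence even further back, at the root, is not modelled.

```python
import random


def route_c_lineages(n_lineages, alpha, rng):
    """Send each C lineage to the A-side with probability alpha, else to the B-side."""
    return ["A" if rng.random() < alpha else "B" for _ in range(n_lineages)]


def earliest_meeting_time(side, t_admix, t_join_a, t_join_b):
    """Earliest time (going backwards) a C lineage on `side` can share a
    population with sampled lineages from A or B, respectively."""
    t_join = t_join_a if side == "A" else t_join_b
    if t_join < t_admix:
        raise ValueError("a join time cannot be more recent than the admixture time")
    return t_join


# Hypothetical numbers: admixture at time 1.0, the A-side joins A at 2.5,
# the B-side joins B at 4.0 (arbitrary time units, measured backwards).
rng = random.Random(1)
sides = route_c_lineages(10, alpha=0.3, rng=rng)
times = [earliest_meeting_time(s, t_admix=1.0, t_join_a=2.5, t_join_b=4.0) for s in sides]
print(list(zip(sides, times)))
```

The point of the sketch is only that the waiting times after the admixture event can differ a lot between the two sides, which the "direct source" picture hides.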

I just didn’t think about that, and now I feel really stupid.

Realising the full complexity of what you have to work with makes it all a lot more interesting, though.

19 Dec

Back up again

Well, I guess I’m back…

When I wrote the post on admixture proportions the other day I got back to this blog after having neglected it for a very long time. The WordPress dashboard was lit up with necessary updates, and half of them required an update of the underlying software, such as MySQL and PHP.

I couldn’t do that myself, so I asked to have it updated on the hosting server. That was done, but in the process the site moved to a new server, so it has been offline for a bit: first to fix some software issue, and then because it takes a little while for the various DNS servers to update their caches.

Anyway, from where I’m sitting now, the site is up and running again.

17 Dec

Estimating admixture proportions

I am not entirely sure about this, but something seems wrong to me in a number of papers I have read recently.

A couple of them I even reviewed before they were published so if I am right in my suspicion I am partly responsible.

Anyway, it has to do with estimating the admixture proportions when one population, let’s call it X, is admixed between two other populations, A and B, say. Rather, two populations A’ and B’, A’ closely related to A and B’ closely related to B, admixed to create the population X’ ancestral to X. X’ was created with a proportion of α from A’ and β=1-α from B’.

We want to estimate α.

In Durand et al. (2011) we get a test for this. It is based on counting ABBA-BABA patterns (essentially the D statistic without normalisation) and comparing these for two selected quartets of populations. They call it the f^ estimator, and it is described around equations (7) and (8).

First there is one version where, in terms of the populations I described above, you compare the quartet (A, X, B, O) with (A1, A2, B, O), using two samples from A. The idea here is, as far as I understand, that A2 must be completely “A”, so we see how “A” X is in contrast to someone who is completely A.

There is nothing wrong with that, but it isn’t an estimate of the admixture proportions. It doesn’t take into account that “A-ness” has evolved since the admixture time — potentially for a long time if that event is far back in time — so we are seeing both the admixture and that evolution.

The second version takes another sequence related to A but that branched off before the admixture event. If we use that version we can actually get an estimate of the admixture proportions.

I will explain how shortly, but first: the thing that worries me is that I see the first version being used to estimate the proportions, generally without acknowledging that this is not what it measures. Worse, if you compare two populations to figure out how admixed they are and you ignore this problem, how do you know that it is the admixture proportions you are measuring and not the drift after the admixture event?

Okay, to the estimator.

I find it easier to think in terms of the f4 statistics from Patterson et al. (2012). In general, I find the way of thinking about drift evolving along admixture graphs extremely elegant and easy to reason about, at least compared to counts of site patterns.

The f4 statistic, which is essentially the D statistic and so very similar to the Durand ABBA-BABA counts, captures the overlap between the “drift flow” between two pairs of populations. f4(A,B;C,O), for example, is the drift on the overlap of the path from A to B and the path from C to O. That is the overlap between the blue and the green line, or the drift on edge x. f4(A,B;C,O) = f4(C,O;A,B) = x

When there is admixture, the drift from one population to another takes more than one path. For example, the drift from X to B takes two different routes: one over the edge close to A, with probability α, and one over the edge close to B, with probability β. For f4(C,O;X,B) the only overlap is therefore again on edge x, but we only take that path with probability α (the path we take with probability β doesn’t overlap the path from C to O, so it doesn’t get counted). f4(C,O;X,B) = αx.

Since f4(C,O;A,B) = x and f4(C,O;X,B) = αx we can estimate α as f4(C,O;X,B)/f4(C,O;A,B). This is called the f4 ratio estimator in Patterson et al. and is essentially the same as the second f^ estimator from Durand et al.
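To make the ratio concrete, here is a minimal sketch that computes f4 from per-site allele frequencies, using the usual definition of f4(P1,P2;P3,P4) as the average over sites of (p1 - p2)(p3 - p4). The frequencies are invented toy numbers, and X is built as an exact α/β mixture of A and B with no drift afterwards, which is an idealisation and not something you would see in real data.

```python
def f4(p1, p2, p3, p4):
    """f4(P1,P2;P3,P4): mean over sites of (p1 - p2) * (p3 - p4)."""
    products = [(a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4)]
    return sum(products) / len(products)


def f4_ratio(c, o, x, a, b):
    """Estimate alpha as f4(C,O;X,B) / f4(C,O;A,B)."""
    return f4(c, o, x, b) / f4(c, o, a, b)


# Toy allele frequencies at four sites (made up for illustration).
A = [0.9, 0.2, 0.6, 0.4]
B = [0.1, 0.7, 0.3, 0.8]
C = [0.8, 0.3, 0.7, 0.5]
O = [0.5, 0.5, 0.5, 0.5]

# Idealised admixed population: exact mixture, no drift after admixture.
alpha = 0.3
X = [alpha * a + (1 - alpha) * b for a, b in zip(A, B)]

print(f4_ratio(C, O, X, A, B))  # recovers alpha up to floating-point error
```

In this idealised setting the ratio recovers α because x - b = α(a - b) at every site, so the numerator is exactly α times the denominator.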

When the admixture event — or at least the branching off of the population that will admix — is ancestral to both A and C we have a different topology so the ratio is not equal to alpha. f4(C,O;A,B) = x + y so now we have f4(C,O;X,B)/f4(C,O;A,B) = αx / (x + y).

It is a lower bound for alpha, but how much below alpha you get depends on the length of branch y.
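A quick numerical look at how far below α the ratio falls, using the expression αx / (x + y) from above. The values of α, x and y are arbitrary illustrations, not estimates from any data set.

```python
def expected_f4_ratio(alpha, x, y=0.0):
    """Expected f4(C,O;X,B) / f4(C,O;A,B) = alpha * x / (x + y)."""
    if x <= 0 or y < 0:
        raise ValueError("need x > 0 and y >= 0")
    return alpha * x / (x + y)


alpha, x = 0.3, 0.1  # hypothetical admixture proportion and branch length
for y in (0.0, 0.05, 0.1, 0.3):
    ratio = expected_f4_ratio(alpha, x, y)
    print(f"y = {y:.2f}: ratio = {ratio:.3f} (true alpha = {alpha})")
```

With y = 0 the ratio equals α; once y is as long as x the ratio is already only half of α.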

Unless I am misunderstanding the f^ statistics, and it is very different from the f4 ratio estimator, I think I am seeing several papers estimating alpha using the second topology. All those estimates are then too low.

Or am I missing something?

Durand, E.Y. et al., 2011. Testing for ancient admixture between closely related populations. Molecular Biology and Evolution, 28(8), pp.2239–2252.

Patterson, N. et al., 2012. Ancient admixture in human history. Genetics, 192(3), pp.1065–1093.

31 Oct

Workflow for ReadCube and Papers3

I am switching between ReadCube and Papers a lot. I like both tools (although I have had my share of problems with both as well), but they have different strengths and I want a combination.

ReadCube is really great for finding papers. Their enhanced PDFs make it very easy to get to cited papers and they are really good at finding papers that have cited what you are reading. So for reading through the literature it is my favourite. It sucks when you have to cite papers, though. I have tried to use its citation tool but it only works with Word and you are screwed if you want to sort the cited references alphabetically.

Papers is better for citing, especially when you want to use BibTeX. It works well with most editors and it is very easy to export your library to a BibTeX file. It isn’t automated, which is a pity, but it is reasonably good.

So what I really want is to use ReadCube when collecting my papers and then use Papers when citing them.

Here’s a little trick for automatically importing papers to Papers when you import them into ReadCube.

You can tell Papers to watch for PDFs in a folder and automatically import them. So if you go to preferences you can tell it to watch the ReadCube files. It should be in your Documents folder and be called something like ReadCube Media.

This won’t import the papers already there, though; it just makes a folder where, if you add PDFs, Papers will import them.

To automatically add new papers you go to Automator and make a Folder Action that simply copies new files into this folder.

Now, when you import a file in ReadCube you will also get it added to Papers.
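If you would rather script it than use Automator, the Folder Action amounts to something like the sketch below. This is not what Automator runs; it is just an equivalent copy step in Python, and the function name and both folder paths in the usage comment are placeholders you would have to replace with your own.

```python
import shutil
from pathlib import Path
from typing import List


def copy_new_pdfs(source: Path, watched: Path) -> List[Path]:
    """Copy PDFs from `source` into `watched`, skipping files already there.

    Returns the list of newly created copies.
    """
    watched.mkdir(parents=True, exist_ok=True)
    copied = []
    for pdf in sorted(source.glob("*.pdf")):
        target = watched / pdf.name
        if not target.exists():
            shutil.copy2(pdf, target)  # copy2 also keeps timestamps
            copied.append(target)
    return copied


# Example usage (hypothetical paths, adjust to your own setup):
# copy_new_pdfs(Path.home() / "Documents" / "ReadCube Media",
#               Path.home() / "Documents" / "Papers Watch Folder")
```

You would still need something to run it whenever a new file shows up, which is exactly what the Folder Action gives you for free.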

03 Mar

Discussing papers during the review process…

For the last two or three years I’ve been signing my reviews. I find that I write better reviews when I’m not anonymous.

It shouldn’t be like that, but it is. I’m less likely to get lazy if I know that people will see who wrote the review.

It creates a dilemma, though: should I discuss manuscripts with authors before they are published? I am likely to run into them at meetings, and it is hard not to talk about the manuscripts there. The dilemma is there as well when the authors are people I frequently have discussions with online.

As a reviewer, you have two tasks: making sure that the science is solid and improving the presentation of the results (the manuscript). The second part is probably easier to do with a back-and-forth discussion, but the thing is, as a reviewer you are not really working for the authors but for the editor.

It is the editor who ultimately has to make a decision on the manuscript, and it is him or her you are assisting. This is why you really shouldn’t write your recommendations for acceptance or rejection in the review but only tell the editor. The editor needs to know what your concerns are and what the authors are doing to address them.

On the other hand, the paper moves forward faster when you don’t have to wait weeks between each point and counter-point.

It gets even weirder when the manuscript is already out there on a preprint server, and you have already discussed it with the authors before you find yourself a reviewer of it, which has happened to me a couple of times recently.

How do you guys deal with these things?