<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mailund on the Internet &#187; Paper reviews</title>
	<atom:link href="http://www.mailund.dk/index.php/category/work/paper-reviews/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mailund.dk</link>
	<description>Computer science, bioinformatics, genetics, and everything in between</description>
	<lastBuildDate>Sun, 15 May 2011 11:24:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>CoalHMM analysis of the human/chimpanzee ancestor, based on the orangutan genome</title>
		<link>http://www.mailund.dk/index.php/2011/02/03/coalhmm-analysis-of-the-humanchimpanzee-ancestor-based-on-the-orangutan-genome/</link>
		<comments>http://www.mailund.dk/index.php/2011/02/03/coalhmm-analysis-of-the-humanchimpanzee-ancestor-based-on-the-orangutan-genome/#comments</comments>
		<pubDate>Thu, 03 Feb 2011 16:40:07 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Paper reviews]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2236</guid>
		<description><![CDATA[I&#8217;ve been wanting to write about our paper on the orangutan genome for a while, but I&#8217;ve just been too busy so far, so a little late I finally get to it. Besides the Nature paper, where we contributed to the analysis of the two sub-species of orangutans, we have two companion papers. One is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.mailund.dk/wp-content/uploads/2011/02/Screen-shot-2011-02-03-at-7.10.21-PM.png"><img class="size-thumbnail wp-image-2245 alignright" style="margin: 5px;" title="Nature front cover" src="http://www.mailund.dk/wp-content/uploads/2011/02/Screen-shot-2011-02-03-at-7.10.21-PM-150x150.png" alt="" width="150" height="150" /></a>I&#8217;ve been wanting to write about our paper on the orangutan genome for a while, but I&#8217;ve just been too busy so far, so a little late I finally get to it.</p>
<p>Besides the Nature paper, where we contributed to the analysis of the two sub-species of orangutans, we have two companion papers. One is already out in &#8220;early access&#8221; at Genome Research and the other will be out later in PLoS Genetics. Since the latter paper is not out yet, this post will be about the Genome Research paper.</p>
<h2>Coalescent in an isolation model</h2>
<p>Since all our work is based on coalescent theory and in particular CoalHMMs, I&#8217;ll start there.</p>
<p>Imagine we have two species, and we sample a gene in each. We can then ask, what is the divergence between the two genes? This divergence will be determined by 1) the divergence of the two species, let&#8217;s call that <em>T</em>, and 2) the coalescence time between the two genes within the ancestral species, let&#8217;s call that <em>C</em>.</p>
<p>The species divergence we assume is fixed for all genes, so while it is unknown it is not a stochastic variable. The coalescence time, however, is stochastic, and from <a href="http://en.wikipedia.org/wiki/Coalescent_theory">coalescence theory</a> we expect it to be <a href="http://en.wikipedia.org/wiki/Exponential_distribution">exponentially distributed</a> with a rate determined by the <a href="http://en.wikipedia.org/wiki/Effective_population_size">effective population size</a> in the ancestral species.</p>
<p>We call this setup an <em>isolation model</em>, and we will use the distribution of divergence times to make inference about the speciation time and the effective population size in the ancestral species.</p>
<p>The figure below illustrates the setup.</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2011/02/IM-coalescence.png"><img class="aligncenter size-medium wp-image-2239" title="Isolation model" src="http://www.mailund.dk/wp-content/uploads/2011/02/IM-coalescence-300x273.png" alt="" width="300" height="273" /></a>If <em>C</em> is exponentially distributed, and the divergence is given by <em>D=C+T</em>, then we can make inference about both parameters as follows: We sample a number of independent genes and get their divergence time. For the exponential distribution, the mean is equal to the standard deviation, so looking at the standard deviation of the divergences we can get the parameter for the exponential distribution. That gives us the mean value of <em>C</em>, and if we then look at <em>D-</em>E[<em>C</em>] we get an estimate for <em>T.</em></p>
<p>Below is an example of this, where I&#8217;ve estimated the coalescence rate and divergence time from 50 divergence samples.</p>
<p style="text-align: left;"><a href="http://www.mailund.dk/wp-content/uploads/2011/02/Estimates-in-the-isolation-model.png"><img class="aligncenter size-medium wp-image-2241" title="Estimates" src="http://www.mailund.dk/wp-content/uploads/2011/02/Estimates-in-the-isolation-model-300x300.png" alt="" width="300" height="300" /></a></p>
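<p>The moment-based recipe above can be sketched in a few lines of Python. This is not the code behind the figure, just a minimal simulation under assumed values: the speciation time <em>T</em>=4 and mean coalescence time E[<em>C</em>]=1 (in arbitrary time units), the seed, and the function names are all made up for illustration.</p>

```python
import random
import statistics

def simulate_divergences(T, mean_C, n, seed=0):
    """Draw n divergences D = T + C with C exponentially distributed."""
    rng = random.Random(seed)
    return [T + rng.expovariate(1.0 / mean_C) for _ in range(n)]

def estimate_isolation_model(divergences):
    """Moment estimates: SD(D) estimates E[C]; mean(D) - E[C] estimates T."""
    mean_C_hat = statistics.stdev(divergences)
    T_hat = statistics.mean(divergences) - mean_C_hat
    rate_hat = 1.0 / mean_C_hat
    return T_hat, mean_C_hat, rate_hat

# 50 independent loci, as in the example above (parameter values are assumptions).
samples = simulate_divergences(T=4.0, mean_C=1.0, n=50)
T_hat, mean_C_hat, rate_hat = estimate_isolation_model(samples)
print(f"T ~ {T_hat:.2f}, E[C] ~ {mean_C_hat:.2f}, rate ~ {rate_hat:.2f}")
```

<p>With only 50 samples the estimates wobble quite a bit from run to run; they tighten up as the number of loci grows.</p>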
<h2>Complications</h2>
<p style="text-align: left;">This is all very simple, but there are a few problems.</p>
<p style="text-align: left;">First, you don&#8217;t really get independent samples of the divergence time between two species. If you sample <em>n</em> individuals from the first species and <em>m</em> from the second, the <em>n</em> in the first species will all have found a common ancestor before that lineage reaches the ancestral species, and the same goes for the <em>m</em> samples in the other species. So no matter how many individuals you look at, you end up with a sample of two in the ancestral species. I&#8217;ve written about this before <a href="http://www.mailund.dk/index.php/2009/02/12/on-gene-trees-and-species-trees/">here</a>.</p>
<p style="text-align: left;">It is not a show-stopper, though, since genes in different parts of the genome are close enough to independent. So if you sample different loci instead of different individuals, you get your independent samples. So while adding more individuals won&#8217;t help, having an entire genome to look at gives you plenty of samples.</p>
<p style="text-align: left;">The second problem is that we cannot actually get samples of the divergence time. You cannot look at two pieces of DNA and from that get their divergence. You need to estimate it. It isn&#8217;t really that hard, since you can get a good estimate from the number of differences between the two sequences. That is, if the entire alignment of sequences has the same divergence time.</p>
<p style="text-align: left;">If there is a recombination somewhere in the sequences, they do <em>not</em> have the same divergence time, and you cannot estimate the divergence.</p>
<p style="text-align: left;">You can get around this by looking at short DNA segments, where you expect few if any recombinations. You won&#8217;t get a good estimate of the divergence then, but you can maybe alleviate this by having a lot of genes (but estimating the coalescence rate based on a standard deviation that has contributions from both the coalescence process and the estimation problems is, well, problematic).</p>
<p style="text-align: left;">You&#8217;d also have to throw most of your data away if you are looking at short segments scattered along the genome (and you cannot have them too close to each other, because then they will no longer be independent).</p>
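<p>To see why mixing the two sources of variation is a problem, here is a back-of-the-envelope calculation. It rests on a toy model that is my assumption, not something from the paper: each segment of <em>L</em> sites has a true divergence <em>D=T+C</em> (in substitutions per site), the number of differences in the segment is Poisson distributed with mean <em>L·D</em>, and multiple hits are ignored. The law of total variance then gives Var(estimate) = Var(<em>C</em>) + E[<em>D</em>]/<em>L</em>.</p>

```python
import math

def observed_sd(T, mean_C, segment_length):
    """SD of per-segment divergence estimates under the toy model.

    Law of total variance: Var(estimate) = Var(D) + E[D] / L, where
    Var(D) = mean_C**2 (exponential C) and E[D] / L is Poisson counting noise.
    """
    coalescent_var = mean_C ** 2
    sampling_var = (T + mean_C) / segment_length
    return math.sqrt(coalescent_var + sampling_var)

# Purely illustrative values, in substitutions per site.
T, mean_C = 0.010, 0.002
for L in (1_000, 10_000, 100_000):
    print(L, observed_sd(T, mean_C, L))
```

<p>With these made-up numbers and 1,000-site segments the standard deviation comes out at about 0.004, twice the true E[<em>C</em>] of 0.002, so reading the coalescence rate straight off the standard deviation would be badly off. Longer segments reduce the counting noise, but they are also more likely to span a recombination.</p>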
<h2>The CoalHMM approach</h2>
<p style="text-align: left;">The models we develop to deal with this are based on <a href="http://en.wikipedia.org/wiki/Hidden_Markov_model">hidden Markov models</a>.</p>
<p style="text-align: left;">Using these models, we can estimate the divergence time for single nucleotides. Normally you cannot, since they are either equal or different, and that doesn&#8217;t tell you much about their divergence (is it zero for equal and infinity for different?). We can do this because the flanking DNA contains information about the divergence, whether recombinations have occurred or not, and we can capture this information through the Markov model.</p>
<p style="text-align: left;">It is a rough approximation to the coalescence process, but as far as we can tell, it works reasonably well.</p>
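<p>To give a feel for how flanking sites inform a single nucleotide, here is a toy posterior decoding with the standard forward-backward algorithm. It is only a sketch and not the CoalHMM itself: the two states (shallow and deep divergence), the transition and emission probabilities, and the observation sequence are all hypothetical values chosen for illustration.</p>

```python
def posterior_decode(obs, init, trans, emit):
    """Forward-backward for a small HMM; returns P(state | all data) per site.

    obs: list of symbol indices; init[i]: start probability of state i;
    trans[i][j]: P(next state j | state i); emit[i][o]: P(symbol o | state i).
    """
    n_states = len(init)
    states = range(n_states)

    # Forward pass, rescaled at each site to avoid numerical underflow.
    prev = [init[i] * emit[i][obs[0]] for i in states]
    total = sum(prev)
    prev = [p / total for p in prev]
    forward = [prev]
    for o in obs[1:]:
        cur = [emit[j][o] * sum(prev[i] * trans[i][j] for i in states)
               for j in states]
        total = sum(cur)
        prev = [c / total for c in cur]
        forward.append(prev)

    # Backward pass, also rescaled; the scale cancels when normalising below.
    backward = [[1.0] * n_states]
    for o in reversed(obs[1:]):
        nxt = backward[0]
        cur = [sum(trans[i][j] * emit[j][o] * nxt[j] for j in states)
               for i in states]
        total = sum(cur)
        backward.insert(0, [c / total for c in cur])

    posteriors = []
    for f, b in zip(forward, backward):
        joint = [fi * bi for fi, bi in zip(f, b)]
        total = sum(joint)
        posteriors.append([x / total for x in joint])
    return posteriors

# Hypothetical example: state 0 = shallow, state 1 = deep divergence.
# Symbols: 0 = the two sequences agree at the site, 1 = they differ.
init = [0.5, 0.5]
trans = [[0.95, 0.05], [0.05, 0.95]]   # states tend to persist along the genome
emit = [[0.99, 0.01], [0.90, 0.10]]    # differences are more common when deep
obs = [0] * 30 + [0, 1, 0, 0, 1, 0, 1, 0, 0, 1] + [0] * 30
post = posterior_decode(obs, init, trans, emit)
print(round(post[5][1], 3), round(post[35][1], 3))
```

<p>Sites 5 and 35 both show identical nucleotides, yet site 35 gets a higher posterior probability of deep divergence because it sits in a stretch with many differences. That is the sense in which the neighbouring DNA carries information about a single position.</p>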
<p style="text-align: left;">We are getting pretty close to being able to estimate the distribution of divergence times using hidden Markov models, but the model we use is the one that will be published in PLoS Genetics soon and not the model we used in the Genome Research paper, so I&#8217;ll wait a bit with describing how that works.</p>
<p style="text-align: left;">The model we used in the Genome Research paper is the one described in <a href="http://www.mailund.dk/index.php/2009/09/22/new-coalhmm-paper-out/">this paper</a>.</p>
<p style="text-align: left;">In this model, we do not attempt to estimate the actual divergence times, but instead use something called <em>incomplete lineage sorting.</em> The idea here is that if we have a third species closely related to the other two, then sometimes the two sister species have such deep divergence times that one of them can end up being more closely related to the third species than to its sister species.</p>
<p style="text-align: left;"><a href="http://www.mailund.dk/wp-content/uploads/2011/02/ILS.png"><img class="aligncenter size-medium wp-image-2243" title="Incomplete lineage sorting" src="http://www.mailund.dk/wp-content/uploads/2011/02/ILS-300x251.png" alt="" width="300" height="251" /></a>This leaves a stronger signal in the DNA and is thus easier to model and make inference about.</p>
<p style="text-align: left;">The model based on this needs only four states: one state where the two sister species coalesce early, and three states with deep divergence. If the divergence is deep, the topology of relationships between the species should be uniform &#8212; each topology is seen with one third probability &#8212; and how often we see deep divergences is given by the two speciation times together with the effective population size of the ancestor of the sister species.</p>
<p style="text-align: left;">As we scan along a genome alignment, we can infer how often we see recent divergences and how often we see deep divergences, and how the deep divergences are distributed along the three topologies.</p>
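<p>As a rough illustration of how the speciation times and the ancestral effective population size set the state frequencies, the sketch below uses the textbook coalescent result that two lineages fail to coalesce during <em>t</em> generations with probability exp(-<em>t</em>/2<em>N</em>). The input values are assumptions picked to be in the range discussed in this post (speciations at 4 and 12 million years, an ancestral effective size of 50,000), and the 20-year generation time is my own assumption.</p>

```python
import math

def topology_probabilities(tau1, tau2, ne, generation_time):
    """Probabilities of the four states under the simple coalescent.

    tau1, tau2: the two speciation times in years (tau1 more recent);
    ne: effective size of the sister species' ancestor (diploid individuals).
    """
    generations = (tau2 - tau1) / generation_time
    # Chance that the two sister lineages have NOT coalesced between speciations.
    p_deep = math.exp(-generations / (2.0 * ne))
    return {
        "early coalescence, species tree": 1.0 - p_deep,
        "deep, species tree": p_deep / 3.0,
        "deep, alternative topology A": p_deep / 3.0,
        "deep, alternative topology B": p_deep / 3.0,
    }

# Illustrative inputs only; none of these are the fitted parameters.
probs = topology_probabilities(tau1=4e6, tau2=12e6, ne=50_000, generation_time=20)
ils = probs["deep, alternative topology A"] + probs["deep, alternative topology B"]
print(f"fraction with a gene tree unlike the species tree: {ils:.4f}")
```

<p>With these inputs the fraction of sites whose gene tree disagrees with the species tree comes out at roughly one percent. A larger ancestral population or a shorter gap between the speciations pushes it up.</p>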
<p style="text-align: left;">Below is a figure that Julien made for illustrating this.</p>
<p style="text-align: left;"><a href="http://www.mailund.dk/wp-content/uploads/2011/02/Screen-shot-2011-02-03-at-6.47.24-PM.png"><img class="aligncenter size-medium wp-image-2244" title="ILS along a genome alignment" src="http://www.mailund.dk/wp-content/uploads/2011/02/Screen-shot-2011-02-03-at-6.47.24-PM-300x169.png" alt="" width="300" height="169" /></a>With this model, you don&#8217;t extract as much information from the genomes as you would if you could estimate the divergence times, but with full genomes to work with, you have plenty of information to get good estimates.</p>
<p style="text-align: left;">You need three closely related species to work with, though.</p>
<h2>Incomplete lineage sorting patterns among human, chimpanzee and orangutan suggest recent orangutan speciation and widespread selection</h2>
<p>And now, finally, we get to the paper.</p>
<blockquote><p><strong><a href="http://genome.cshlp.org/content/early/2011/01/26/gr.114751.110.abstract">Incomplete lineage sorting patterns among human, chimpanzee and orangutan suggest recent orangutan speciation and widespread selection</a></strong><br />
Asger Hobolth, Julien Y. Dutheil, John Hawks, Mikkel H. Schierup and Thomas Mailund</p>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p>We search the complete orangutan genome for regions where humans are more closely related to orangutans than to chimpanzees due to incomplete lineage sorting (ILS) in the ancestor of human and chimpanzees. The search uses our recently developed coalescent HMM framework. We find ILS present in ~1% of the genome, and that the ancestral species of human and chimpanzees never experienced a severe population bottleneck. The existence of ILS is validated with simulations, site pattern analysis, and analysis of rare genomic events. The existence of ILS allows us to disentangle the time of isolation of humans and orangutans (the speciation time) from the genetic divergence time, and we find speciation to be as recent as 9-13 mya (contingent on the calibration point). The analyses provide further support for a recent speciation of human and chimpanzee at ~4 mya and a diverse ancestor of human and chimpanzee with an effective population size of ~50,000 individuals. Posterior decoding infers ILS for each nucleotide in the genome and we use this to deduce patterns of selection in the ancestral species. We demonstrate the effect of background selection in the common ancestor of humans and chimpanzees. In agreement with predictions from population genetics, ILS found to be reduced in exons and gene dense regions when we control for confounding factors such as GC content and recombination rate. Finally, we find the broad scale recombination rate to be conserved through the complete ape phylogeny.</p></blockquote>
<p>In this paper we used humans, chimpanzees and orangutans.</p>
<p>The first question to ask is then, are these three species close enough that we see incomplete lineage sorting?</p>
<p>Without it, we don&#8217;t have the signal in the data that we need for the model.</p>
<p>Based on previous estimates of the species divergence times and the ancestral effective population size of humans and chimpanzees, we could work out that some incomplete lineage sorting was expected. So that is a good start. To make sure, though, we used some simpler approaches. We looked at indels to check whether they carried signals supporting a clustering of human and orangutan or of chimp and orangutan, and they did. We also looked at the distribution of alignment columns and again found some signals for alternative topologies of the three species. So with that checked, we applied the model.</p>
<p>From the model we estimate three things: 1) the speciation times of humans and chimps, and of the African apes and orangutan, 2) the effective population size of the ancestral species, and 3) in which regions of the genome humans and chimps, humans and orangutan, and chimp and orangutan are most closely related.</p>
<p>I won&#8217;t say much about number two. The effective population size is a weird parameter that can be affected by so many things, that it is really hard to interpret, and right now we just don&#8217;t know what really is important, so I&#8217;d rather not make any claims (but I&#8217;ll say a few things about <em>local</em> effective population sizes towards the end of the post).</p>
<p>Number one is interesting because it tells us something about when humans diverged from the other two apes. Our estimates are measured in the number of substitutions since the divergence, but assuming a molecular clock and assuming we have a good estimate of the rate we can get an estimate in years.</p>
<p>Assuming a rate of around 1 substitution per nucleotide per billion years &#8212; an estimate based on several earlier papers that get this number from calibrations with the fossil record &#8212; we get a human/chimp speciation around 4-4.5 million years ago, and a human/orangutan speciation around 11-13 million years ago.</p>
<p>I really don&#8217;t know how reasonable this is, in relation to the fossil record, so this is when we got <a href="http://johnhawks.net/weblog">John Hawks</a> involved. I have my fingers crossed that he will blog about this at some point.</p>
<p>There are good reasons to be a bit skeptical, though. From recent studies, we know that the substitution rate is lower in humans today, and if that is also true in the past, the estimates should be moved further back in time. We cannot get too far back, though, without running into inconsistencies in the deeper past, but how this will all play out once we do more analysis I cannot say yet. It is something we look into for the gorilla genome (and I&#8217;ll just leave that as a cliff hanger for now, I&#8217;ll get back to it when we have published that genome).</p>
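<p>The conversion from substitutions to years is simple arithmetic, and a small sketch makes the dependence on the assumed rate explicit. The branch length of 0.0042 substitutions per site below is a hypothetical value chosen to land in the 4-4.5 million year range; it is not a number taken from the paper.</p>

```python
def substitutions_to_years(substitutions_per_site, rate_per_site_per_year=1e-9):
    """Convert a branch length in substitutions per site into years."""
    return substitutions_per_site / rate_per_site_per_year

# Hypothetical branch length of 0.0042 substitutions per site.
print(substitutions_to_years(0.0042) / 1e6, "million years")
# Halving the assumed rate doubles the date.
print(substitutions_to_years(0.0042, rate_per_site_per_year=0.5e-9) / 1e6,
      "million years")
```

<p>This is why a lower substitution rate moves all the estimates further back in time: the dates scale inversely with whatever rate you plug in.</p>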
<p>For number three, I don&#8217;t really know. You might be surprised that we are sometimes more closely related to the orangutan than to the chimpanzee, or you might not. It depends on your prior assumptions, I guess.</p>
<p>We didn&#8217;t really find anything cool correlated to the patterns of relatedness, so we don&#8217;t have much of a story to tell about this.</p>
<h2>Ancestral selection</h2>
<p>The final thing we looked at in the paper was correlations between incomplete lineage sorting and gene density.</p>
<p>Why this is interesting gets a bit technical but has to do with the effective population size.  As I mentioned above, it is a bit of a weird parameter, but one that is affected by selection. If you have a <a href="http://en.wikipedia.org/wiki/Selective_sweep">selective sweep</a> the genetic diversity is reduced, and you see this as a reduction in the effective population size. The same effect is seen with <a href="http://en.wikipedia.org/wiki/Purifying_selection">purifying selection</a>, where again the genetic diversity is reduced and so is the effective population size.</p>
<p>Incomplete lineage sorting is positively correlated with the effective population size, so if you observe a correlation between incomplete lineage sorting and gene density, it is a signal for selection.</p>
<p>We observe this, and take it as a signal that selection rather than just drift has been a major player in the evolution of our genome.</p>
<p>How much of a surprise this is depends on your prior assumptions again, I guess, but it does indicate that neutrality may not always be the obvious null model for genome analysis.</p>
<p>It is a pretty weak signal for this, though, in this analysis. We see so little incomplete lineage sorting for these three species that it is really hard to analyse it in detail.</p>
<p>When we get human, chimp and gorilla, there is a lot more incomplete lineage sorting, and we can do a lot more. We are seeing some cool signals there, but I&#8217;ll let that be the second cliff hanger for the gorilla genome paper.</p>
<p>&#8211;<br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Genome+Research&amp;rft_id=info%3Adoi%2F10.1101%2Fgr.114751.110&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Incomplete+lineage+sorting+patterns+among+human%2C+chimpanzee+and+orangutan+suggest+recent+orangutan+speciation+and+widespread+selection&amp;rft.issn=1088-9051&amp;rft.date=2011&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fgenome.cshlp.org%2Fcgi%2Fdoi%2F10.1101%2Fgr.114751.110&amp;rft.au=Hobolth%2C+A.&amp;rft.au=Dutheil%2C+J.&amp;rft.au=Hawks%2C+J.&amp;rft.au=Schierup%2C+M.&amp;rft.au=Mailund%2C+T.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CComputer+Science+%2F+Engineering%2CMathematics%2CGenetics%2C+Bioinformatics%2C+Computational+Biology">Hobolth, A., Dutheil, J., Hawks, J., Schierup, M., &amp; Mailund, T. (2011). Incomplete lineage sorting patterns among human, chimpanzee and orangutan suggest recent orangutan speciation and widespread selection <span style="font-style: italic;">Genome Research</span> DOI: <a rev="review" href="http://dx.doi.org/10.1101/gr.114751.110">10.1101/gr.114751.110</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2011/02/03/coalhmm-analysis-of-the-humanchimpanzee-ancestor-based-on-the-orangutan-genome/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Textile plots of LD</title>
		<link>http://www.mailund.dk/index.php/2010/04/29/textile-plots-of-ld/</link>
		<comments>http://www.mailund.dk/index.php/2010/04/29/textile-plots-of-ld/#comments</comments>
		<pubDate>Thu, 29 Apr 2010 04:08:53 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Paper reviews]]></category>
		<category><![CDATA[LD]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2126</guid>
		<description><![CDATA[There&#8217;s a paper that came out yesterday in PLoS ONE on visualising LD structure: The Textile Plot: A New Linkage Disequilibrium Display of Multiple-Single Nucleotide Polymorphism Genotype Data Kumasaka, Nakamura and Kamatani Linkage disequilibrium (LD) is a major concern in many genetic studies because of the markedly increased density of SNP (Single Nucleotide Polymorphism) genotype [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a paper that came out yesterday in PLoS ONE on visualising LD structure:</p>
<blockquote><p><a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0010207"><strong>The Textile Plot: A New Linkage Disequilibrium Display of Multiple-Single Nucleotide Polymorphism Genotype Data</strong></a></p>
<p>Kumasaka, Nakamura and Kamatani</p>
<p>Linkage disequilibrium (LD) is a major concern in many genetic studies because of the markedly increased density of SNP (Single Nucleotide Polymorphism) genotype markers. This dramatic increase in the number of SNPs may cause problems in statistical analyses, such as by introducing multiple comparisons in hypothesis testing and colinearity in logistic regression models, because of the presence of complex LD structures. Inferences must be made about the underlying genetic variation through the LD structure before applying statistical models to the data. Therefore, we introduced the textile plot to provide a visualization of LD to improve the analysis of the genetic variation present in multiple-SNP genotype data. The plot can accentuate LD by displaying specific geometrical shapes, and allowing for the underlying haplotype structure to be inferred without any haplotype-phasing algorithms. Application of this technique to simulated and real data sets illustrated the potential usefulness of the textile plot as an aid to the interpretation of LD in multiple-SNP genotype data. The initial results of LD mapping and haplotype analyses of disease genes are encouraging, indicating that the textile plot may be useful in disease association studies.</p></blockquote>
<p>An example of this new kind of plots looks like this:</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2010/04/journal.pone_.0010207.g005.png"><img class="aligncenter size-medium wp-image-2127" title="Textile plot" src="http://www.mailund.dk/wp-content/uploads/2010/04/journal.pone_.0010207.g005-181x300.png" alt="" width="181" height="300" /></a>At a quick glance it looks like it is displaying haplotype blocks, like you can get in HaploView (although with nicer graphics).</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2010/04/haplotypes.png"><img class="aligncenter size-medium wp-image-2128" title="Haplotype blocks" src="http://www.mailund.dk/wp-content/uploads/2010/04/haplotypes-300x225.png" alt="" width="300" height="225" /></a>It isn&#8217;t quite that, though.</p>
<p>The textile plot is showing LD between genotypes and not haplotype blocks, so you always have three &#8220;blocks&#8221; per column, and so you don&#8217;t know the phase of the genotypes you are looking at.</p>
<p>The plot simply visualises the genotype LD structure, and I am sure that with a bit of practice they can be used to explore that.</p>
<p>I don&#8217;t have that practice, though, so I find them a bit hard to interpret.  They are beautiful, though.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2010/04/29/textile-plots-of-ld/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Phylogenomics of primates and their ancestral populations</title>
		<link>http://www.mailund.dk/index.php/2009/11/17/phylogenomics-of-primates-and-their-ancestral-populations/</link>
		<comments>http://www.mailund.dk/index.php/2009/11/17/phylogenomics-of-primates-and-their-ancestral-populations/#comments</comments>
		<pubDate>Tue, 17 Nov 2009 18:46:19 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Paper reviews]]></category>
		<category><![CDATA[phylogenetics]]></category>
		<category><![CDATA[phylogenomics]]></category>
		<category><![CDATA[primate evolution]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=1968</guid>
		<description><![CDATA[If you are interested in phylogenomics and primate evolution &#8212; including human evolution &#8212; this new review in Genome Research is a must read. Phylogenomics of primates and their ancestral populations Adam Siepel Genome assemblies are now available for nine primate species, and large-scale sequencing projects are underway or approved for six others. An explicitly [...]]]></description>
			<content:encoded><![CDATA[<p>If you are interested in phylogenomics and primate evolution &#8212; including human evolution &#8212; this new review in Genome Research is a must read.</p>
<blockquote><p><a href="http://genome.cshlp.org/content/19/11/1929.abstract"><strong>Phylogenomics of primates and their ancestral populations</strong></a></p>
<p>Adam Siepel</p>
<p>Genome assemblies are now available for nine primate species, and large-scale sequencing projects are underway or approved for six others. An explicitly evolutionary and phylogenetic approach to comparative genomics, called phylogenomics, will be essential in unlocking the valuable information about evolutionary history and genomic function that is contained within these genomes. However, most phylogenomic analyses so far have ignored the effects of variation in ancestral populations on patterns of sequence divergence. These effects can be pronounced in the primates, owing to large ancestral effective population sizes relative to the intervals between speciation events. In particular, local genealogies can vary considerably across loci, which can produce biases and diminished power in many phylogenomic analyses of interest, including phylogeny reconstruction, the identification of functional elements, and the detection of natural selection. At the same time, this variation in genealogies can be exploited to gain insight into the nature of ancestral populations. In this Perspective, I explore this area of intersection between phylogenetics and population genetics, and its implications for primate phylogenomics. I begin by “lifting the hood” on the conventional tree-like representation of the phylogenetic relationships between species, to expose the population-genetic processes that operate along its branches. Next, I briefly review an emerging literature that makes use of the complex relationships among coalescence, recombination, and speciation to produce inferences about evolutionary histories, ancestral populations, and natural selection. Finally, I discuss remaining challenges and future prospects at this nexus of phylogenetics, population genetics, and genomics.</p></blockquote>
<p>&#8230;and if you are wondering why my blog is so quiet these days, it is because I am swamped with four of the genome projects mentioned in the paper: orangutan, bonobo, gorilla and macaque&#8230;</p>
<p>Any summary of this paper that I write will not really do justice to it &#8212; you really should read it yourself and you will be happy you did &#8212; so I&#8217;ll just briefly summarize the topics that Adam covers.</p>
<p>First he covers basic phylogenetics, that is figuring out species relationships.  This is, by now, a well known field and essentially boils down to modeling sequence evolution as Markov chains so you can estimate divergence times and tree relationships from the substitutions between sequences.</p>
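<p>As a concrete instance of the Markov chain idea, here is the simplest such model, Jukes-Cantor (JC69), which turns the observed fraction of differing sites into an expected number of substitutions per site. I picked it because it is the textbook case, not because it is the model used in the review, and the two short sequences are made up.</p>

```python
import math

def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance (expected substitutions per site) for aligned DNA."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal length and non-empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = differences / len(seq_a)
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    # Corrects the raw proportion p for multiple substitutions at the same site.
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy alignment with one difference in ten sites.
d = jc69_distance("ACGTACGTAC", "ACGTACGTAT")
print(f"raw difference 0.100, corrected distance {d:.4f}")
```

<p>The corrected distance is always a little larger than the raw proportion of differences, because some sites have changed more than once and the model accounts for those hidden substitutions.</p>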
<p>For closely related species, though, that is only a small part of the picture, and the more interesting part of the paper involves introducing population genetics to phylogenetics.  You have to remember that speciation somehow involves populations; two species do not just split up, rather groups of individuals diverge and their genomes start diverging as groups rather than individuals.  That leads to <a href="http://www.mailund.dk/index.php/2009/02/27/on-segment-lengths-going-back-in-time-in-the-coalescence-process-part-2-the-ancestry-of-two-species/">varying sequence divergence as you scan along the genomes</a>, and under certain conditions to incomplete lineage sorting, where <a href="http://www.mailund.dk/index.php/2009/02/12/on-gene-trees-and-species-trees/">gene trees are different from species trees</a>.</p>
<p>This doesn&#8217;t just cause complications in genomic inference, though.  It provides valuable information about ancestral species and about speciation processes, which is the next topic Adam covers.  For primates, this is especially important.  The time intervals between speciations are short, and the ancestral effective population sizes are large *, so 1) if you ignore this your results will be way off, but 2) if you embrace it you have a <em>lot</em> of information to learn about the ancestry of the primates.</p>
<p>This then leads us to speciation models.  There are plenty of those, where the simplest (allopatric speciation) just assumes that some barrier appears between two populations after which they evolve independently to the point where they can no longer reproduce as hybrids.  That is probably a good model for the chimp/bonobo split, where the Congo River got in the way (chimps can&#8217;t swim), but it <em>is</em> a bit simple so more complex scenarios are worth considering for most speciation events.  The point here just is that different scenarios will leave different signals in the genomes, and we should be able to work this out by looking at the extant genomes.</p>
<p>There&#8217;s a nice review of the work done so far in the paper, but honestly we are still only at the starting phase of modeling this, and a lot of work remains before we can say anything conclusively about <em>any</em> of the primate speciations.</p>
<p>Next we get to selection.  With the whole neutral theory we have come to believe that we can explain most of genome evolution with neutral mutations &#8212; well I have anyway, but that might just be me.  Recent results, though, hint at selection being a major force in genome evolution anyway. My older colleagues tell me that selection was much more important in theory years back, but my background gave me the intuition that it could pretty much be ignored when comparing genomes; maybe I was wrong on that.</p>
<p>Perhaps the null model when we look at entire genomes shouldn&#8217;t be neutrality after all, I don&#8217;t know&#8230; We are seeing signals to that effect in our own work, anyway, but I&#8217;ll tell you all about that later when those papers are out, for now let&#8217;s just read Adam&#8217;s paper that is much more interesting anyway!</p>
<p>The last part of the paper is on Future Prospects.  Well, most papers are, so no surprise there, but if you are getting into the field there are some interesting areas to start thinking about in this review.</p>
<p>How do we incorporate the ancestral recombination graph (ARG) into phylogenetic analysis?  How do we model it without the combinatorial state space explosion?  How do we infer anything usable from the weak signals that are in the data for this? How do we combine model sophistication with computational efficiency to alleviate the state space explosion? Which model assumptions are essential and which can we get away with approximating?</p>
<p>Let me add a few of my own: How do we model this complex system without too much complex math, so that when we have results we can actually interpret them?  How do we check whether deviations from our model actually show evidence for some model over another, and are not just showing that we have the wrong model?</p>
<p>Go read the paper!  Seriously, it is a great read!</p>
<p>&#8211;</p>
<p>* Yeah, about ancestral population sizes&#8230; there are consistent estimates of very large ancestral effective population sizes, using very different methods, so generally it seems like the ancestral species were more diverse than the extant species are.  The consistent results, with different methods, indicate that this might be true, but it is still somewhat suspicious.  I guess we will learn more over the coming years as we get more data and more sophisticated methods.</p>
<p>&#8211;<br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Genome+Research&amp;rft_id=info%3Adoi%2F10.1101%2Fgr.084228.108&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Phylogenomics+of+primates+and+their+ancestral+populations&amp;rft.issn=1088-9051&amp;rft.date=2009&amp;rft.volume=19&amp;rft.issue=11&amp;rft.spage=1929&amp;rft.epage=1941&amp;rft.artnum=http%3A%2F%2Fgenome.cshlp.org%2Fcgi%2Fdoi%2F10.1101%2Fgr.084228.108&amp;rft.au=Siepel%2C+A.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CGenetics%2C+%2C+Evolutionary+Biology">Siepel, A. (2009). Phylogenomics of primates and their ancestral populations <span style="font-style: italic;">Genome Research, 19</span> (11), 1929-1941 DOI: <a rev="review" href="http://dx.doi.org/10.1101/gr.084228.108">10.1101/gr.084228.108</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/11/17/phylogenomics-of-primates-and-their-ancestral-populations/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Detecting Selective Sweeps: A New Approach Based on Hidden Markov Models</title>
		<link>http://www.mailund.dk/index.php/2009/09/30/detecting-selective-sweeps-a-new-approach-based-on-hidden-markov-models/</link>
		<comments>http://www.mailund.dk/index.php/2009/09/30/detecting-selective-sweeps-a-new-approach-based-on-hidden-markov-models/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 15:59:44 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Paper reviews]]></category>
		<category><![CDATA[genetics]]></category>
		<category><![CDATA[HMM]]></category>
		<category><![CDATA[Selection]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=1894</guid>
		<description><![CDATA[Two of my main interests are hidden Markov models and selection.  A paper from this spring, in Genetics, combines the two: Detecting Selective Sweeps: A New Approach Based on Hidden Markov Models Boitard, Schlötterer and Futschik Detecting and localizing selective sweeps on the basis of SNP data has recently received considerable attention. Here we introduce [...]]]></description>
			<content:encoded><![CDATA[<p>Two of my main interests are hidden Markov models and selection.  A paper from this spring, in Genetics, combines the two:</p>
<blockquote><p><a href="http://www.genetics.org/cgi/content/abstract/181/4/1567"><strong><span title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Genetics&amp;rft_id=info%3Adoi%2F10.1534%2Fgenetics.108.100032&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Detecting+Selective+Sweeps%3A+A+New+Approach+Based+on+Hidden+Markov+Models&amp;rft.issn=0016-6731&amp;rft.date=2009&amp;rft.volume=181&amp;rft.issue=4&amp;rft.spage=1567&amp;rft.epage=1578&amp;rft.artnum=http%3A%2F%2Fwww.genetics.org%2Fcgi%2Fdoi%2F10.1534%2Fgenetics.108.100032&amp;rft.au=Boitard%2C+S.&amp;rft.au=Schlotterer%2C+C.&amp;rft.au=Futschik%2C+A.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMathematics%2CGenetics%2C+Applied+Mathematics">Detecting Selective Sweeps: A New Approach Based on Hidden Markov Models</span></strong></a></p>
<p><span title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Genetics&amp;rft_id=info%3Adoi%2F10.1534%2Fgenetics.108.100032&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Detecting+Selective+Sweeps%3A+A+New+Approach+Based+on+Hidden+Markov+Models&amp;rft.issn=0016-6731&amp;rft.date=2009&amp;rft.volume=181&amp;rft.issue=4&amp;rft.spage=1567&amp;rft.epage=1578&amp;rft.artnum=http%3A%2F%2Fwww.genetics.org%2Fcgi%2Fdoi%2F10.1534%2Fgenetics.108.100032&amp;rft.au=Boitard%2C+S.&amp;rft.au=Schlotterer%2C+C.&amp;rft.au=Futschik%2C+A.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMathematics%2CGenetics%2C+Applied+Mathematics">Boitard, Schlötterer and Futschik</span></p>
<p>Detecting and localizing selective sweeps on the basis of SNP<sup> </sup>data has recently received considerable attention. Here we introduce<sup> </sup>the use of hidden Markov models (HMMs) for the detection of<sup> </sup>selective sweeps in DNA sequences. Like previously published<sup> </sup>methods, our HMMs use the site frequency spectrum, and the spatial<sup> </sup>pattern of diversity along the sequence, to identify selection.<sup> </sup>In contrast to earlier approaches, our HMMs explicitly model<sup> </sup>the correlation structure between linked sites. The detection<sup> </sup>power of our methods, and their accuracy for estimating the<sup> </sup>selected site location, is similar to that of competing methods<sup> </sup>for constant size populations. In the case of population bottlenecks,<sup> </sup>however, our methods frequently showed fewer false positives.</p></blockquote>
<h3>Selective sweeps</h3>
<p>Under a simple Wright-Fisher model, a neutral mutation that has just been introduced into a population will drift up and down in frequency until it is eventually either fixed in the population, which happens with probability <img src="http://www.mailund.dk/wp-content/cache/tex_d91f7f356cd888fce543feef9bce26b5.png" align="absmiddle" class="tex" alt="\frac{1}{2N_e}" />, or lost from the population again, which happens with probability <img src="http://www.mailund.dk/wp-content/cache/tex_b7f6353e3d31852a370f594d4176d183.png" align="absmiddle" class="tex" alt="1-\frac{1}{2N_2}" /> of course.</p>
<p>The expected time from when such a mutation is introduced into the population until it is fixed, if it is lucky enough to be fixed, is <img src="http://www.mailund.dk/wp-content/cache/tex_68423194578196a2f3bf9afe2e7a976f.png" align="absmiddle" class="tex" alt="2N_2" /> generations.  During this time, the descendant chromosomes of the original mutant chromosome will be subjected to new mutations and to recombinations.</p>
<p>Once this mutation is fixed, everyone in the population will of course share that particular mutation (ignoring back-mutations and such here), but because of recombination nearby sites will not necessarily all be derived from the original mutation chromosome.  Close to the mutation site &#8212; where few recombinations will have broken up the sequence &#8212; most chromosomes will be derived from the mutation chromosome and as we move away from the mutation site fewer chromosomes will be derived from that original chromosome.</p>
<p>Now, if the mutation introduced has a selective advantage, essentially the same process will play out.  In each generation there is a slightly higher chance that this mutation will have offspring, but that is essentially the only difference.</p>
<p>What this means is that initially there is still a very good chance that the mutation will be lost &#8212; even with slightly better odds accidents <em>do</em> happen &#8212; but once the mutation has reached a reasonable frequency it is almost guaranteed to reach fixation &#8212; unless a <em>lot</em> of accidents happen.</p>
<p>Once the frequency of the site under selection is high enough it will very quickly reach fixation.  The expected time it takes depends on the selection strength, but unless the selective advantage is very small it will reach fixation a lot faster than if it were neutral.  Think logarithmic time in the size of the population compared to linear time.</p>
<p>Since it reaches fixation much faster than a neutral mutation, fewer mutations and fewer recombinations will have time to occur, so a much wider region around the mutation site will be shared by all descendant chromosomes.  Combined, this means that for a selected site you expect a wide region with a more recent shared ancestor than you would expect at a neutral site, a phenomenon called a <em><a href="http://en.wikipedia.org/wiki/Selective_sweep">selective sweep</a>.</em></p>
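<p>The difference between the neutral and the selected case is easy to see in a small simulation.  The following is only a minimal sketch and is not taken from the paper: it assumes a haploid Wright-Fisher population where <code>n_copies</code> plays the role of the 2N chromosomes, and the function names, the selection coefficient <code>s</code> and the number of replicates are all made up for illustration.</p>

```python
import random


def simulate_new_mutation(n_copies, s, rng):
    """Follow one new mutation in a haploid Wright-Fisher population.

    n_copies plays the role of 2N chromosomes; s is the selective
    advantage of the mutant (s = 0 is the neutral case).
    Returns (fixed, generations until fixation or loss).
    """
    count, generations = 1, 0
    while count not in (0, n_copies):
        freq = count / n_copies
        # Selection reweights the mutant allele before binomial resampling.
        p = freq * (1 + s) / (1 + freq * s)
        count = sum(1 for _ in range(n_copies) if p > rng.random())
        generations += 1
    return count == n_copies, generations


def summarize(n_copies, s, replicates=4000, seed=1):
    """Estimate the fixation probability and the mean time to fixation."""
    rng = random.Random(seed)
    fixation_times = []
    for _ in range(replicates):
        fixed, generations = simulate_new_mutation(n_copies, s, rng)
        if fixed:
            fixation_times.append(generations)
    prob = len(fixation_times) / replicates
    mean_time = sum(fixation_times) / max(1, len(fixation_times))
    return prob, mean_time


if __name__ == "__main__":
    for s in (0.0, 0.5):
        prob, mean_time = summarize(n_copies=40, s=s)
        print(f"s={s}: P(fix) ~ {prob:.3f}, mean fixation time ~ {mean_time:.1f}")
```

<p>With <code>s = 0</code> the estimated fixation probability should land close to one over the number of chromosomes, and with a positive <code>s</code> the mutation should fix both more often and faster.  The population here is tiny, so this only illustrates the qualitative picture.</p>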
<h3>Site frequency spectra</h3>
<p>Now, from the population genetics model you can work out &#8212; by putting your thinking hat on or by just simulating &#8212; the expected distribution of derived and ancestral alleles: the <em>site frequency spectrum</em>.  This will be different for neutral alleles and selected alleles because of the shorter time back to the common ancestor for the selected sites.  The shorter time means that there is a general reduction in polymorphism near a selected site, and derived alleles that appeared on chromosomes with the beneficial mutation will be at a higher frequency than they would be if they weren&#8217;t &#8220;hitchhiking&#8221; on the selection of the beneficial mutation.</p>
<p>The pattern is a bit complicated by recombination, since you need to take into account that the further away from the selected site you look, the weaker the hitchhiking effect will be; a new mutation can only hitchhike as long as it is linked to the selected site, and recombinations break that link.</p>
<p>Anyway, the different spectra of derived and ancestral alleles can be used to detect selective sweeps.  Two methods that exploit this, and that are relevant for this post, are <a href="http://www.genetics.org/cgi/content/abstract/160/2/765">Kim and Stephan (2002)</a> and <a href="http://genome.cshlp.org/content/15/11/1566.abstract">Nielsen <em>et al.</em> (2005)</a>.</p>
<p>Of course, selection is not the only thing that can mess up the site frequency spectrum and make it different from the expected neutral distribution.  Demographic effects like expanding populations and bottlenecks can look very similar to selection effects, so we cannot absolutely rule out neutrality if we see a deviation from the expected spectrum.  Still, the site frequency spectra of neutrality versus selection can be used for scanning for selection.</p>
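<p>To make the site frequency spectrum concrete, here is a minimal sketch of how one could tabulate it from polarized SNP data and compare it to the standard neutral expectation, where the expected number of sites with derived count i is proportional to 1/i for a constant-size population.  The haplotype encoding (strings of 0 for ancestral and 1 for derived), the toy data and the function names are assumptions for illustration and have nothing to do with the methods in the papers.</p>

```python
def site_frequency_spectrum(haplotypes):
    """Count sites by the number of chromosomes carrying the derived allele.

    haplotypes: equal-length strings over "0" (ancestral) and "1" (derived),
    one string per sampled chromosome.
    Returns a list sfs where sfs[i - 1] is the number of sites at which
    exactly i chromosomes carry the derived allele, for i = 1 .. n - 1.
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for column in zip(*haplotypes):
        derived = column.count("1")
        if derived not in (0, n):  # skip non-segregating sites
            sfs[derived - 1] += 1
    return sfs


def neutral_expected_sfs(n, theta):
    """Standard neutral expectation: theta / i sites with derived count i."""
    return [theta / i for i in range(1, n)]


def normalize(counts):
    """Turn counts into proportions so two spectra can be compared."""
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Hypothetical toy sample: five chromosomes, eight sites.
    toy_haplotypes = [
        "10010010",
        "10000010",
        "01010000",
        "00010001",
        "00110000",
    ]
    observed = site_frequency_spectrum(toy_haplotypes)
    expected = neutral_expected_sfs(len(toy_haplotypes), theta=1.0)
    print("observed proportions:", normalize(observed))
    print("neutral proportions: ", normalize(expected))
```

<p>An excess of high-frequency derived alleles relative to the neutral proportions is the kind of hitchhiking signal described above, although with a toy sample like this one nothing can be concluded.</p>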
<h3>Detecting sweeps in a hidden Markov model</h3>
<p>The new result in the Genetics paper is a hidden Markov model that uses site frequency spectra to scan for selective sweeps.</p>
<p>Using an HMM means that the model can capture spatial patterns along a genome and capture transitions from &#8220;neutral&#8221; regions &#8212; where no sweep has occurred or is occurring &#8212; to &#8220;selected&#8221; regions &#8212; where a sweep occurred or is occurring.  So you don&#8217;t have to assume that a locus you are looking at is either a neutral region or a selected region, and you don&#8217;t have to fiddle around with sliding windows to scan a genome; you explicitly capture the changing patterns.</p>
<p>This is one of the nice properties of HMMs for genomic scans, and the reason I love them so much.</p>
<p>The model Boitard <em>et al.</em> develop is quite simple.  They have three states: a neutral state, a selected state, and an intermediate state used to capture sites that are slightly caught up in the hitchhiking but not close enough to a selected site to get the full effect.</p>
<p>The transition matrix has a single parameter, <img src="http://www.mailund.dk/wp-content/cache/tex_83878c91171338902e0fe0fb97a8c47a.png" align="absmiddle" class="tex" alt="p" />, that is the probability that a neutral or selected site switches to the intermediate state (and the intermediate state switches to those two with equal probability set to <img src="http://www.mailund.dk/wp-content/cache/tex_603bc185c1e95940156e64accf7c24f5.png" align="absmiddle" class="tex" alt="p/2" />).</p>
<p><center><img src="http://www.mailund.dk/wp-content/cache/tex_a376b4016ae561e43bba358d8751e00b.png" align="absmiddle" class="tex" alt="T=\begin{pmatrix}1-p&amp;p&amp;0\\ p/2&amp;1-p&amp;p/2\\ 0&amp;p&amp;1-p\end{pmatrix}" /></center></p>
<p>This of course has the unfortunate effect that the prior distribution (stationary distribution) of the chain will give you a 25% chance of a site being neutral, a 25% chance of it being selected and a 50% chance of it being intermediate, which doesn&#8217;t really match my expectation of the amount of selection in, say, a human genome. Also, the (prior) expected length of a swept region is the same as that of a neutral region, which also does not match my intuition.  With enough data, though, the likelihood should overrule the prior so perhaps it is not too much of a worry&#8230;</p>
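<p>The 25%/50%/25% split and the equal expected region lengths follow directly from the transition matrix above, and can be checked numerically.  This is just a minimal sketch using power iteration; the state order (neutral, intermediate, selected), the value p = 0.1 and the function names are assumptions chosen for illustration.</p>

```python
def transition_matrix(p):
    """The three-state matrix T from the post, rows summing to one."""
    return [
        [1 - p, p, 0.0],        # neutral
        [p / 2, 1 - p, p / 2],  # intermediate
        [0.0, p, 1 - p],        # selected
    ]


def stationary_distribution(matrix, iterations=2000):
    """Approximate the stationary distribution by repeated multiplication."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


def expected_run_length(matrix, state):
    """Mean number of consecutive sites spent in a state (geometric)."""
    return 1.0 / (1.0 - matrix[state][state])


if __name__ == "__main__":
    T = transition_matrix(0.1)
    print("stationary distribution:", stationary_distribution(T))
    print("expected neutral run:   ", expected_run_length(T, 0))
    print("expected selected run:  ", expected_run_length(T, 2))
```

<p>Since all three diagonal entries are 1 - p, every state has the same expected run length of 1/p sites, whatever value of p is chosen.  For very small p the power iteration would need more iterations to converge.</p>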
<p>The emissions of the model are frequencies of derived alleles, so for each site it will emit a frequency that depends on the state.  This is where they capture the different expected frequencies depending on whether a site is neutral or selected.</p>
<p>They use the Kim and Stephan and Nielsen <em>et al</em>. methods for this, to develop three variations of HMMs: HMMA, using Kim and Stephan, HMMB, using Nielsen <em>et al.</em>, and HMMB-SEQ, which also uses Nielsen <em>et al.</em> but only considers segregating sites.  The latter is only for comparison purposes and of course ignores a lot of the information in the data, since the number of non-segregating sites reflects the general level of polymorphism in a region, which again is dependent on the depth of the local genealogy and will be affected by selection.</p>
<p>They use simulations under neutrality to fix the parameter <img src="http://www.mailund.dk/wp-content/cache/tex_83878c91171338902e0fe0fb97a8c47a.png" align="absmiddle" class="tex" alt="p" /> so they get a 5% false positive rate, and then use the models to scan for sweeps.</p>
<p>They get an okay power for detecting sweeps, but compared to the previous methods they don&#8217;t gain <em>that </em>much, since those did pretty well too:</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2009/09/Screen-shot-2009-09-30-at-6.06.47-PM.png"><img class="aligncenter size-medium wp-image-1901" title="Table 1" src="http://www.mailund.dk/wp-content/uploads/2009/09/Screen-shot-2009-09-30-at-6.06.47-PM-300x182.png" alt="Table 1" width="300" height="182" /></a>Where they refer to this table in the paper they say they have a higher power, but compared to the CLsw column, which is Kim and Stephan&#8217;s method, they do not.  After all, it is difficult to beat a power of 1.</p>
<p>They do, however, appear to be more robust to bottlenecks where the two other methods have very high false positive rates:</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2009/09/Screen-shot-2009-09-30-at-6.14.04-PM.png"><img class="aligncenter size-medium wp-image-1902" title="Table 5" src="http://www.mailund.dk/wp-content/uploads/2009/09/Screen-shot-2009-09-30-at-6.14.04-PM-300x129.png" alt="Table 5" width="300" height="129" /></a></p>
<p>&#8211;<br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Genetics&amp;rft_id=info%3Adoi%2F10.1534%2Fgenetics.108.100032&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Detecting+Selective+Sweeps%3A+A+New+Approach+Based+on+Hidden+Markov+Models&amp;rft.issn=0016-6731&amp;rft.date=2009&amp;rft.volume=181&amp;rft.issue=4&amp;rft.spage=1567&amp;rft.epage=1578&amp;rft.artnum=http%3A%2F%2Fwww.genetics.org%2Fcgi%2Fdoi%2F10.1534%2Fgenetics.108.100032&amp;rft.au=Boitard%2C+S.&amp;rft.au=Schlotterer%2C+C.&amp;rft.au=Futschik%2C+A.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMathematics%2CGenetics%2C+Applied+Mathematics">Boitard, S., Schlotterer, C., &amp; Futschik, A. (2009). Detecting Selective Sweeps: A New Approach Based on Hidden Markov Models <span style="font-style: italic;">Genetics, 181</span> (4), 1567-1578 DOI: <a rev="review" href="http://dx.doi.org/10.1534/genetics.108.100032">10.1534/genetics.108.100032</a></span><br />
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/09/30/detecting-selective-sweeps-a-new-approach-based-on-hidden-markov-models/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Not exactly an impressive success rate&#8230;</title>
		<link>http://www.mailund.dk/index.php/2009/09/26/not-exactly-an-impressive-success-rate/</link>
		<comments>http://www.mailund.dk/index.php/2009/09/26/not-exactly-an-impressive-success-rate/#comments</comments>
		<pubDate>Sat, 26 Sep 2009 16:33:28 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Paper reviews]]></category>
		<category><![CDATA[Work]]></category>
		<category><![CDATA[Research life]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=1887</guid>
		<description><![CDATA[From my own experience I know that it can be hard to get access to data that you would really love to analyse, but I didn&#8217;t expect it to be quite this bad, even for data that is required to be available by the journals where the papers describing the data are published: Empirical study [...]]]></description>
			<content:encoded><![CDATA[<p>From my own experience I know that it can be hard to get access to data that you would really love to analyse, but I didn&#8217;t expect it to be quite this bad, even for data that is <em>required</em> to be available by the journals where the papers describing the data are published:</p>
<blockquote><p><strong><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007078">Empirical study of data sharing by authors publishing in PLoS journals</a></strong></p>
<p>Savage and Vickers, PLoS ONE 2009</p>
<h3 style="font-family: Georgia, 'Times New Roman', Times, serif; color: #333333; font-size: 1.3em; font-weight: bold; margin-top: 20px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; border-bottom-width: 0px; border-bottom-style: initial; border-bottom-color: initial; padding: 0px;">Background</h3>
<p>Many journals now require authors share their data with other investigators, either by depositing the data in a public repository or making it freely available upon request. These policies are explicit, but remain largely untested. We sought to determine how well authors comply with such policies by requesting data from authors who had published in one of two journals with clear data sharing policies.</p>
<h3 style="font-family: Georgia, 'Times New Roman', Times, serif; color: #333333; font-size: 1.3em; font-weight: bold; margin-top: 20px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; border-bottom-width: 0px; border-bottom-style: initial; border-bottom-color: initial; padding: 0px;">Methods and Findings</h3>
<p>We requested data from ten investigators who had published in either PLoS Medicine or PLoS Clinical Trials. All responses were carefully documented. In the event that we were refused data, we reminded authors of the journal&#8217;s data sharing guidelines. If we did not receive a response to our initial request, a second request was made. Following the ten requests for raw data, three investigators did not respond, four authors responded and refused to share their data, two email addresses were no longer valid, and one author requested further details. A reminder of PLoS&#8217;s explicit requirement that authors share data did not change the reply from the four authors who initially refused. Only one author sent an original data set.</p>
<h3 style="font-family: Georgia, 'Times New Roman', Times, serif; color: #333333; font-size: 1.3em; font-weight: bold; margin-top: 20px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; border-bottom-width: 0px; border-bottom-style: initial; border-bottom-color: initial; padding: 0px;">Conclusions</h3>
<p>We received only one of ten raw data sets requested. This suggests that journal policies requiring data sharing do not lead to authors making their data sets available to independent investigators.</p></blockquote>
<p>Getting a 10% success rate, when it should be 100%, is pretty bad&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/09/26/not-exactly-an-impressive-success-rate/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Detecting ancient admixture and estimating demographic parameters in multiple human populations</title>
		<link>http://www.mailund.dk/index.php/2009/09/26/detecting-ancient-admixture-and-estimating-demographic-parameters-in-multiple-human-populations/</link>
		<comments>http://www.mailund.dk/index.php/2009/09/26/detecting-ancient-admixture-and-estimating-demographic-parameters-in-multiple-human-populations/#comments</comments>
		<pubDate>Sat, 26 Sep 2009 14:11:33 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Paper reviews]]></category>
		<category><![CDATA[genetics]]></category>
		<category><![CDATA[Human evolution]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=1879</guid>
		<description><![CDATA[I read this paper on our way back from Leipzig and then again today to see if I missed anything in the first read through (I was pretty tired at the time). Detecting ancient admixture and estimating demographic parameters in multiple human populations Wall, Lohmueller and Plagnol, Mol Biol Evo 26(8):1823-1827 We analyze patterns of [...]]]></description>
			<content:encoded><![CDATA[<p>I read this paper on our way back from Leipzig and then again today to see if I missed anything in the first read through (I was pretty tired at the time).</p>
<blockquote><p><a href="http://mbe.oxfordjournals.org/cgi/content/abstract/msp096"><strong>Detecting ancient admixture and estimating demographic parameters in multiple human populations</strong></a></p>
<p><em>Wall, Lohmueller and Plagnol, Mol Biol Evo 26(8):1823-1827</em></p>
<p>We analyze patterns of genetic variation in extant human polymorphism<sup> </sup>data from the National Institute of Environmental Health Sciences<sup> </sup>single nucleotide polymorphism project to estimate human demographic<sup> </sup>parameters. We update our previous work by considering a larger<sup> </sup>data set (more genes and more populations) and by explicitly<sup> </sup>estimating the amount of putative admixture between modern humans<sup> </sup>and archaic human groups (e.g., Neandertals, <em>Homo erectus</em>, and<em><sup> </sup>Homo floresiensis</em>). We find evidence for this ancient admixture<sup> </sup>in European, East Asian, and West African samples, suggesting<sup> </sup>that admixture between diverged hominin groups may be a general<sup> </sup>feature of recent human evolution.</p></blockquote>
<p>What they do in this paper is to fit a two-population coalescent model, with expansion, migration, bottlenecks and the works, to both an African+European and an African+Asian data set, and then use this fitted model as a null model of the genetics of the populations.  They then 1) do a test on an LD statistic against this null model, taking rejections of this null model as evidence for admixture from archaic humans, and 2) fit an admixture extension of the model to estimate the level of admixture.  They find evidence for admixture with archaic humans for both data sets, with a somewhat higher degree in the Europeans.</p>
<p>I&#8217;m a bit underwhelmed by the paper, I must admit.  I&#8217;m not saying that there is no admixture with archaic humans, but this approach does not convince me.</p>
<p>Even when taking various demographic effects into account in the modeling, the null model is unlikely to exactly fit real data.  Taking deviations from the null model as any kind of evidence for admixture thus seems a bit hasty.</p>
<p>Not that I have any better ideas as to how to approach this; it is just that, in my eyes, the jury is still out on the question of admixture with archaic humans&#8230;</p>
<p>&#8211;<br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Molecular+Biology+and+Evolution&amp;rft_id=info%3Adoi%2F10.1093%2Fmolbev%2Fmsp096&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Detecting+Ancient+Admixture+and+Estimating+Demographic+Parameters+in+Multiple+Human+Populations&amp;rft.issn=0737-4038&amp;rft.date=2009&amp;rft.volume=26&amp;rft.issue=8&amp;rft.spage=1823&amp;rft.epage=1827&amp;rft.artnum=http%3A%2F%2Fmbe.oxfordjournals.org%2Fcgi%2Fdoi%2F10.1093%2Fmolbev%2Fmsp096&amp;rft.au=Wall%2C+J.&amp;rft.au=Lohmueller%2C+K.&amp;rft.au=Plagnol%2C+V.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Anthropology%2CBiology%2CGenetics%2C+Bioinformatics%2C+Computational+Biology%2C+Evolutionary+Anthropology">Wall, J., Lohmueller, K., &amp; Plagnol, V. (2009). Detecting Ancient Admixture and Estimating Demographic Parameters in Multiple Human Populations <span style="font-style: italic;">Molecular Biology and Evolution, 26</span> (8), 1823-1827 DOI: <a rev="review" href="http://dx.doi.org/10.1093/molbev/msp096">10.1093/molbev/msp096</a></span><br />
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/09/26/detecting-ancient-admixture-and-estimating-demographic-parameters-in-multiple-human-populations/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>HMMoC and HMMConverter</title>
		<link>http://www.mailund.dk/index.php/2009/09/18/hmmoc-and-hmmconverter/</link>
		<comments>http://www.mailund.dk/index.php/2009/09/18/hmmoc-and-hmmconverter/#comments</comments>
		<pubDate>Fri, 18 Sep 2009 10:42:12 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Paper reviews]]></category>
		<category><![CDATA[HMM]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=1815</guid>
		<description><![CDATA[I just want to say a few words about a short paper I read last week, and a paper that is a few years old now but related to it. The first is out in advanced access in Nucleic Acids Research: HMMConverter 1.0: a toolbox for hidden Markov models Lam and Meyer Hidden Markov models [...]]]></description>
			<content:encoded><![CDATA[<p>I just want to say a few words about a short paper I read last week, and a paper that is a few years old now but related to it.</p>
<p>The first is out in advance access in Nucleic Acids Research:</p>
<blockquote><p><strong><a href="http://nar.oxfordjournals.org/cgi/content/abstract/gkp662v1">HMMConverter 1.0: a toolbox for hidden Markov models</a></strong></p>
<p>Lam and Meyer</p>
<p>Hidden Markov models (HMMs) and their variants are widely used<sup> </sup>in Bioinformatics applications that analyze and compare biological<sup> </sup>sequences. Designing a novel application requires the insight<sup> </sup>of a human expert to define the model&#8217;s architecture. The implementation<sup> </sup>of prediction algorithms and algorithms to train the model&#8217;s<sup> </sup>parameters, however, can be a time-consuming and error-prone<sup> </sup>task. We here present HMMC<span>ONVERTER</span>, a software package for setting<sup> </sup>up probabilistic HMMs, pair-HMMs as well as generalized HMMs and pair-HMMs. The user defines the model itself and the algorithms<sup> </sup>to be used via an XML file which is then directly translated<sup> </sup>into efficient C++ code. The software package provides linear-memory<sup> </sup>prediction algorithms, such as the Hirschberg algorithm, banding<sup> </sup>and the integration of prior probabilities and is the first<sup> </sup>to present computationally efficient linear-memory algorithms<sup> </sup>for automatic parameter training. Users of HMMC<span>ONVERTER</span> can thus set up complex applications with a minimum of effort and<sup> </sup>also perform parameter training and data analyses for large<sup> </sup>data sets.</p></blockquote>
<p>The other was published in Bioinformatics in 2007:</p>
<blockquote><p><strong><a href="http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/18/2485">HMMoC &#8211; a compiler for hidden Markov models</a></strong></p>
<p>Lunter</p>
<p>Hidden Markov models are widely applied within computational<sup> </sup>biology. The large data sets and complex models involved demand<sup> </sup>optimized implementations, while efficient exploration of model<sup> </sup>space requires rapid prototyping. These requirements are not<sup> </sup>met by existing solutions, and hand-coding is time-consuming<sup> </sup>and error-prone. Here, I present a compiler that takes over<sup> </sup>the mechanical process of implementing HMM algorithms, by translating<sup> </sup>high-level XML descriptions into efficient C++ implementations.<sup> </sup>The compiler is highly customizable, produces efficient and<sup> </sup>bug-free code, and includes several optimizations.</p></blockquote>
<p>Both papers describe compilers that generate C++ implementations of hidden Markov model algorithms from XML specifications, and really they are very similar.</p>
<p>The basic HMM algorithms are quite straightforward to implement, but if you want more complex models such as pair-HMMs or generalized HMMs there are a few more complications to deal with.  If you need to optimize the algorithms in either runtime or memory usage, there are some more complex algorithms you can use.  One is &#8220;banding&#8221; &#8211; implemented in both HMMoC and HMMConverter &#8211; which risks giving sub-optimal results but at a much reduced running time and memory consumption.  Another is the Hirschberg algorithm &#8211; only implemented in HMMConverter as far as I can see &#8211; which exchanges a doubling in running time for a much reduced memory consumption.</p>
<p>Implementing such extra algorithms is not conceptually hard, but can be quite tedious and error prone, so it makes good sense to have code generators building the algorithms for you.  That is exactly what these tools do.</p>
<p>At a bird&#8217;s eye view, the tools are very similar.  You specify the HMM in an XML file (a specification language that I personally don&#8217;t like that much, but that is of course very subjective) and the tools then generate the algorithms you ask them to, output as C++ code.</p>
<p>HMMoC provides a number of handles for you to add your own C++ code to the generated code; I am not sure if HMMConverter does the same, but on the other hand HMMConverter provides handles for various constraints on the parameters so it might be easier to re-parameterize models made with that.</p>
<p>Another cool feature unique to HMMConverter is priors on sequence annotation.  You can provide an annotation to the input sequence(s) that is then incorporated in the emission probabilities.  The prior is really on hidden states, but incorporating them into the emission probabilities has exactly the effect you want from them: they weight the posterior probabilities of the hidden states along the input.</p>
<p>To deal with numerical issues, HMMConverter works in log-space while HMMoC uses something called &#8220;extended-exponent real numbers&#8221;.  Working in log-space can be really slow for the Forward and Backward algorithms, since you have to switch in and out of log-space to deal with sums of probabilities (the Viterbi algorithm doesn&#8217;t have this problem, so there the log-space solution is pretty fast).</p>
<p>Unfortunately, there isn&#8217;t any comparison between the execution times of algorithms generated with the two tools in the new paper, so I don&#8217;t know how much this matters.  In the HMM library I am developing with Andreas we found that the log-solution was very slow, though, and therefore we use a re-scaling approach instead.</p>
<p>I would love to see a comparison of the runtime efficiency between the approaches, but just not <em>quite</em> enough to go and do it myself right now&#8230;</p>
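<p>To make the difference between the two numerical strategies concrete, here is a minimal sketch of the Forward algorithm written both ways.  It is plain Python, not code generated by either tool and not the code from our library; the two-state model and its numbers are made up, and it assumes that no probability is exactly zero.</p>

```python
import math

def forward_rescaled(init, trans, emit, obs):
    """Log-likelihood via the Forward algorithm with per-column re-scaling."""
    states = range(len(init))
    col = [init[k] * emit[k][obs[0]] for k in states]
    loglik = 0.0
    for x in obs[1:]:
        scale = sum(col)
        loglik += math.log(scale)           # one logarithm per column
        col = [c / scale for c in col]      # the column now sums to one
        col = [emit[k][x] * sum(col[j] * trans[j][k] for j in states)
               for k in states]
    return loglik + math.log(sum(col))

def log_sum_exp(values):
    """Stable log(sum(exp(v))); this is the costly step in log-space."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def forward_log_space(init, trans, emit, obs):
    """The same log-likelihood, keeping every quantity as a logarithm."""
    states = range(len(init))
    col = [math.log(init[k]) + math.log(emit[k][obs[0]]) for k in states]
    for x in obs[1:]:
        col = [math.log(emit[k][x])
               + log_sum_exp([col[j] + math.log(trans[j][k]) for j in states])
               for k in states]
    return log_sum_exp(col)

# Hypothetical two-state model with two output symbols.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.7, 0.3], [0.1, 0.9]]
obs = [0, 1, 1, 0, 1]

print(forward_rescaled(init, trans, emit, obs))
print(forward_log_space(init, trans, emit, obs))
```

<p>Both functions return the same log-likelihood.  The log-space version evaluates an exponential for every transition in every column, while the re-scaled version only takes one logarithm per column, which is the kind of difference that could show up in a runtime comparison.</p>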
<p>&#8211;</p>
<ul>
<li><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nucleic+Acids+Research&amp;rft_id=info%3Adoi%2F10.1093%2Fnar%2Fgkp662&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=HMMCONVERTER+1.0%3A+a+toolbox+for+hidden+Markov+models&amp;rft.issn=0305-1048&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fwww.nar.oxfordjournals.org%2Fcgi%2Fdoi%2F10.1093%2Fnar%2Fgkp662&amp;rft.au=Lam%2C+T.&amp;rft.au=Meyer%2C+I.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CComputer+Science%2CBioinformatics%2C+Computational+Biology%2C+Software+Engineering">Lam, T., &amp; Meyer, I. (2009). HMMCONVERTER 1.0: a toolbox for hidden Markov models <span style="font-style: italic;">Nucleic Acids Research</span> DOI: <a rev="review" href="http://dx.doi.org/10.1093/nar/gkp662">10.1093/nar/gkp662</a></span></li>
<li><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Bioinformatics&amp;rft_id=info%3Adoi%2F10.1093%2Fbioinformatics%2Fbtm350&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=HMMoC+a+compiler+for+hidden+Markov+models&amp;rft.issn=1367-4803&amp;rft.date=2007&amp;rft.volume=23&amp;rft.issue=18&amp;rft.spage=2485&amp;rft.epage=2487&amp;rft.artnum=http%3A%2F%2Fwww.bioinformatics.oxfordjournals.org%2Fcgi%2Fdoi%2F10.1093%2Fbioinformatics%2Fbtm350&amp;rft.au=Lunter%2C+G.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CComputer+Science%2CBioinformatics%2C+Computational+Biology%2C+Software+Engineering">Lunter, G. (2007). HMMoC a compiler for hidden Markov models <span style="font-style: italic;">Bioinformatics, 23</span> (18), 2485-2487 DOI: <a rev="review" href="http://dx.doi.org/10.1093/bioinformatics/btm350">10.1093/bioinformatics/btm350</a></span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/09/18/hmmoc-and-hmmconverter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Independent mammalian genome contractions following the KT boundary</title>
		<link>http://www.mailund.dk/index.php/2009/09/02/independent-mamalian-genome-contractions-following-the-kt-boundary/</link>
		<comments>http://www.mailund.dk/index.php/2009/09/02/independent-mamalian-genome-contractions-following-the-kt-boundary/#comments</comments>
		<pubDate>Wed, 02 Sep 2009 14:27:51 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Paper reviews]]></category>
		<category><![CDATA[evolution]]></category>
		<category><![CDATA[genome evolution]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=1656</guid>
		<description><![CDATA[Tomorrow it is my turn to present a paper at our genome evolution journal club at BiRC, and I have picked this one: Independent mammalian genome contractions following the KT boundary Mina Rho et al. Genome Biology and Evolution, 2009 Abstract Although it is generally accepted that major changes in the earth&#8217;s history are significant [...]]]></description>
			<content:encoded><![CDATA[<p>Tomorrow it is my turn to present a paper at our <em>genome evolution</em> journal club at BiRC, and I have picked this one:</p>
<blockquote><p><a href="http://gbe.oxfordjournals.org/cgi/content/abstract/2009/0/2"><strong>Independent mammalian genome contractions following the KT boundary</strong></a></p>
<p>Mina Rho <em>et al.</em> Genome Biology and Evolution, 2009</p>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p style="text-align: left;">Although it is generally accepted that major changes in the<sup> </sup>earth&#8217;s history are significant drivers of phylogenetic diversification<sup> </sup>and extinction, such episodes may also have long-lasting effects<sup> </sup>on genomic architecture. Here we show that widespread reductions<sup> </sup>in genome size have occurred in multiple lineages of mammals<sup> </sup>subsequent to the Cretaceous–Tertiary (KT) boundary, whereas<sup> </sup>there is no evidence for such changes in other vertebrate, invertebrate,<sup> </sup>or land plant lineages. Although the mechanisms remain unclear,<sup> </sup>such shifts in mammalian genome evolution may be a consequence<sup> </sup>of an increase in the efficiency of selection against excess<sup> </sup>DNA resulting from post-KT population size expansions. Independent<sup> </sup>historical changes in genome architecture in diverse lineages<sup> </sup>raise a significant challenge to the idea that genome size is<sup> </sup>finely tuned to achieve adaptive phenotypic modifications and<sup> </sup>suggest that attempts to use phylogenetic analysis to infer<sup> </sup>ancestral genome sizes may be problematical.</p>
</blockquote>
<p>We have <a href="http://www.mailund.dk/index.php/2008/10/07/a-short-introduction-to-the-human-genome/">previously read Michael Lynch&#8217;s book</a> on genome architecture and evolution and this paper reads a lot like that book in general theme.</p>
<p>Anyway, the paper looks at the age distribution of LTR repetitive elements.  These are transposable elements that, at the time of insertion, are flanked by two identical long terminal repeat (LTR) sequences.  The two copies then diverge through mutations over time, and from the divergence between them you can date the insertion.</p>
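<p>As a rough sketch of how such a dating could be done: count the differences between the two LTRs, correct for multiple hits, and divide by twice the substitution rate, since both copies accumulate mutations.  The Jukes-Cantor correction and the rate used below are my assumptions for illustration, not necessarily what the paper uses.</p>

```python
import math

def jukes_cantor_distance(p):
    """Substitutions per site given the observed proportion of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def ltr_insertion_age(mismatches, aligned_length, rate_per_site_per_year):
    """Age of an insertion from the divergence between its two LTRs."""
    p = mismatches / aligned_length
    k = jukes_cantor_distance(p)
    # Both LTR copies mutate independently, hence the factor of two.
    return k / (2.0 * rate_per_site_per_year)

# Hypothetical element: 12 differences over 400 aligned sites,
# with a placeholder rate of 2.2e-9 substitutions per site per year.
age = ltr_insertion_age(12, 400, 2.2e-9)
print(round(age / 1e6, 2), "million years")
```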
<p>If the elements are inserted with a fixed rate B and disappear again with another fixed rate D, we can model this age distribution as a simple birth/death process, and the number of surviving elements of age <img src="http://www.mailund.dk/wp-content/cache/tex_e358efa489f58062f10dd7316b65649e.png" align="absmiddle" class="tex" alt="t" /> is given by <img src="http://www.mailund.dk/wp-content/cache/tex_cb2788dd896c81d5c3d1842d38b3bb7f.png" align="absmiddle" class="tex" alt="N_t = B \exp(-Dt)" />.  For several species this fits quite nicely:</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2009/09/picture-1.png"><img class="aligncenter size-medium wp-image-1657" title="Age distributions fitting the exponential decay" src="http://www.mailund.dk/wp-content/uploads/2009/09/picture-1-300x141.png" alt="" width="300" height="141" /></a></p>
<p>but for mammals there is a strange &#8220;bulge&#8221; after the KT boundary indicating that either the birth rate has dropped recently or that the death rate has increased:</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2009/09/picture-2.png"><img class="aligncenter size-medium wp-image-1658" title="Mammalian age distribution" src="http://www.mailund.dk/wp-content/uploads/2009/09/picture-2-300x156.png" alt="" width="300" height="156" /></a></p>
<p>Since this bulge is after the divergence of these lineages, this change in the process must have occurred independently in all these mammals.</p>
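<p>The exponential age distribution is easy to play with.  The sketch below, with made-up ages in arbitrary units, estimates D from a set of element ages and computes how many elements the model expects in an age interval; comparing such expectations with observed counts is one way a bulge would show up.  It illustrates the model only and is not the fitting procedure from the paper.</p>

```python
import math

def expected_count(age, birth_rate, death_rate):
    """Expected density of surviving elements of a given age: B * exp(-D * age)."""
    return birth_rate * math.exp(-death_rate * age)

def expected_in_bin(low, high, birth_rate, death_rate):
    """Expected number of elements with age between low and high."""
    scale = birth_rate / death_rate
    return scale * (math.exp(-death_rate * low) - math.exp(-death_rate * high))

def estimate_death_rate(ages):
    """Maximum-likelihood D for exponentially distributed ages: 1 / mean age."""
    return len(ages) / sum(ages)

# Hypothetical element ages.
ages = [1.0, 2.0, 3.0, 6.0]
death_rate = estimate_death_rate(ages)
print(death_rate)
print(expected_in_bin(0.0, 10.0, 2.0, 0.1))
```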
<p>The hypothesis given in the paper for what has happened is this:  After the extinction of the dinosaurs, mammals generally increased in numbers in all lineages, with a resulting increase in effective population size.  When the effective population size goes up, selection becomes more efficient compared to genetic drift, so assuming that these elements are slightly deleterious, we would expect fewer of them to get fixed and more of them to get removed as the effective population size goes up.</p>
<p>That explanation is of course not proven by the data, but it does fit the pattern observed.</p>
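<p>The effect of population size on slightly deleterious mutations can be illustrated with Kimura&#8217;s diffusion approximation for the fixation probability of a new mutation.  This is a sketch under textbook assumptions (a diploid population whose census size equals its effective size, and a made-up selection coefficient), not a calculation from the paper.</p>

```python
import math

def fixation_probability(s, n_e):
    """Kimura's approximation for a new mutation with selection coefficient s."""
    if s == 0:
        return 1.0 / (2 * n_e)      # the neutral case
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * n_e * s))

def relative_to_neutral(s, n_e):
    """Fixation probability as a fraction of the neutral expectation."""
    return fixation_probability(s, n_e) * 2 * n_e

# A hypothetical slightly deleterious insertion, s = -1e-5.
for n_e in (1_000, 10_000, 100_000):
    print(n_e, relative_to_neutral(-1e-5, n_e))
```

<p>With the same selection coefficient, the insertion behaves almost neutrally in the smallest population but is fixed far less often than a neutral mutation in the largest one.</p>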
<p>In any case, it is clear that we have experienced a decrease in the recent insertions compared to older elements, which means that unless something else is now taking up the space our genomes are shrinking.</p>
<p>Don&#8217;t worry too much about that, though, it is the junk that is disappearing.</p>
<p>&#8211;</p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Genome+Biology+and+Evolution&amp;rft_id=info%3Adoi%2F10.1093%2Fgbe%2Fevp007&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Independent+Mammalian+Genome+Contractions+Following+the+KT+Boundary&amp;rft.issn=1759-6653&amp;rft.date=2009&amp;rft.volume=2009&amp;rft.issue=0&amp;rft.spage=2&amp;rft.epage=12&amp;rft.artnum=http%3A%2F%2Fgbe.oxfordjournals.org%2Fcgi%2Fdoi%2F10.1093%2Fgbe%2Fevp007&amp;rft.au=Rho%2C+M.&amp;rft.au=Zhou%2C+M.&amp;rft.au=Gao%2C+X.&amp;rft.au=Kim%2C+S.&amp;rft.au=Tang%2C+H.&amp;rft.au=Lynch%2C+M.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CGenetics%2C+%2C+Evolutionary+Biology">Rho, M., Zhou, M., Gao, X., Kim, S., Tang, H., &amp; Lynch, M. (2009). Independent Mammalian Genome Contractions Following the KT Boundary <span style="font-style: italic;">Genome Biology and Evolution, 2009</span>, 2-12 DOI: <a rev="review" href="http://dx.doi.org/10.1093/gbe/evp007">10.1093/gbe/evp007</a></span><br />
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/09/02/independent-mamalian-genome-contractions-following-the-kt-boundary/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Patterns of autosomal divergence between the human and chimpanzee genomes support an allopatric model of speciation</title>
		<link>http://www.mailund.dk/index.php/2009/08/26/patterns-of-autosomal-divergence-between-the-human-and-chimpanzee-genomes-support-an-allopatric-model-of-speciation/</link>
		<comments>http://www.mailund.dk/index.php/2009/08/26/patterns-of-autosomal-divergence-between-the-human-and-chimpanzee-genomes-support-an-allopatric-model-of-speciation/#comments</comments>
		<pubDate>Wed, 26 Aug 2009 17:34:15 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Paper reviews]]></category>
		<category><![CDATA[Paper review]]></category>
		<category><![CDATA[speciation]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=1632</guid>
		<description><![CDATA[A few days ago I wrote about the hypothesis of complex speciation between humans and chimps, and today I&#8217;ll briefly discuss another paper on the human / chimp speciation: Patterns of autosomal divergence between the human and chimpanzee genomes support an allopatric model of speciation Matthew T. Webster, Gene 443 70-75, 2009 Abstract There is [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago I wrote about <a href="http://www.mailund.dk/index.php/2009/08/19/doubts-about-complex-speciation-between-humans-and-chimpanzees/">the hypothesis of complex speciation between humans and chimps</a>, and today I&#8217;ll briefly discuss another paper on the human / chimp speciation:</p>
<blockquote><p><a href="http://www.sciencedirect.com/science?_ob=ArticleURL&amp;_udi=B6T39-4WBC1R2-2&amp;_user=10&amp;_rdoc=1&amp;_fmt=&amp;_orig=search&amp;_sort=d&amp;_docanchor=&amp;view=c&amp;_acct=C000050221&amp;_version=1&amp;_urlVersion=0&amp;_userid=10&amp;md5=78b2676953ee0a2bcc25a728d48e9a49"><strong>Patterns of autosomal divergence between the human and chimpanzee genomes support an allopatric model of speciation</strong></a></p>
<p>Matthew T. Webster, <em>Gene</em> 443 70-75, 2009</p>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p style="text-align: left;">There is a large variation in divergence times across genomic regions between human and chimpanzee. It has been suggested that this could partly result from selection against ancestral gene flow between incipient species in regions of the genome containing genetic incompatibilities. It is possible that such barriers to gene flow could arise in specific genes or in chromosomal inversions. I analysed patterns of lineage sorting that occur between human, chimpanzee and gorilla genomic sequences by examining divergent site patterns in &gt; 18 Mb genomic alignments. I develop a method to normalise site patterns by the mutational spectrum to minimise errors caused by misinference caused by recurrent mutation. Here I show that divergence times appear to be uniform between coding and noncoding sequences and between inverted and non-rearranged portions of chromosomes. I therefore find no evidence to support the large-scale accumulation of genetic incompatibilities at speciation genes or chromosomal inversions in the ancestral population of humans and chimpanzees. In addition, site patterns that are discordant with the species tree occur more frequently in regions with high human recombination rates. This could indicate the action of selective sweeps in the ancestral population, but could also be indicative of increased rates of homoplasy in these regions. I argue that these observations are compatible with a neutral allopatric model of speciation.</p>
</blockquote>
<h3>Models of speciation</h3>
<p>Speciation happens when gene flow stops between one group of a species and another (and doesn&#8217;t start again later or we get something like the hybridization scenario I wrote about in my earlier post).</p>
<p>There are different ways this can happen.  For instance, one group might somehow find itself geographically isolated from the other &#8211; e.g. on the other side of a large river &#8211; effectively isolating the group from the rest of the species.  This is known as <a href="http://en.wikipedia.org/wiki/Allopatric_speciation">allopatric speciation</a> (or depending on exactly how this plays out, <a href="http://en.wikipedia.org/wiki/Peripatric_speciation">peripatric speciation</a>).</p>
<p>In this scenario, the speciation happens at the time when the groups are isolated.  From that point onwards the groups are essentially different species, since gene flow has stopped.  It will take some time before the groups are <em>incapable</em> of interbreeding, but unless they actually merge again at some time before then, the time of the speciation event is the time the groups get separated.</p>
<p>That doesn&#8217;t mean that the genomic divergence time between the two species matches the time back to the speciation event.  For a few generations, some individuals in one of the groups might be more closely related to individuals in the second group than to the other individuals in the first group.  So the genetic distance between the two species is a bit larger than the &#8220;species distance&#8221;.  Add in recombination and the <a href="http://www.mailund.dk/index.php/2009/02/27/on-segment-lengths-going-back-in-time-in-the-coalescence-process-part-2-the-ancestry-of-two-species/">picture gets a bit more complex</a>.</p>
<p>Still, we can talk about a specific point in time at which the speciation occurred, and we have a mathematical model &#8211; the coalescent model &#8211; of the genomic distance between the two species that depends on this time and the population genetics of the ancestral species before then.</p>
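<p>In its simplest form that model says that two sequences, one from each species, find their common ancestor at the split time plus an exponentially distributed waiting time in the ancestral population.  A minimal sketch, assuming a diploid ancestral population of constant effective size and using made-up numbers:</p>

```python
def expected_divergence_time(split_time, ancestral_ne):
    """Expected generations back to the common ancestor of one sequence
    from each species: the split time plus a mean coalescence time of 2N."""
    return split_time + 2 * ancestral_ne

def coalescence_time_sd(ancestral_ne):
    """Standard deviation of the exponential waiting time in the ancestor."""
    return 2 * ancestral_ne

def expected_sequence_divergence(split_time, ancestral_ne, mutation_rate):
    """Expected substitutions per site; mutations accumulate on both lineages."""
    return 2 * mutation_rate * expected_divergence_time(split_time, ancestral_ne)

# Hypothetical values: split 200,000 generations ago, ancestral Ne of 50,000,
# and 2e-8 mutations per site per generation.
print(expected_divergence_time(200_000, 50_000))
print(coalescence_time_sd(50_000))
print(expected_sequence_divergence(200_000, 50_000, 2e-8))
```

<p>The standard deviation is as large as the mean of the ancestral part, which is why the divergence can vary a lot along the genome even when there is a single split time.</p>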
<p>The speciation can also be caused by &#8220;genetic isolation&#8221;.</p>
<p>If a new mutation enters the group, where homozygotes for either the wildtype or the mutant are fitter than the heterozygotes, then the group will tend to split into two: the mutants and the wildtypes.</p>
<p>Without recombination, there wouldn&#8217;t be much difference in the genomic distance between the two resulting species.  The heterozygotes would be selected against and the two homozygotes would diverge.</p>
<p>With recombination, again the situation gets a bit more complicated.  The heterozygotes would still be selected against, but assuming heterozygotes still manage to mate from time to time, you would get homozygous offspring of heterozygotes who are just as fit as other homozygotes.</p>
<p>Because there <em>is</em> selection against heterozygotes you will tend to split the species into two &#8211; the two homozygotes &#8211; but the divergence will be deeper at the locus of the mutation than it will in the rest of the genome.</p>
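<p>The selection against heterozygotes can be illustrated with the standard one-locus recursion for allele frequencies.  The sketch assumes random mating within a single group, equal fitness for the two homozygotes, and a made-up heterozygote fitness of 0.9.  All it shows is the instability at intermediate frequencies: whichever allele is in the majority takes over, so two subgroups that start on opposite sides of the threshold move apart.</p>

```python
def next_frequency(p, w_hom=1.0, w_het=0.9):
    """One generation of selection when both homozygotes have fitness w_hom
    and the heterozygote has fitness w_het."""
    q = 1.0 - p
    mean_fitness = (p * p + q * q) * w_hom + 2 * p * q * w_het
    return p * (p * w_hom + q * w_het) / mean_fitness

def iterate(p, generations):
    """Allele frequency after a number of generations of selection."""
    for _ in range(generations):
        p = next_frequency(p)
    return p

# Two hypothetical subgroups starting on either side of one half.
print(iterate(0.55, 500))
print(iterate(0.45, 500))
```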
<p>We call such a locus a &#8220;speciation gene&#8221; and candidates for such genes are functional genes (where we expect some selection) or structural variations such as inversions.</p>
<h3>Back to the paper&#8230;</h3>
<p>What Webster looks at in this paper is the patterns of divergence &#8211; especially deep coalescence events with incomplete lineage sorting where we observe sites grouping human and gorilla or chimp and gorilla &#8211; in the genome.</p>
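<p>How often such discordant groupings are expected follows from a standard coalescent result for three species: the two lineages fail to coalesce in the human/chimp ancestor with probability exp(-t), where t is the time between the two speciation events in units of 2N generations, and two of the three possible groupings are then discordant.  A sketch with made-up numbers, assuming neutrality and a constant ancestral population size:</p>

```python
import math

def discordance_probability(generations_between_splits, ancestral_ne):
    """Probability that the gene tree differs from the species tree."""
    t = generations_between_splits / (2.0 * ancestral_ne)
    return (2.0 / 3.0) * math.exp(-t)

def per_topology_probability(generations_between_splits, ancestral_ne):
    """Probability of one specific discordant grouping, e.g. human with gorilla."""
    return discordance_probability(generations_between_splits, ancestral_ne) / 2.0

# Hypothetical: 100,000 generations between the splits, ancestral Ne of 50,000.
print(discordance_probability(100_000, 50_000))
print(per_topology_probability(100_000, 50_000))
```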
<p>He then looks at these patterns in genes, introns, inversions &#8230; the candidates for speciation genes, to see if these look more divergent than the rest of the genome.  If so, then the speciation between humans and chimps could be caused by speciation genes.  If not, then the speciation could be allopatric (the same &#8220;species divergence&#8221; throughout the genome, but of course not the exact same sequence divergence since the coalescence times will still vary along the genome).</p>
<p>Long story short, he doesn&#8217;t find any evidence for deeper divergence in these places, so we cannot rule out an allopatric speciation here.</p>
<p>He does find a correlation between recombination rate and deep divergence, which can be explained by either increased mutability in regions of high recombination or selective sweeps in the ancestral species.  The latter is much more interesting, really, but we cannot rule out the first explanation so I won&#8217;t comment much on this here&#8230;</p>
<h3>Criticism</h3>
<p>I do have a slight problem with the analysis in the paper, though.</p>
<p>It seems to me that just looking at differences in divergence time between genes and the rest of the genome &#8211; or between inversions and the rest of the genome or whatnot &#8211; is not a particularly powerful way of detecting speciation genes.</p>
<p>When comparing general groups like this, it seems to me that a few speciation genes would simply be drowned out by the larger number of &#8220;plain old genes&#8221;.  So all the analysis is really saying is that there isn&#8217;t a large number of speciation genes between humans and chimps, not that there are none.</p>
<p>The paper doesn&#8217;t claim any more than this either, but it would be interesting to work out just how large a fraction of the genes would have to be speciation genes &#8211; and how large a difference between the divergence of speciation genes and the rest of the genome there has to be &#8211; to be able to distinguish between the two scenarios with this analysis.</p>
<p>I haven&#8217;t done the math yet, but I plan to when I get the time&#8230;</p>
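<p>A very rough starting point, far from the full calculation, is to ask how far a small fraction of deeper genes moves the group mean compared to its standard error.  Everything in the sketch below is hypothetical: the per-gene standard deviation, the extra divergence, the number of genes, and the assumption that the genes are independent and the background mean is known.</p>

```python
import math

def mean_shift(fraction, extra_divergence):
    """Shift in the group mean when a fraction of genes is deeper by a fixed amount."""
    return fraction * extra_divergence

def z_score(fraction, extra_divergence, sd_per_gene, n_genes):
    """Number of standard errors by which the group mean moves."""
    standard_error = sd_per_gene / math.sqrt(n_genes)
    return mean_shift(fraction, extra_divergence) / standard_error

def fraction_needed(z_target, extra_divergence, sd_per_gene, n_genes):
    """Smallest fraction of speciation genes that reaches a target z-score."""
    return z_target * sd_per_gene / (extra_divergence * math.sqrt(n_genes))

# Hypothetical: 10,000 genes, per-gene sd of 0.5, speciation genes deeper by 0.1.
print(fraction_needed(2.0, 0.1, 0.5, 10_000))
print(z_score(0.01, 0.1, 0.5, 10_000))
```

<p>With these made-up numbers a tenth of the genes would have to be speciation genes before the group mean moves by two standard errors, while one percent of them would barely register.</p>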
<p>&#8211;<br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Gene&amp;rft_id=info%3Adoi%2F10.1016%2Fj.gene.2009.05.006&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Patterns+of+autosomal+divergence+between+the+human+and+chimpanzee+genomes+support+an+allopatric+model+of+speciation&amp;rft.issn=03781119&amp;rft.date=2009&amp;rft.volume=443&amp;rft.issue=1-2&amp;rft.spage=70&amp;rft.epage=75&amp;rft.artnum=http%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0378111909002522&amp;rft.au=Webster%2C+M.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CGenetics%2C+Evolutionary+Biology">Webster, M. (2009). Patterns of autosomal divergence between the human and chimpanzee genomes support an allopatric model of speciation <span style="font-style: italic;">Gene, 443</span> (1-2), 70-75 DOI: <a rev="review" href="http://dx.doi.org/10.1016/j.gene.2009.05.006">10.1016/j.gene.2009.05.006</a></span><br />
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/08/26/patterns-of-autosomal-divergence-between-the-human-and-chimpanzee-genomes-support-an-allopatric-model-of-speciation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Doubts about complex speciation between humans and chimpanzees</title>
		<link>http://www.mailund.dk/index.php/2009/08/19/doubts-about-complex-speciation-between-humans-and-chimpanzees/</link>
		<comments>http://www.mailund.dk/index.php/2009/08/19/doubts-about-complex-speciation-between-humans-and-chimpanzees/#comments</comments>
		<pubDate>Wed, 19 Aug 2009 07:47:14 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Paper reviews]]></category>
		<category><![CDATA[Apes]]></category>
		<category><![CDATA[divergence of human and apes]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=1612</guid>
		<description><![CDATA[I read this paper in bed yesterday before going to sleep: Doubts about complex speciation between humans and chimpanzees Presgraves and Yi, Trends in Ecology &#38; Evolution 2009 Abstract Two patterns from large-scale DNA sequence data have been put forward as evidence that speciation between humans and chimpanzees was complex, involving hybridization and strong selection. [...]]]></description>
			<content:encoded><![CDATA[<p>I read this paper in bed yesterday before going to sleep:</p>
<blockquote><p><a href="http://dx.doi.org/10.1016/j.tree.2009.04.007"><strong>Doubts about complex speciation between humans and chimpanzees</strong></a></p>
<p>Presgraves and Yi, Trends in Ecology &amp; Evolution 2009</p>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p>Two patterns from large-scale DNA sequence data have been put forward as evidence that speciation between humans and chimpanzees was complex, involving hybridization and strong selection. First, divergence between humans and chimpanzees varies considerably across the autosomes. Second, divergence between humans and chimpanzees (but not gorillas) is markedly lower on the X chromosome. Here, we describe how simple speciation and neutral molecular evolution explain both patterns. In particular, the wide range in autosomal divergence is consistent with stochastic variation in coalescence times in the ancestral population; and the lower human–chimpanzee divergence on the X chromosome is consistent with species differences in the strength of male-biased mutation caused by differences in mating system. We also highlight two further patterns of divergence that are problematic for the complex speciation model. Our conclusions raise doubts about complex speciation between humans and chimpanzees.</p></blockquote>
<h3>Complex speciation between humans and chimpanzees</h3>
<p><a href="http://www.mailund.dk/wp-content/uploads/2009/08/picture-1.png"><img class="alignright size-thumbnail wp-image-1613" title="Complex speciation" src="http://www.mailund.dk/wp-content/uploads/2009/08/picture-1-150x150.png" alt="" width="150" height="150" /></a>You might remember the <a href="http://www.nature.com/nature/journal/v441/n7097/full/nature04789.html">Patterson <em>et al.</em> paper</a> in Nature back in 2006, that argued for a complex speciation of humans and chimps: An early separation between the two, followed by a hybridization and then the extinction of one of the species ancestral to the hybrids.</p>
<p>The arguments for this theory were 1) large variation in divergence time along the autosomal chromosomes and 2) a much more recent divergence of the X chromosome compared to the autosomes.</p>
<p>Wakeley <a href="http://www.nature.com/nature/journal/v452/n7184/full/nature06805.html">then argued</a> that 1) at least didn&#8217;t need any complex speciation history.  The variation in divergence is actually as would be expected just from variation in coalescence times along the chromosomes, assuming a reasonably large effective population size of the human/chimp ancestor species.</p>
<p>As for 2), the coalescence process alone cannot explain the recent divergence of X chromosomes.  We do expect a more recent divergence of X chromosomes than autosomes, since the effective population size of X chromosomes is 3/4 of that of the autosomal chromosomes, but the divergence of the X chromosomes is less than what can be explained by this.</p>
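<p>To get a feeling for how much the 3/4 factor alone buys you, here is a small sketch of the expected coalescence depth on the X chromosome relative to the autosomes.  It assumes equal mutation rates on both, an equal sex ratio, a constant ancestral population size, and made-up values for the split time and the ancestral effective population size.</p>

```python
def expected_coalescence_depth(split_time, ancestral_ne, ne_scale=1.0):
    """Expected generations to the common ancestor: split time plus 2N,
    with N scaled by ne_scale (3/4 for the X chromosome)."""
    return split_time + 2 * ne_scale * ancestral_ne

def x_to_autosome_ratio(split_time, ancestral_ne):
    """Expected X divergence relative to autosomal divergence."""
    x_depth = expected_coalescence_depth(split_time, ancestral_ne, ne_scale=0.75)
    autosome_depth = expected_coalescence_depth(split_time, ancestral_ne)
    return x_depth / autosome_depth

# Hypothetical: split 200,000 generations ago, autosomal ancestral Ne of 50,000.
print(x_to_autosome_ratio(200_000, 50_000))
```

<p>With these numbers the X chromosome is expected to be only about eight percent less diverged than the autosomes, so a larger observed reduction needs some other explanation.</p>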
<p>This could either be explained by selection on the X chromosome (which essentially reduces the effective population size and thus leads to a reduced divergence) or by the difference in mutation rate between males and females that would affect the X chromosome differently than the autosomes (reducing the difference between the two).</p>
<p>It is well known that there is a bias in mutation rate between males and females, having to do with the average number of genome replications per generation in males and females, respectively.  The details I won&#8217;t go into here (although they are pretty important for the post, the post would just get too long and I don&#8217;t want to lose the readers who already know this &#8230; I might write about it in a separate post another day&#8230;)</p>
<p>Anyway&#8230;</p>
<p>Selection is probably not likely.  It would require a pretty uniform selection across the X chromosome.  The male-biased mutation explanation sounds more reasonable.</p>
<p>A problem with both explanations, though &#8211; <a href="http://www.nature.com/nature/journal/v452/n7184/full/nature06806.html">Patterson <em>et al</em>. argued in their reply</a> &#8211; is that this weird pattern in X is only observed between human and chimp and not between human and gorilla (or chimp and gorilla).</p>
<blockquote><p>If mutation-rate differences alone could explain the observed data, we would expect a consistent value for <em><img style="border: 0pt none; vertical-align: baseline;" src="http://www.nature.com/__chars/alpha/black/ital/base/glyph.gif" alt="alpha" /></em> from the human–chimpanzee and human–gorilla divergence data, but estimates of <em><img style="border: 0pt none; vertical-align: baseline;" src="http://www.nature.com/__chars/alpha/black/ital/base/glyph.gif" alt="alpha" /></em> are significantly different (<em>P</em> =  0.001). A high value of <em><img style="border: 0pt none; vertical-align: baseline;" src="http://www.nature.com/__chars/alpha/black/ital/base/glyph.gif" alt="alpha" /></em> also cannot explain other important features in <a href="http://www.nature.com/nature/journal/v452/n7184/full/nature06806.html#t1">Table 1</a>: the near-absence of sites on chromosome X that cluster humans and gorillas or chimpanzees and gorillas; or why human–gorilla divergence should not be reduced on chromosome X (such a reduction would be expected if high male mutation rate were responsible for low human–chimpanzee genetic divergence on chromosome X).</p></blockquote>
<h3>Lineage specific male biased mutation rate</h3>
<p>The Presgraves and Yi paper argues that male biased mutation rate <em>can</em> explain the pattern after all.</p>
<p>True, the low divergence on X is only observed between humans and chimps and not between humans and gorillas, but if the strength of this bias is larger on the human and chimp lineages than on the gorilla lineage it could still be an explanation.</p>
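<p>The way the strength of the bias feeds into the X chromosome can be sketched with the usual weighting argument: an X chromosome spends two thirds of its time in females and one third in males, while an autosome spends half its time in each.  If the male mutation rate is alpha times the female rate, the X to autosome ratio of mutation rates follows directly.  The alpha values below are placeholders, not the estimates from the paper.</p>

```python
def x_to_autosome_mutation_ratio(alpha):
    """Mutation rate on X relative to autosomes when the male mutation rate
    is alpha times the female rate."""
    x_rate = (2.0 / 3.0) + (1.0 / 3.0) * alpha      # in units of the female rate
    autosome_rate = 0.5 + 0.5 * alpha
    return x_rate / autosome_rate

# Hypothetical strengths of the male bias.
for alpha in (1.0, 2.0, 6.0):
    print(alpha, x_to_autosome_mutation_ratio(alpha))
```

<p>Without any bias the ratio is one, and it drops towards two thirds as the bias grows, so lineages with a stronger bias are expected to show a larger reduction in divergence on X.</p>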
<p>Chimps are very promiscuous, humans somewhat less so, while gorillas are polygynous.  This affects sperm production, so chimps produce the most sperm per ejaculation, gorillas the least, and humans again somewhere in between.</p>
<p>With more sperm produced in humans and chimps than in gorillas, it is therefore conceivable that the mutation bias is stronger in chimps and humans than in gorillas.</p>
<p>So they estimate this bias per lineage and get exactly that result: the bias is strongest in chimps, intermediate in humans and weakest in gorillas:</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2009/08/picture-21.png"><img class="aligncenter size-medium wp-image-1615" title="Differences in strength of male-biased mutation among hominoid lineages" src="http://www.mailund.dk/wp-content/uploads/2009/08/picture-21-300x154.png" alt="" width="300" height="154" /></a></p>
<p>With different male-biased mutation rates in the lineages, and much less bias in gorillas, there is nothing strange in the divergence on the X chromosome being more reduced between humans and chimps than between humans and gorillas.</p>
<p>Voilà!  No more need for a complex speciation history!</p>
<p>At least until the next paper&#8230;</p>
<p>&#8211;</p>
<ol>
<li><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Trends+in+Ecology+%26+Evolution&amp;rft_id=info%3Adoi%2F10.1016%2Fj.tree.2009.04.007&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Doubts+about+complex+speciation+between+humans+and+chimpanzees&amp;rft.issn=01695347&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0169534709001906&amp;rft.au=Presgraves%2C+D.&amp;rft.au=Yi%2C+S.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CGenetics%2C+%2C+Evolutionary+Biology">Presgraves, D., &amp; Yi, S. (2009). Doubts about complex speciation between humans and chimpanzees <span style="font-style: italic;">Trends in Ecology &amp; Evolution</span> DOI: <a rev="review" href="http://dx.doi.org/10.1016/j.tree.2009.04.007">10.1016/j.tree.2009.04.007</a></span></li>
<li><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Apmid%2F16710306&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Genetic+evidence+for+complex+speciation+of+humans+and+chimpanzees.&amp;rft.issn=0028-0836&amp;rft.date=2006&amp;rft.volume=441&amp;rft.issue=7097&amp;rft.spage=1103&amp;rft.epage=8&amp;rft.artnum=&amp;rft.au=Patterson+N&amp;rft.au=Richter+DJ&amp;rft.au=Gnerre+S&amp;rft.au=Lander+ES&amp;rft.au=Reich+D&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CGenetics%2C+Evolutionary+Biology">Patterson N, Richter DJ, Gnerre S, Lander ES, &amp; Reich D (2006). Genetic evidence for complex speciation of humans and chimpanzees. <span style="font-style: italic;">Nature, 441</span> (7097), 1103-8 PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/16710306">16710306</a></span></li>
<li><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Apmid%2F18337768&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Complex+speciation+of+humans+and+chimpanzees.&amp;rft.issn=0028-0836&amp;rft.date=2008&amp;rft.volume=452&amp;rft.issue=7184&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Wakeley+J&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CGenetics%2C%2C+Evolutionary+Biology">Wakeley J (2008). Complex speciation of humans and chimpanzees. <span style="font-style: italic;">Nature, 452</span> (7184) PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/18337768">18337768</a></span></li>
<li><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Adoi%2F10.1038%2Fnature06806&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Patterson+et+al.+reply&amp;rft.issn=0028-0836&amp;rft.date=2008&amp;rft.volume=452&amp;rft.issue=7184&amp;rft.spage=0&amp;rft.epage=0&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fdoifinder%2F10.1038%2Fnature06806&amp;rft.au=Patterson%2C+N.&amp;rft.au=Richter%2C+D.&amp;rft.au=Gnerre%2C+S.&amp;rft.au=Lander%2C+E.&amp;rft.au=Reich%2C+D.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CGenetics%2C+%2C+Evolutionary+Biology">Patterson, N., Richter, D., Gnerre, S., Lander, E., &amp; Reich, D. (2008). Patterson et al. reply <span style="font-style: italic;">Nature, 452</span> (7184) DOI: <a rev="review" href="http://dx.doi.org/10.1038/nature06806">10.1038/nature06806</a></span></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/08/19/doubts-about-complex-speciation-between-humans-and-chimpanzees/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

