<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mailund on the Internet &#187; Research</title>
	<atom:link href="http://www.mailund.dk/index.php/category/work/research/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mailund.dk</link>
	<description>Computer science, bioinformatics, genetics, and everything in between</description>
	<lastBuildDate>Sun, 15 May 2011 11:24:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>On the radio about orangutans</title>
		<link>http://www.mailund.dk/index.php/2011/03/10/on-the-radio-about-orangutans/</link>
		<comments>http://www.mailund.dk/index.php/2011/03/10/on-the-radio-about-orangutans/#comments</comments>
		<pubDate>Thu, 10 Mar 2011 16:15:33 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2283</guid>
		<description><![CDATA[I&#8217;m working on a textbook chapter and am a few weeks past the deadline, so I haven&#8217;t had time to blog lately. I have a few things I&#8217;d like to write about as soon as I am done with the chapter, but in the meantime here&#8217;s a podcast radio interview with me and [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m working on a textbook chapter and am a few weeks past the deadline, so I haven&#8217;t had time to blog lately. I have a few things I&#8217;d like to write about as soon as I am done with the chapter, but in the meantime here&#8217;s a podcast radio interview with me and Mikkel Schierup (in Danish, though, sorry about that):</p>
<ul>
<li><a href="http://www.dr.dk/P1/Videnskabensverden/Udsendelser/2011/03/09103811.htm">Vi er alle lidt orangutaner</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2011/03/10/on-the-radio-about-orangutans/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Chimp research</title>
		<link>http://www.mailund.dk/index.php/2011/02/04/chimp-research/</link>
		<comments>http://www.mailund.dk/index.php/2011/02/04/chimp-research/#comments</comments>
		<pubDate>Fri, 04 Feb 2011 11:57:54 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2266</guid>
		<description><![CDATA[Sorry for linking to a lot of Danish sites, but this story is also kind of cool: Dansk forskning skal redde chimpansen. It&#8217;s about the genetics of chimps, and a project that I am involved in. We have sequenced the exomes of 30 chimps (eastern, central and western chimps) and we hope to submit our first [...]]]></description>
			<content:encoded><![CDATA[<p>Sorry for linking to a lot of Danish sites, but this story is also kind of cool: <a href="http://politiken.dk/videnskab/ECE1185918/dansk-forskning-skal-redde-chimpanser/">Dansk forskning skal redde chimpansen</a>.</p>
<p>It&#8217;s about the genetics of chimps, and a project that I am involved in. We have sequenced the exomes of 30 chimps (eastern, central and western chimps) and we hope to submit our first paper in a few weeks.</p>
<p>It&#8217;s a really cool project, but unfortunately I cannot say more about it until it is out&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2011/02/04/chimp-research/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CoalHMM analysis of the human/chimpanzee ancestor, based on the orangutan genome</title>
		<link>http://www.mailund.dk/index.php/2011/02/03/coalhmm-analysis-of-the-humanchimpanzee-ancestor-based-on-the-orangutan-genome/</link>
		<comments>http://www.mailund.dk/index.php/2011/02/03/coalhmm-analysis-of-the-humanchimpanzee-ancestor-based-on-the-orangutan-genome/#comments</comments>
		<pubDate>Thu, 03 Feb 2011 16:40:07 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Paper reviews]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2236</guid>
		<description><![CDATA[I&#8217;ve been wanting to write about our paper on the orangutan genome for a while, but I&#8217;ve just been too busy so far, so a little late I finally get to it. Besides the Nature paper, where we contributed to the analysis of the two sub-species of orangutans, we have two companion papers. One is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.mailund.dk/wp-content/uploads/2011/02/Screen-shot-2011-02-03-at-7.10.21-PM.png"><img class="size-thumbnail wp-image-2245 alignright" style="margin: 5px;" title="Nature front cover" src="http://www.mailund.dk/wp-content/uploads/2011/02/Screen-shot-2011-02-03-at-7.10.21-PM-150x150.png" alt="" width="150" height="150" /></a>I&#8217;ve been wanting to write about our paper on the orangutan genome for a while, but I&#8217;ve just been too busy so far, so a little late I finally get to it.</p>
<p>Besides the Nature paper, where we contributed to the analysis of the two sub-species of orangutans, we have two companion papers. One is already out in &#8220;early access&#8221; at Genome Research and the other will be out later in PLoS Genetics. Since the latter paper is not out yet, this post will be about the Genome Research paper.</p>
<h2>Coalescent in an isolation model</h2>
<p>Since all our work is based on coalescent theory and in particular CoalHMMs, I&#8217;ll start there.</p>
<p>Imagine we have two species, and we sample a gene in each. We can then ask, what is the divergence between the two genes? This divergence will be determined by 1) the divergence of the two species, let&#8217;s call that <em>T</em>, and 2) the coalescence time between the two genes within the ancestral species, let&#8217;s call that <em>C</em>.</p>
<p>We assume the species divergence is the same for all genes, so while it is unknown it is not a stochastic variable. The coalescence time, however, is stochastic, and from <a href="http://en.wikipedia.org/wiki/Coalescent_theory">coalescent theory</a> we expect it to be <a href="http://en.wikipedia.org/wiki/Exponential_distribution">exponentially distributed</a> with a rate determined by the <a href="http://en.wikipedia.org/wiki/Effective_population_size">effective population size</a> in the ancestral species.</p>
<p>We call this setup an <em>isolation model</em>, and we will use the distribution of divergence times to make inference about the speciation time and the effective population size in the ancestral species.</p>
<p>The figure below illustrates the setup.</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2011/02/IM-coalescence.png"><img class="aligncenter size-medium wp-image-2239" title="Isolation model" src="http://www.mailund.dk/wp-content/uploads/2011/02/IM-coalescence-300x273.png" alt="" width="300" height="273" /></a>If <em>C</em> is exponentially distributed, and the divergence is given by <em>D=C+T</em>, then we can make inference about both parameters as follows: We sample a number of independent genes and get their divergence times. For the exponential distribution, the mean is equal to the standard deviation, and since <em>T</em> is a constant it adds nothing to the spread of <em>D</em>, so the standard deviation of the divergences gives us the parameter for the exponential distribution. That gives us the mean value of <em>C</em>, and if we then subtract it from the average divergence, E[<em>D</em>]-E[<em>C</em>], we get an estimate for <em>T</em>.</p>
<p>Below is an example of this, where I&#8217;ve estimated the coalescence rate and divergence time from 50 divergence samples.</p>
<p style="text-align: left;"><a href="http://www.mailund.dk/wp-content/uploads/2011/02/Estimates-in-the-isolation-model.png"><img class="aligncenter size-medium wp-image-2241" title="Estimates" src="http://www.mailund.dk/wp-content/uploads/2011/02/Estimates-in-the-isolation-model-300x300.png" alt="" width="300" height="300" /></a></p>
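<p>The moment-based estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the figure: the function names, the true values (<em>T</em> = 4.0, coalescence rate 2.0) and the random seed are all made up for the example.</p>

```python
import random
import statistics

def simulate_divergences(n_loci, split_time, coal_rate, rng):
    # D = T + C, with C exponentially distributed with the given rate.
    return [split_time + rng.expovariate(coal_rate) for _ in range(n_loci)]

def estimate_isolation_model(divergences):
    # For an exponential, mean(C) equals sd(C); the constant T only shifts D,
    # so sd(D) equals sd(C) and gives us E[C] directly.
    mean_c = statistics.stdev(divergences)
    coal_rate = 1.0 / mean_c
    split_time = statistics.mean(divergences) - mean_c
    return split_time, coal_rate

rng = random.Random(1)
sample = simulate_divergences(50, split_time=4.0, coal_rate=2.0, rng=rng)
t_hat, rate_hat = estimate_isolation_model(sample)
print(f"T estimate: {t_hat:.2f}, rate estimate: {rate_hat:.2f}")
```

<p>With only 50 loci the estimates will scatter noticeably around the true values; increasing <code>n_loci</code> tightens them, which is the point of sampling many loci.</p>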
<h2>Complications</h2>
<p style="text-align: left;">This is all very simple, but there are a few problems.</p>
<p style="text-align: left;">First, you don&#8217;t really get independent samples of the divergence time between two species. If you sample <em>n</em> individuals from the first species and <em>m</em> from the second, the <em>n</em> in the first species will all have found a common ancestor before their lineages reach the ancestral species, and the same goes for the <em>m</em> samples in the other species. So no matter how many individuals you look at, you end up with a sample of two in the ancestral species. I&#8217;ve written about this before <a href="http://www.mailund.dk/index.php/2009/02/12/on-gene-trees-and-species-trees/">here</a>.</p>
<p style="text-align: left;">It is not a show-stopper, though, since genes in different parts of the genome are close enough to independent. So if you sample different loci instead of different individuals, you get your independent samples. So while adding more individuals won&#8217;t help, having an entire genome to look at gives you plenty of samples.</p>
<p style="text-align: left;">The second problem is that we cannot actually get samples of the divergence time. You cannot look at two pieces of DNA and read off their divergence. You need to estimate it. That isn&#8217;t really hard, since you can get a good estimate from the number of differences between the two sequences, provided the entire alignment has the same divergence time.</p>
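<p style="text-align: left;">As a rough illustration of estimating divergence from the number of differences, here is a Python sketch that computes the fraction of differing sites and applies the Jukes-Cantor correction. The Jukes-Cantor model is a stand-in chosen for simplicity, not necessarily the substitution model used in the paper, and the function names and toy sequences are made up for the example.</p>

```python
import math

def p_distance(seq_a, seq_b):
    # Fraction of aligned, ungapped columns where the two sequences differ.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor_distance(p):
    # Expected substitutions per site given the observed fraction of differences.
    # The correction accounts for multiple substitutions hitting the same site.
    if p >= 0.75:
        raise ValueError("p is too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy alignment, assumed to be already aligned and of equal length.
human = "ACGTACGTACGTACGTACGT"
other = "ACGTACGAACGTACGTACCT"
p = p_distance(human, other)
print(f"observed differences: {p:.3f}, corrected distance: {jukes_cantor_distance(p):.3f}")
```

<p style="text-align: left;">The estimate is only meaningful if every column in the alignment shares one divergence time, which is exactly the condition that recombination breaks.</p>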
<p style="text-align: left;">If there is a recombination somewhere in the sequences, they do <em>not</em> have the same divergence time, and you cannot estimate the divergence.</p>
<p style="text-align: left;">You can get around this by looking at short DNA segments, where you expect few if any recombinations. You won&#8217;t get a good estimate of the divergence then, but you can maybe alleviate this by having a lot of genes (but estimating the coalescence rate based on a standard deviation that has contributions from both the coalescence process and the estimation problems is, well, problematic).</p>
<p style="text-align: left;">You&#8217;d also have to throw most of your data away if you are looking at short segments scattered along the genome (and you cannot have them too close to each other, because then they will no longer be independent).</p>
<h2>The CoalHMM approach</h2>
<p style="text-align: left;">The models we develop to deal with this are based on <a href="http://en.wikipedia.org/wiki/Hidden_Markov_model">hidden Markov models</a>.</p>
<p style="text-align: left;">Using these models, we can estimate the divergence time for single nucleotides. Normally you cannot, since two nucleotides are either equal or different, and that doesn&#8217;t tell you much about their divergence (is it zero for equal and infinity for different?). We can do this because the flanking DNA contains information about the divergence, whether recombinations have occurred or not, and we can capture this information through the Markov model.</p>
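<p style="text-align: left;">To make the role of the flanking DNA concrete, here is a toy posterior decoding in Python. It is a generic forward-backward sketch with two hidden states, not the CoalHMM itself: the states, the transition and emission probabilities, and the two example alignments are all hypothetical values picked for illustration.</p>

```python
def posterior_decode(obs, start, trans, emit):
    # Forward-backward with per-site rescaling.
    # Returns, for every site, P(state | all observations).
    n_states = len(start)
    prev = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    scale = sum(prev)
    fwd = [[x / scale for x in prev]]
    for o in obs[1:]:
        cur = [emit[s][o] * sum(fwd[-1][r] * trans[r][s] for r in range(n_states))
               for s in range(n_states)]
        scale = sum(cur)
        fwd.append([x / scale for x in cur])
    bwd = [[1.0] * n_states]
    for o in reversed(obs[1:]):
        nxt = bwd[0]
        cur = [sum(trans[s][r] * emit[r][o] * nxt[r] for r in range(n_states))
               for s in range(n_states)]
        scale = sum(cur)
        bwd.insert(0, [x / scale for x in cur])
    post = []
    for f, b in zip(fwd, bwd):
        joint = [fi * bi for fi, bi in zip(f, b)]
        total = sum(joint)
        post.append([x / total for x in joint])
    return post

# Hypothetical toy parameters: state 0 = recent coalescence, state 1 = deep coalescence.
start = [0.5, 0.5]
trans = [[0.95, 0.05], [0.05, 0.95]]  # neighbouring sites usually share a state
emit = [[0.99, 0.01], [0.90, 0.10]]   # emit[state][0] = equal, emit[state][1] = different

isolated = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
clustered = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
for name, obs in (("isolated", isolated), ("clustered", clustered)):
    deep = posterior_decode(obs, start, trans, emit)[5][1]
    print(f"{name}: P(deep at middle site) = {deep:.3f}")
```

<p style="text-align: left;">The middle site is a difference in both alignments, yet its posterior probability of deep coalescence is higher when the neighbouring sites also differ. That dependence on the flanks is what lets a hidden Markov model say something about a single nucleotide.</p>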
<p style="text-align: left;">It is a rough approximation to the coalescence process, but as far as we can tell, it works reasonably well.</p>
<p style="text-align: left;">We are getting pretty close to being able to estimate the distribution of divergence times using hidden Markov models, but the model we use is the one that will be published in PLoS Genetics soon and not the model we used in the Genome Research paper, so I&#8217;ll wait a bit with describing how that works.</p>
<p style="text-align: left;">The model we used in the Genome Research paper is the one described in <a href="http://www.mailund.dk/index.php/2009/09/22/new-coalhmm-paper-out/">this paper</a>.</p>
<p style="text-align: left;">In this model, we do not attempt to estimate the actual divergence times, but instead use something called <em>incomplete lineage sorting.</em> The idea here is that if we have a third species closely related to the other two, then sometimes the two sister species have such deep divergence times that one of them can end up being more closely related to the third species than to its sister species.</p>
<p style="text-align: left;"><a href="http://www.mailund.dk/wp-content/uploads/2011/02/ILS.png"><img class="aligncenter size-medium wp-image-2243" title="Incomplete lineage sorting" src="http://www.mailund.dk/wp-content/uploads/2011/02/ILS-300x251.png" alt="" width="300" height="251" /></a>This leaves a stronger signal in the DNA and is thus easier to model and make inference about.</p>
<p style="text-align: left;">The model based on this needs only four states: one state where the two sister species coalesce early, and three states with deep divergence. If the divergence is deep, the topology of relationships between the species should be uniform &#8212; each topology is seen with one third probability &#8212; and how often we see deep divergences is given by the two speciation times together with the effective population size of the ancestor of the sister species.</p>
<p style="text-align: left;">As we scan along a genome alignment, we can infer how often we see recent divergences and how often we see deep divergences, and how the deep divergences are distributed along the three topologies.</p>
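<p style="text-align: left;">The relationship between the speciation times, the ancestral effective population size and the four states can be sketched with standard coalescent arithmetic. The sketch below assumes a constant diploid population, so that two lineages coalesce at rate 1/(2<em>N</em>) per generation, and a generation time of 20 years. Both are assumptions made for illustration, as are the function names. The input values are round numbers in the range quoted in the abstract further down.</p>

```python
import math

def deep_coalescence_probability(t1_years, t2_years, ne, generation_years):
    # Probability that the two sister lineages have NOT coalesced in their
    # ancestral population by the time of the older speciation event.
    generations = (t2_years - t1_years) / generation_years
    return math.exp(-generations / (2.0 * ne))

def state_probabilities(t1_years, t2_years, ne, generation_years):
    # One early-coalescence state plus three equally likely deep states.
    deep = deep_coalescence_probability(t1_years, t2_years, ne, generation_years)
    return {
        "early coalescence, species topology": 1.0 - deep,
        "deep coalescence, species topology": deep / 3.0,
        "deep coalescence, alternative topology 1": deep / 3.0,
        "deep coalescence, alternative topology 2": deep / 3.0,
    }

# Assumed inputs: speciation at 4 and 12 million years, Ne of 50,000,
# and a hypothetical generation time of 20 years.
probs = state_probabilities(4e6, 12e6, ne=50_000, generation_years=20)
for state, p in probs.items():
    print(f"{state}: {p:.4f}")
```

<p style="text-align: left;">With these inputs the deep-coalescence probability is exp(-4), a little under 2%, and the two alternative topologies together come to roughly 1.2% of sites. Halving the effective population size or lengthening the interval between the speciation events shrinks that fraction quickly.</p>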
<p style="text-align: left;">Below is a figure that Julien made for illustrating this.</p>
<p style="text-align: left;"><a href="http://www.mailund.dk/wp-content/uploads/2011/02/Screen-shot-2011-02-03-at-6.47.24-PM.png"><img class="aligncenter size-medium wp-image-2244" title="ILS along a genome alignment" src="http://www.mailund.dk/wp-content/uploads/2011/02/Screen-shot-2011-02-03-at-6.47.24-PM-300x169.png" alt="" width="300" height="169" /></a>With this model, you don&#8217;t extract as much information from the genomes as you would if you could estimate the divergence times, but with full genomes to work with, you have plenty of information to get good estimates.</p>
<p style="text-align: left;">You need three closely related species to work with, though.</p>
<h2>Incomplete lineage sorting patterns among human, chimpanzee and orangutan suggest recent orangutan speciation and widespread selection</h2>
<p>And now, finally, we get to the paper.</p>
<blockquote><p><strong><a href="http://genome.cshlp.org/content/early/2011/01/26/gr.114751.110.abstract">Incomplete lineage sorting patterns among human, chimpanzee and orangutan suggest recent orangutan speciation and widespread selection</a></strong><br />
Asger Hobolth, Julien Y. Dutheil, John Hawks, Mikkel H. Schierup and Thomas Mailund</p>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p>We search the complete orangutan genome for regions where humans are more closely related to orangutans than to chimpanzees due to incomplete lineage sorting (ILS) in the ancestor of human and chimpanzees. The search uses our recently developed coalescent HMM framework. We find ILS present in ~1% of the genome, and that the ancestral species of human and chimpanzees never experienced a severe population bottleneck. The existence of ILS is validated with simulations, site pattern analysis, and analysis of rare genomic events. The existence of ILS allows us to disentangle the time of isolation of humans and orangutans (the speciation time) from the genetic divergence time, and we find speciation to be as recent as 9-13 mya (contingent on the calibration point). The analyses provide further support for a recent speciation of human and chimpanzee at ~4 mya and a diverse ancestor of human and chimpanzee with an effective population size of ~50,000 individuals. Posterior decoding infers ILS for each nucleotide in the genome and we use this to deduce patterns of selection in the ancestral species. We demonstrate the effect of background selection in the common ancestor of humans and chimpanzees. In agreement with predictions from population genetics, ILS found to be reduced in exons and gene dense regions when we control for confounding factors such as GC content and recombination rate. Finally, we find the broad scale recombination rate to be conserved through the complete ape phylogeny.</p></blockquote>
<p>In this paper we used humans, chimpanzees and orangutans.</p>
<p>The first question to ask is then, are these three species close enough that we see incomplete lineage sorting?</p>
<p>Without it, we don&#8217;t have the signal in the data that we need for the model.</p>
<p>Based on previous estimates of the species divergence times and ancestral effective population size of humans and chimpanzees we could work out that some incomplete lineage sorting was expected. So that is a good start. To make sure, though, we used some simpler approaches. We looked at indels to check whether they carry signals supporting a clustering of human and orangutan or of chimp and orangutan, and found such signals. We also looked at the distribution of alignment columns and again found some signals for alternative topologies of the three species. So with that checked, we applied the model.</p>
<p>From the model we estimate three things: 1) the speciation times between humans and chimps, and between the African apes and orangutans, 2) the effective population size of the ancestral species, and 3) in which regions of the genome humans and chimps, humans and orangutans, or chimps and orangutans are most closely related.</p>
<p>I won&#8217;t say much about number two. The effective population size is a weird parameter that can be affected by so many things, that it is really hard to interpret, and right now we just don&#8217;t know what really is important, so I&#8217;d rather not make any claims (but I&#8217;ll say a few things about <em>local</em> effective population sizes towards the end of the post).</p>
<p>Number one is interesting because it tells us something about when humans diverged from the other two apes. Our estimates are measured in the number of substitutions since the divergence, but assuming a molecular clock and assuming we have a good estimate of the rate we can get an estimate in years.</p>
<p>Assuming a rate of around 1 substitution per nucleotide per billion years &#8212; an estimate based on several earlier papers that get this number from calibrations with the fossil record &#8212; we get a human/chimp speciation around 4-4.5 million years ago, and a human/orangutan speciation around 11-13 million years ago.</p>
<p>I really don&#8217;t know how reasonable this is, in relation to the fossil record, so this is when we got <a href="http://johnhawks.net/weblog">John Hawks</a> involved. I have my fingers crossed that he will blog about this at some point.</p>
<p>There are good reasons to be a bit skeptical, though. From recent studies, we know that the substitution rate is lower in humans today, and if that was also true in the past, the estimates should be moved further back in time. We cannot move them too far back, though, without running into inconsistencies in the deeper past, but how this will all play out once we do more analysis I cannot say yet. It is something we are looking into for the gorilla genome (and I&#8217;ll just leave that as a cliffhanger for now; I&#8217;ll get back to it when we have published that genome).</p>
<p>For number three, I don&#8217;t really know. You might be surprised that we are sometimes more closely related to the orangutan than to the chimpanzee, or you might not. It depends on your prior assumptions, I guess.</p>
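<p>The clock arithmetic behind these dates is simple enough to sketch in Python. The function name is made up, and the divergence of 0.004 substitutions per site is a hypothetical round number, chosen because it corresponds to roughly 4 million years at the rate of 1 substitution per nucleotide per billion years assumed above. The second rate is an arbitrary slower clock used only to show the direction of the effect.</p>

```python
def substitutions_to_years(subs_per_site, rate_per_site_per_year):
    # Molecular clock: elapsed time = accumulated divergence / substitution rate.
    return subs_per_site / rate_per_site_per_year

# Hypothetical human/chimp speciation estimate in substitutions per site.
speciation_subs = 0.004

# 1.0e-9 is the rate assumed in the post; 0.5e-9 is an arbitrary slower clock.
for rate in (1.0e-9, 0.5e-9):
    mya = substitutions_to_years(speciation_subs, rate) / 1e6
    print(f"rate {rate:.1e} per site per year: about {mya:.1f} million years")
```

<p>Because the rate sits in the denominator, halving it doubles every date, which is why a slower clock pushes all the speciation estimates further back in time.</p>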
<p>We didn&#8217;t really find anything cool correlated to the patterns of relatedness, so we don&#8217;t have much of a story to tell about this.</p>
<h2>Ancestral selection</h2>
<p>The final thing we looked at in the paper was correlations between incomplete lineage sorting and gene density.</p>
<p>Why this is interesting gets a bit technical but has to do with the effective population size. As I mentioned above, it is a bit of a weird parameter, but one that is affected by selection. If you have a <a href="http://en.wikipedia.org/wiki/Selective_sweep">selective sweep</a> the genetic diversity is reduced, and you see this as a reduction in the effective population size. The same effect is seen with <a href="http://en.wikipedia.org/wiki/Purifying_selection">purifying selection</a>, where again the genetic diversity is reduced and so is the effective population size.</p>
<p>Incomplete lineage sorting is positively correlated with the effective population size, so if you observe a negative correlation between incomplete lineage sorting and gene density, it is a signal of selection.</p>
<p>We observe this, and take it as a signal that selection rather than just drift has been a major player in the evolution of our genome.</p>
<p>How much of a surprise this is depends on your prior assumptions again, I guess, but it does indicate that neutrality may not always be the obvious null model for genome analysis.</p>
<p>It is a pretty weak signal for this, though, in this analysis. We see so little incomplete lineage sorting for these three species that it is really hard to analyse it in detail.</p>
<p>When we get to human, chimp and gorilla, there is a lot more incomplete lineage sorting, and we can do a lot more. We are seeing some cool signals there, but I&#8217;ll let that be the second cliffhanger for the gorilla genome paper.</p>
<p>&#8211;<br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Genome+Research&amp;rft_id=info%3Adoi%2F10.1101%2Fgr.114751.110&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Incomplete+lineage+sorting+patterns+among+human%2C+chimpanzee+and+orangutan+suggest+recent+orangutan+speciation+and+widespread+selection&amp;rft.issn=1088-9051&amp;rft.date=2011&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fgenome.cshlp.org%2Fcgi%2Fdoi%2F10.1101%2Fgr.114751.110&amp;rft.au=Hobolth%2C+A.&amp;rft.au=Dutheil%2C+J.&amp;rft.au=Hawks%2C+J.&amp;rft.au=Schierup%2C+M.&amp;rft.au=Mailund%2C+T.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CComputer+Science+%2F+Engineering%2CMathematics%2CGenetics%2C+Bioinformatics%2C+Computational+Biology">Hobolth, A., Dutheil, J., Hawks, J., Schierup, M., &amp; Mailund, T. (2011). Incomplete lineage sorting patterns among human, chimpanzee and orangutan suggest recent orangutan speciation and widespread selection <span style="font-style: italic;">Genome Research</span> DOI: <a rev="review" href="http://dx.doi.org/10.1101/gr.114751.110">10.1101/gr.114751.110</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2011/02/03/coalhmm-analysis-of-the-humanchimpanzee-ancestor-based-on-the-orangutan-genome/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Some places, not everywhere&#8230;</title>
		<link>http://www.mailund.dk/index.php/2011/01/26/some-places-not-everywhere/</link>
		<comments>http://www.mailund.dk/index.php/2011/01/26/some-places-not-everywhere/#comments</comments>
		<pubDate>Wed, 26 Jan 2011 18:40:00 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2227</guid>
		<description><![CDATA[Just to avoid confusion, if you read this, it doesn&#8217;t imply this. We are suggesting that humans and orangutans are more closely related to each other than either is to chimpanzees in ~0.5% of the genome (and chimpanzees and orangutans are more closely related to each other than either is to humans in another ~0.5%). That has to do with incomplete lineage sorting, and [...]]]></description>
			<content:encoded><![CDATA[<p>Just to avoid confusion, if you read <a href="http://www.sciencedaily.com/releases/2011/01/110126131548.htm">this</a>, it doesn&#8217;t imply <a href="http://www.sciencedaily.com/releases/2009/06/090618084304.htm">this</a>.</p>
<p>We are suggesting that humans and orangutans are more closely related to each other than either is to chimpanzees in ~0.5% of the genome (and chimpanzees and orangutans are more closely related to each other than either is to humans in another ~0.5%). That has to do with <a href="http://www.mailund.dk/index.php/2009/02/12/on-gene-trees-and-species-trees/">incomplete lineage sorting</a>, and does not, in any way, imply that we as a species are more closely related to orangutans than to chimpanzees.</p>
<p>Oh, and it doesn&#8217;t really mean that the orangutan is our closest living relative in ~0.5% of the genome either. It is just closer than the chimpanzee, but the gorilla could be more closely related to us in those positions, so&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2011/01/26/some-places-not-everywhere/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More orangutan news</title>
		<link>http://www.mailund.dk/index.php/2011/01/26/more-orangutan-news/</link>
		<comments>http://www.mailund.dk/index.php/2011/01/26/more-orangutan-news/#comments</comments>
		<pubDate>Wed, 26 Jan 2011 18:18:32 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2222</guid>
		<description><![CDATA[Some more news coverage of the orangutan genome paper: Orangutan DNA boosts survival chances: study Researchers perform genomic analysis of orangutans Scientists determine what makes an orangutan an orangutan Tiny orangutan populations are surprisingly diverse Genome analysis outlines variations in orangutans of Borneo, Sumatra Orangutan genome full of surprises The paper mentioned at the bottom [...]]]></description>
			<content:encoded><![CDATA[<p>Some more news coverage of the orangutan genome paper:</p>
<ul>
<li><a href="http://www.google.com/hostednews/afp/article/ALeqM5hNui4KU8KhllSoOwhXfca1FidgZg?docId=CNG.f55f656a9f597ee071fe1ead97d63e4a.491">Orangutan DNA boosts survival chances: study</a></li>
<li><a href="http://www.redorbit.com/news/science/1985648/researchers_perform_genomic_analysis_of_orangutans/">Researchers perform genomic analysis of orangutans</a></li>
<li><a href="http://www.nsf.gov/news/news_summ.jsp?cntn_id=118471&amp;org=NSF&amp;from=news">Scientists determine what makes an orangutan an orangutan</a></li>
<li><a href="http://www.newscientist.com/article/dn20036-tiny-orangutan-populations-are-surprisingly-diverse.html">Tiny orangutan populations are surprisingly diverse</a></li>
<li><a href="http://www.bcm.edu/news/item.cfm?newsID=3331">Genome analysis outlines variations in orangutans of Borneo, Sumatra</a></li>
<li><a href="http://news.sciencemag.org/sciencenow/2011/01/orangutan-genome.html?ref=hp">Orangutan genome full of surprises</a></li>
</ul>
<p>The paper mentioned at the bottom of the last one is <a href="http://genome.cshlp.org/content/early/2011/01/26/gr.114751.110.abstract">this one</a>, and there&#8217;s a press release for that one as well:</p>
<ul>
<li><a href="http://genome.cshlp.org/site/press/gr114751.xhtml">Genetic archaeology finds parts of our genome more closely related to orangutans than chimps</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2011/01/26/more-orangutan-news/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The orangutan genome is out (probably)</title>
		<link>http://www.mailund.dk/index.php/2011/01/26/the-orangutan-genome-is-out-probably/</link>
		<comments>http://www.mailund.dk/index.php/2011/01/26/the-orangutan-genome-is-out-probably/#comments</comments>
		<pubDate>Wed, 26 Jan 2011 17:13:57 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2219</guid>
		<description><![CDATA[I was just told over email that the orangutan genome paper is out at Nature. Right now, I cannot connect to Nature, though, so I cannot really tell. Anyway, I found posts about it here and here. We&#8217;ve been involved in the analysis here in Aarhus, applying our CoalHMM methods, and we will have two [...]]]></description>
			<content:encoded><![CDATA[<p>I was just told over email that the orangutan genome paper is out at Nature. Right now, I cannot connect to Nature, though, so I cannot really tell.</p>
<p>Anyway, I found posts about it <a href="http://content.usatoday.com/communities/sciencefair/post/2011/01/orangutans-share-97-of-genes-with-humans/1">here</a> and <a href="http://www.punemirror.in/article/26/201101252011012522080696235e06310/Orangutan-DNA-more-diverse-than-human.html">here</a>.</p>
<p>We&#8217;ve been involved in the analysis here in Aarhus, applying our CoalHMM methods, and we will have two companion papers out. The first in Genome Research &#8211; any minute now, really, it is supposed to come out today &#8211; and the second in PLoS Genetics &#8211; not sure when, it is in the pipeline but I haven&#8217;t received a release date yet.</p>
<p>We&#8217;ve received a lot of questions about the Genome Research paper over the last couple of days, and I&#8217;m busy answering emails right now, but I&#8217;ll be back to comment on it here as soon as I have the time.</p>
<p><strong>Update: </strong>Ah, Nature is up again, and you can start reading <a href="http://www.nature.com/news/2011/110121/full/news.2011.50.html">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2011/01/26/the-orangutan-genome-is-out-probably/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Neanderthal genome paper is out</title>
		<link>http://www.mailund.dk/index.php/2010/05/07/neanderthal-genome-paper-is-out/</link>
		<comments>http://www.mailund.dk/index.php/2010/05/07/neanderthal-genome-paper-is-out/#comments</comments>
		<pubDate>Fri, 07 May 2010 05:46:07 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[neanderthals]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2169</guid>
		<description><![CDATA[What an exciting thing to wake up to!  The neanderthal genome has now been published. Read the buzz about it here: Anthropology.net Dienekes&#8217; Anthropology John Hawks Byte Size Biology while I go read the actual paper.]]></description>
			<content:encoded><![CDATA[<p>What an exciting thing to wake up to!  The neanderthal genome has now been published.</p>
<p>Read the buzz about it here:</p>
<ul>
<li><a href="http://anthropology.net/2010/05/06/the-neandertal-draft-genome/">Anthropology.net</a></li>
<li><a href="http://dienekes.blogspot.com/2010/05/tales-of-neanderthal-admixture-in.html">Dienekes&#8217; Anthropology</a></li>
<li><a href="http://johnhawks.net/weblog/reviews/neandertals/neandertal_dna/neandertals-live-genome-sequencing-2010.html">John Hawks</a></li>
<li><a href="http://bytesizebio.net/index.php/2010/05/06/there-is-a-little-bit-of-neanderthal-in-many-of-us/">Byte Size Biology</a></li>
</ul>
<p>while I go read the <a href="http://www.sciencemag.org/cgi/content/full/328/5979/710">actual paper</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2010/05/07/neanderthal-genome-paper-is-out/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Horizontal gene transfer and alien invaders</title>
		<link>http://www.mailund.dk/index.php/2010/05/01/horizontal-gene-transfer-and-alien-invaders/</link>
		<comments>http://www.mailund.dk/index.php/2010/05/01/horizontal-gene-transfer-and-alien-invaders/#comments</comments>
		<pubDate>Sat, 01 May 2010 05:47:14 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2135</guid>
		<description><![CDATA[From Science Daily: Researchers at The University of Texas at Arlington have found the first solid evidence of horizontal DNA transfer, the movement of genetic material among non-mating species, between parasitic invertebrates and some of their vertebrate hosts. Genome biologist Cédric Feschotte and postdoctoral researchers Clément Gilbert and Sarah Schaack found evidence of horizontal transfer [...]]]></description>
			<content:encoded><![CDATA[<p>From <a href="http://www.sciencedaily.com/releases/2010/04/100430155856.htm">Science Daily</a>:</p>
<blockquote><p>Researchers at The University of Texas at Arlington have found the first solid evidence of horizontal DNA transfer, the movement of genetic material among non-mating species, between parasitic invertebrates and some of their vertebrate hosts.</p>
<p>Genome biologist Cédric Feschotte and postdoctoral researchers Clément Gilbert and Sarah Schaack found evidence of horizontal transfer of transposon from a South American blood-sucking bug and a pond snail to their hosts. A transposon is a segment of DNA that can replicate itself and move around to different positions within the genome. Transposons can cause mutations, change the amount of DNA in the cell and dramatically influence the structure and function of the genomes where they reside.</p></blockquote>
<p>I heard about this in February, when I was at a <a href="http://mbi.osu.edu/2009/ws4description.html">meeting</a> where Cédric gave a talk.</p>
<p>What they have found is families of transposons in different branches of mammals that don&#8217;t seem to have been inherited from further up the phylogeny.  Some distantly related mammals have them, but their close relatives do not.  They appear to have just popped out of nowhere (so Cédric calls them &#8220;space invaders&#8221;).</p>
<p>They seem to have entered the genomes at roughly the same time, a time when the ancestors of those species lived in the same area, and what points to horizontal gene transfer is that parasites that would have fed on these animals have the same transposon family.</p>
<p>His talk at the meeting was recorded (all the talks were) but I haven&#8217;t yet found the videos online so I guess they are still being processed or something.  When I find them, I&#8217;ll let you know.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2010/05/01/horizontal-gene-transfer-and-alien-invaders/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On non-genic disease polymorphism</title>
		<link>http://www.mailund.dk/index.php/2010/04/27/on-non-genic-disease-polymorphism/</link>
		<comments>http://www.mailund.dk/index.php/2010/04/27/on-non-genic-disease-polymorphism/#comments</comments>
		<pubDate>Tue, 27 Apr 2010 15:51:29 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[association mapping]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2116</guid>
		<description><![CDATA[Daniel MacArthur discusses genome-wide association studies which have so far mainly found disease associated polymorphisms outside of genes. The claim in question is that the tendency of GWAS to find disease associations outside of protein-coding genes is somehow a problem; but, as p-ter notes, there&#8217;s perfectly plausible reasons for disease risk variants to be found [...]]]></description>
			<content:encoded><![CDATA[<p>Daniel MacArthur discusses genome-wide association studies which have so far <a href="http://scienceblogs.com/geneticfuture/2010/04/why_disease_associations_outsi.php">mainly found disease associated polymorphisms outside of genes</a>.</p>
<blockquote><p>The claim in question is that the tendency of GWAS to find disease associations outside of protein-coding genes is somehow a problem; but, as p-ter notes, there&#8217;s perfectly plausible reasons for disease risk variants to be found in non-coding regions.</p>
<div>Indeed, I think most of us working in genomics have seen the proliferation of non-coding hits in GWAS studies as a positive, in that it seems to be teaching us something new and unexpected about the underlying biology of human variation.</div>
</blockquote>
<p>There <em>is</em> a problem with polymorphisms outside of genes.  We generally have no idea how they functionally affect us to increase or decrease the disease risk.  If we have no idea what a given polymorphism means in terms of function, the mechanism behind the association is harder to work out; we don&#8217;t really know where to start.</p>
<p>As far as I can see, though, that is the only problem.</p>
<p>If the polymorphism is statistically significantly associated with the disease, and we can replicate this in independent data, then that is what the data is saying.  It might be inconvenient, but tough luck!  No one promised us that this would be easy.</p>
<p>Quoting from <a href="http://www.gnxp.com/wp/uncategorized/how-do-non-genic-polymorphisms-influence-disease-risk">Gene Expression</a>:</p>
<blockquote><p>Their answer to this rhetorical question is that common SNPs (used on current genotyping platforms) are generally nonfunctional. The alternative, the evidence for which I’ll present here, is that <strong>our ability to predict functional SNPs is poor. In the phrase “no known function”, the emphasis should be on the word “known”</strong>.</p></blockquote>
<p>GWA studies have been a great success in locating polymorphisms associated with disease that we can actually replicate.</p>
<p>Sure, we are working with very large data sets here, and false positives are a major problem (see e.g. <a href="http://www.mailund.dk/index.php/2009/07/02/true-positives-and-false-positives/">here</a> and <a href="http://www.mailund.dk/index.php/2009/07/05/false-positives-and-large-sample-sizes/">here</a>), but this is a problem we can handle.</p>
<p><img class="aligncenter size-full wp-image-2117" title="There's a solution for any problem" src="http://www.mailund.dk/wp-content/uploads/2010/04/solutions.png" alt="" width="500" height="455" /></p>
<p>And sure, GWA lets us find only the common disease/common variant (CD/CV) type of disease associations and not all diseases will follow this pattern, but with the success of GWA studies so far, I think it is fair to say that there is enough to be found here to make it worthwhile!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2010/04/27/on-non-genic-disease-polymorphism/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Estimating Divergence Times</title>
		<link>http://www.mailund.dk/index.php/2010/04/26/estimating-divergence-times/</link>
		<comments>http://www.mailund.dk/index.php/2010/04/26/estimating-divergence-times/#comments</comments>
		<pubDate>Mon, 26 Apr 2010 17:14:43 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Teaching]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2100</guid>
		<description><![CDATA[Below is the introduction text to some lecture notes I&#8217;m working on.  I&#8217;m putting them up here to get some feedback, since this is the part in my lecture notes I am the least sure about. The rest of the notes will be on mathematical models, and I am pretty confident that I understand those, [...]]]></description>
			<content:encoded><![CDATA[<p>Below is the introduction text to some lecture notes I&#8217;m working on.  I&#8217;m putting them up here to get some feedback, since this is the part in my lecture notes I am the least sure about. The rest of the notes will be on mathematical models, and I am pretty confident that I understand those, but my paleontology knowledge is shaky at best, so any corrections, comments or suggestions for papers I should read will be most welcome!</p>
<h3>Fossil and genetic evidence: Lower and upper bounds</h3>
<p>When estimating the evolutionary relationship between species we have two sources of data we can use to date when species diverged: fossil evidence and genetic evidence, the latter based on the assumption of a molecular clock that lets us estimate divergence time from the observed differences between genomic sequences. Both are by their very nature biased, but in opposite directions. Dates based on fossil evidence give us a lower bound on the speciation time, while genetic evidence gives us an upper bound on the speciation time [1].</p>
<p>Fossils can be dated reasonably accurately through physical or geological methods, but placing them in the phylogeny relies on morphological differences between species. Morphological characteristics unique to one set of species, when found in a fossil, tell us that the given group of species diverged from other species before the time when the fossil species existed. Deciding which morphological features are unique to a given group of species is, of course, somewhat subjective, but ignoring this, the fossil date is only a lower bound on the age of the species split, since the morphological features must have evolved after the split. How long it took for these features to evolve, plus how close the fossil is to the emergence of the features considering possible gaps in the fossil record, influences how tight the lower bound is.</p>
<p>For genetic data, on the other hand, population genetics in ancestral species influences the dating of species splits. The coalescence process [2] in population genetics means that when we consider two genomes in the same population, they have a most recent common ancestor (MRCA) some distance back in time. When considering two genomes from different species, the time back to the MRCA is the sum of two parts: the time since the species diverged, and the additional time it takes the two genomes to find a common ancestor within the ancestral species.</p>
<p>The divergence of genomes within a species depends on the effective population size, a technical term referring to the population size of reproducing genomes. The larger the effective population size, the further back in time the MRCA will be found. On average, the number of generations back in time to the MRCA is equal to the effective number of genomes. So for two genomes in the same species, we expect their MRCA to be found 2N<sub>e</sub> generations back in time, where N<sub>e</sub> is the number of diploid individuals reproducing; the factor of two appears because in a diploid population of (effective) size N<sub>e</sub>, there are 2N<sub>e</sub> genomes.</p>
<p>The genetic distance between two species is therefore given, on average, by the time since the species split plus 2N<sub>e</sub> generations in the ancestral species, and the genetic distance is thus an upper bound on the species divergence.</p>
<p style="text-align: center;"><a href="http://www.mailund.dk/wp-content/uploads/2010/04/Upper-and-lower-bounds-of-species-divergence.png"><img class="aligncenter size-medium wp-image-2101" style="border: 1px solid black;" title="Upper and lower bounds of species divergence" src="http://www.mailund.dk/wp-content/uploads/2010/04/Upper-and-lower-bounds-of-species-divergence-235x300.png" alt="" width="235" height="300" /></a></p>
<p>For humans, the effective population size is ~10,000, so two random human genomes are expected to have diverged around 20,000 generations ago, or 400 thousand years ago (400 kya) assuming a generation time of 20 years along the lineages back to the MRCA. This puts the sequence divergence of humans, whose species divergence is of course zero, back to a point before the evolution of modern Man and before the speciation between modern humans and Neanderthals.</p>
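<p>As a sanity check on the arithmetic above, here is a minimal Python sketch. The function names are mine and purely illustrative; the inputs are the numbers quoted in the text, and the 2N<sub>e</sub> expectation is the only model assumption.</p>

```python
def expected_coalescence_years(effective_size, generation_years):
    """Expected years back to the MRCA of two genomes from one diploid population."""
    generations = 2 * effective_size  # 2 * N_e generations on average
    return generations * generation_years


def expected_sequence_divergence_years(species_split_years, effective_size, generation_years):
    """Species split time plus the expected coalescence time in the ancestral species."""
    return species_split_years + expected_coalescence_years(effective_size, generation_years)


# Humans compared with humans: the species divergence is zero.
print(expected_sequence_divergence_years(0, 10_000, 20))  # 400000
```

<p>Passing a non-zero split time gives the expected sequence divergence between two species, which is why the genetic date sits above the species split.</p>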
<h3>The molecular clock</h3>
<p>We cannot directly observe the divergence between genomes, so genetic dating of speciation relies on the observed differences between genomic sequences. An underlying assumption when doing this is that mutations to genomic sequences occur at a constant rate through time, so the number of mutations is proportional to the time separating the genomes: two times the divergence time, since mutations occur on both branches of the split.</p>
<p>We cannot directly observe the number of mutations that occurred between species either. We can only observe the differences between the sequences we have. Mutations that occur on lineages that are eventually lost from a population, because they leave no present-day offspring, cannot be observed; only those that survive to the present appear in the genomic sequences we can observe. Mutations that spread to the entire species are said to be fixed in the species, and we call such mutations substitutions. When comparing genomic sequences from different species, we mainly observe such fixed mutations, unless the species are very closely related and polymorphism in the ancestral species has not yet been fixed within the descendant species.</p>
<p>For neutrally evolving sequences, that is, sequences not under selection, the expected number of substitutions is equal to the expected number of mutations along a single lineage [2]. That is, the rate at which substitutions are fixed within a species through the population genetic process equals the rate at which mutations occur in a single genome. For species such as primates, we expect most of the genome to be evolving neutrally, since the genomes of these species consist mainly of “junk” DNA that is unlikely to be under selection.</p>
<p>Assuming that the sequences are mainly evolving neutrally, and assuming that mutations occur at a regular rate, we can estimate the number of mutations that occurred between two species using so-called substitution models. These compensate for recurrent mutations, that is, multiple mutations at the same genomic site, and translate the number of observed differences between two sequences into the expected number of mutations that occurred.</p>
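<p>To make the idea of a substitution model concrete, here is a sketch of the Jukes-Cantor (JC69) correction, one of the simplest textbook models. It is my choice for illustration only and not necessarily the model used in the papers cited here; it assumes all substitutions are equally likely, and the 10% figure is a made-up input.</p>

```python
import math


def jukes_cantor_distance(p_observed):
    """Expected substitutions per site, given the observed fraction of differing sites.

    JC69 correction: d = -(3/4) * ln(1 - (4/3) * p).
    """
    if 0 > p_observed or p_observed >= 0.75:
        raise ValueError("observed fraction must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p_observed / 3)


# Hypothetical example: 10% of aligned sites differ.
observed = 0.10
corrected = jukes_cantor_distance(observed)
print(f"observed {observed:.3f}, corrected {corrected:.3f}")
```

<p>The corrected distance is always at least as large as the observed fraction, because some sites have mutated more than once and only show up as a single difference (or none).</p>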
<p>Since mutations enter the sequences through a chemical/physical process, the assumption of a regular rate is not far-fetched, and in general there is a close correlation between divergence of species from fossil evidence and the number of mutations estimated from the substitution models. The rate of substitutions does seem to vary somewhat between divergent species groups, with a slow-down in apes compared to old world monkeys and with slight variations even within different primate groups [3]. Within a group of closely related species, however, such as the great apes, the evidence generally seems to justify the molecular clock assumption [3].</p>
<p>There is one important caveat, however: We might be able to estimate the number of mutations that occurred, but if we do not know the rate at which new mutations occur, we cannot translate the number of mutations into years of divergence.</p>
<h3>Calibrating the molecular clock</h3>
<p>To translate the number of mutations that occurred in a time interval into the number of years of the time interval, we need to know either the rate at which mutations occur, or how long the time interval was. We need to calibrate the molecular clock.</p>
<p>The approach typically taken is to have a calibration point, a point in time where we are reasonably sure we know the divergence time of two sequences in years, and use the number of mutations between the two sequences to give us the number of mutations per year.</p>
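<p>The calibration step can be sketched in a few lines of Python. This assumes a strict molecular clock and treats the distance as substitutions per site; the function names and all the numbers are hypothetical, chosen only to make the arithmetic easy to follow.</p>

```python
def calibrate_rate(calibration_distance, calibration_years):
    """Substitutions per site per year, from one dated split.

    The distance accumulates on both branches, so the elapsed time is
    two times the divergence time.
    """
    return calibration_distance / (2 * calibration_years)


def distance_to_years(distance, rate_per_year):
    """Translate a distance between two sequences into years of divergence."""
    return distance / (2 * rate_per_year)


# Hypothetical numbers: a calibration pair and a more closely related pair.
rate = calibrate_rate(calibration_distance=0.06, calibration_years=25e6)
years = distance_to_years(distance=0.012, rate_per_year=rate)
print(f"rate {rate:.2e} per site per year, divergence {years / 1e6:.1f} million years")
```

<p>Everything downstream inherits the calibration: the rate is only as good as the date we plug in for the calibration point.</p>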
<p>If we pick a point far enough back in time, the relative difference between the sequence divergence and the species divergence will be small. The difference between the two will be 2N<sub>e</sub> generations which might be a difference of hundreds of thousands of years; relatively little if the species divergence is in millions of years.</p>
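<p>To get a feel for how quickly the relative difference shrinks, here is a small Python sketch. It reuses the human values quoted earlier (N<sub>e</sub> of about 10,000 and 20-year generations) as stand-ins for the ancestral species, which is an assumption on my part; ancestral populations may well have been larger.</p>

```python
def relative_coalescence_offset(effective_size, generation_years, species_split_years):
    """Fraction by which sequence divergence exceeds species divergence, on average."""
    offset_years = 2 * effective_size * generation_years
    return offset_years / species_split_years


# Assumed values: N_e = 10,000 and 20-year generations, as quoted for humans.
for split_mya in (1, 6, 25):
    fraction = relative_coalescence_offset(10_000, 20, split_mya * 1e6)
    print(f"split {split_mya} mya: offset is {fraction:.1%} of the split time")
```

<p>With these inputs the same 400,000-year offset is a large fraction of a 1 mya split but under 2% of a 25 mya split.</p>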
<p>Of course, we cannot go so far back in time that the mutation rate has changed, so there is a trade-off between the relative difference in sequence divergence and species divergence and how conserved the mutation rate is.</p>
<p>For dating the evolution of great apes, one calibration point is the divergence between old world monkeys and apes (<em>catarrhines;</em> lesser apes and greater apes). Based on fossil evidence we expect the split to be between ~20 million years ago (mya) and ~30 mya [1]. That is, we have fossils indicating that the split had occurred by ~20 mya and fossils, believed to be older than the split, at ~30 mya.</p>
<p>Only the lower bound of this informs us of the split time, however. The lack of fossil evidence is not evidence that the split occurred later than ~30 mya. Absence of evidence, after all, is not evidence of absence.</p>
<p>Still, it gives us a tentative calibration point, with a relative uncertainty of ~30% of the divergence of the two groups of species.</p>
<h3>Consequences of incorrect calibration</h3>
<p>The genetic (sequence) divergence between two genomes is an upper bound on the species divergence, but as a consequence of the calibration problem, genetic estimates of divergence can turn out to underestimate the speciation times.</p>
<p>If the calibration point underestimates the number of years since the species split, the number of years per mutation will also be underestimated (equivalently, the number of mutations per year will be overestimated). Consequently, the genetic estimates, while overestimating the species split in number of mutations, will underestimate the years separating genomes [1].</p>
<p style="text-align: center;"><a href="http://www.mailund.dk/wp-content/uploads/2010/04/Underestimates-based-on-miscalibration.png"><img class="aligncenter size-medium wp-image-2102" style="border: 1px solid black;" title="Underestimates based on miscalibration" src="http://www.mailund.dk/wp-content/uploads/2010/04/Underestimates-based-on-miscalibration-289x300.png" alt="" width="289" height="300" /></a></p>
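<p>The effect of the calibration date can be illustrated with a short Python sketch. It assumes a strict molecular clock, and the two distances are hypothetical; only their ratio matters for the result.</p>

```python
def dated_split_years(distance, calibration_distance, assumed_calibration_years):
    """Date a split by scaling against a calibration point (strict molecular clock)."""
    rate = calibration_distance / (2 * assumed_calibration_years)
    return distance / (2 * rate)


# Hypothetical distances; only their ratio matters here.
calibration_distance = 0.060  # e.g. old world monkey / ape
query_distance = 0.0144       # e.g. a more recent split

for assumed_mya in (25, 30):
    years = dated_split_years(query_distance, calibration_distance, assumed_mya * 1e6)
    print(f"calibration at {assumed_mya} mya: split dated to {years / 1e6:.1f} mya")
```

<p>Every date scales linearly with the assumed calibration age, so if the true calibration age were 30 mya, assuming 25 mya would shrink all estimates by a factor of 25/30.</p>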
<p>This has consequences for our inference about the evolution of great apes and the relationship between humans and our ancestors. Calibrating the molecular clock based on an old world monkey / ape divergence of ~25 mya, a time point in the middle of the expected divergence interval, will put fossils such as <em>Ardipithecus, Orrorin</em> and <em>Sahelanthropus</em> further back in time than the split between human and chimpanzee, while a calibration point based on a ~30 mya divergence of old world monkeys and apes would put the same fossils after the split between human and chimpanzee; potentially on the lineage leading to humans [1,4].</p>
<p>Conversely, assuming that <em>Sahelanthropus</em> is on the human-specific lineage puts the human-chimpanzee split in the range of 6-7 mya. Using this as a calibration point, the ape / old world monkey divergence is estimated at ~27 mya for the lower end of the calibration interval and ~36 mya for the upper end of the calibration interval [3].</p>
<p>Incorrect calibration of the molecular clock can thus turn what should be an upper bound into an underestimate of the sequence divergence, when measured in years rather than number of mutations. An underestimate of an upper bound tells us next to nothing about the true value, unless we have some grasp of how tight the bounds are, but unfortunately this is the best knowledge we currently have about the divergence time of species.</p>
<p>Our best approach to alleviating this problem is working out the uncertainties in the upper and lower bounds, and that way discarding extreme consequences of the calibration.</p>
<p>From population genetics theory we can make inferences about the relative over-estimation caused by the sequence divergence within the coalescence process and disentangle the species divergence from the sequence divergence [5,6]. From this we can tighten the intervals consistent with the fossil record.</p>
<h3>References</h3>
<ol>
<li>Steiper, M.E. &amp; Young, N.M. Timing primate evolution: Lessons from the discordance between molecular and paleontological estimates. <em>Evol Anthropol</em> <strong>17</strong>, 179-188 (2008).</li>
<li>Hein, J., Schierup, M.H. &amp; Wiuf, C. Gene genealogies, variation and evolution: A primer in coalescent theory. Oxford University Press (2005).</li>
<li>Steiper, M.E. &amp; Young, N.M. Primate molecular divergence dates. <em>Mol Phylogenet Evol </em><strong>41</strong>, 384-394 (2006).</li>
<li>Stauffer, R.L., Walker, A., Ryder, O.A., Lyons-Weiler, M. &amp; Hedges, S.B. Human and ape molecular clocks and constraints on paleontological hypotheses. <em>J Hered</em> <strong>92</strong>, 469-474 (2001).</li>
<li>Dutheil, J.Y. <em>et al.</em> Ancestral population genomics: The coalescence hidden Markov model approach. <em>Genetics</em> <strong>183</strong>, 259-274 (2009).</li>
<li>Hobolth, A., Christensen, O.F., Mailund, T. &amp; Schierup, M.H. Genomic relationship and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov Model. <em>PLoS Genet</em> <strong>3</strong> (2007).</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2010/04/26/estimating-divergence-times/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

