Looking for genome assembly papers

I’m starting a journal club for the computer scientists at BiRC.  Since I am the one starting it up, I have egoistically chosen a subject I personally would like to learn more about: algorithms for Next Generation Sequencing assembly.

Now I don’t even know much about the “plain old” sequence assembly algorithms, so I want to start out with a few papers on that and then move to newer algorithms.

The only problem is that I don’t know which papers are important and which are not.

So if anyone has any ideas, please please let me know!


This week in the blogs

Well, everyone else seems to summarise the posts they found interesting during the week, so it is only fair that I get to as well.  Even with my New Year's resolution of posting, on average, a post per day, I cannot cover all the posts I find interesting, so this also gives me an opportunity to simply list a lot of links and perhaps group related posts so you have a chance of reading them together.

In this first installment, though, I’m going to go back a little further into this month as well, since I collected a few interesting links there. Anyway, here goes:


  1. Sequences from first settlers reveal rapid evolution in Icelandic mtDNA pool (PLoS Genetics)
    1. Genetic variation in space & time – Iceland (Gene Expression)
    2. The genetic history of Iceland (Genetic Future)
    3. Ancient DNA analysis of the Icelandic settlers (Me!)
    4. Genetic drift eliminated rare mtDNA haplotypes from Iceland (John Hawks)
    5. mtDNA selection in Iceland? (John Hawks)
  2. Pervasive Hitchhiking at coding and regulatory sites in humans (PLoS Genetics)
    1. Humans have adapted on genome-wide level? (Gene Expression)
    2. How much selection is going on in humans? (Me!)
  3. A genome-wide genetic signature of Jewish ancestry perfectly separates individuals with and without full Jewish ancestry in a large random sample of European Americans (Genome Biology)
    1. How Ashkenazi Jewish are you? (Gene Expression)
    2. Another paper on Ashkenazi Jewish distinctiveness (Dienekes)

Sequences and alignments

  1. Phylogenetic inference under recombination using Bayesian stochastic topology selection (Bioinformatics)
    1. Phylogenetic inference under recombination using Bayesian stochastic topology selection (Me!)
  2. The experts agree (Finchtalk)


  1. Dynamic languages: Not just for scripting any more (CIO)
  2. Emacs 23 (emacs-fu)


  1. Making classes interactive: better learning or just more fun? (Discovering Biology in a Digital World)
  2. TeacherTube: YouTube for teachers (Discovering Biology in a Digital World)
  3. Students know what physicists believe, but they don’t agree: A study using the CLASS survey (Phys. Rev. ST Phys. Educ.)
    1. Students know what physicists believe, but they don’t agree (Uncertain Principles)

Peer reviewing

  1. How are the mighty fallen (Michael Nielsen)
  2. Three myths about scientific peer review (Michael Nielsen)


There and back again

We came back from Beijing yesterday evening (our luggage this afternoon), and I am still pretty tired after the trip.  I had to leave the office in the early afternoon, when my brain just shut down.  So don’t expect any quality posts from me for a day or two.

In the meantime, I’ll just refer you to this list of predictions for 2009.

Most I agree with, with the possible exception of this one:

We will not see a retail complete genome sequence offered for less than $1000. I’d be happy to be proven wrong here, mind you, but I just can’t see prices tumbling this far over the next twelve months – even with the huge competition and rapid technological advances in the DNA sequencing sector. Of course, it depends what you mean by “complete” – it will no doubt be possible to offer a fragmentary, low-coverage genome at this price by the end of the year, but such a product would be almost worse than no information at all. Alternatively, cut-price genome sequences may be offered by companies at a loss, to attract attention and create a more sustainable long-term market.

It sounds very reasonable, but I’ve been pretty pessimistic in my predictions about genotyping and sequencing technology in the past, so I will choose to be optimistic for 2009 and predict that we will see a $1000 genome.


How much for that genome?

It is a quiet Saturday morning.  I am slightly hung over.  My scripts are scanning through a genome and I am just sitting here waiting for them to finish with nothing much to do.

So I started thinking.  How much does it actually cost to get a new genome, these days? If I wanted my own genome sequenced, how much would I have to pay and how long would it take to get it?

The (first) human genome project cost about $3 billion (about $300 million for Celera) and took about 10 years (1990 to 2000 for the first assembly, then three more years for completion, but let’s just say 10 here).

Now they want to sequence 1000 humans in three years for $30-50 million. Next-generation sequencing techniques have lowered the cost of that project by a factor of 10. Of course, it also helps a lot to have the original genome to assemble against.
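To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above. The assumption that the project budget is split evenly across the 1000 genomes is a simplification made for illustration, and the variable names are made up for this sketch.

```python
# Back-of-the-envelope cost per genome, using only the figures quoted above.
# Assumption: the 1000-genome budget is split evenly across all genomes.

HGP_COST = 3_000_000_000                   # public Human Genome Project, one genome
CELERA_COST = 300_000_000                  # Celera's effort, one genome
PROJECT_BUDGET = (30_000_000, 50_000_000)  # low and high budget estimates
PROJECT_GENOMES = 1000


def cost_per_genome(total_cost, genomes):
    """Return the average cost per genome for a project."""
    return total_cost / genomes


low, high = (cost_per_genome(b, PROJECT_GENOMES) for b in PROJECT_BUDGET)
print(f"1000-genome project: ${low:,.0f} to ${high:,.0f} per genome")
print(f"Versus the HGP: {HGP_COST / high:,.0f}x to {HGP_COST / low:,.0f}x cheaper")
print(f"Versus Celera: {CELERA_COST / high:,.0f}x to {CELERA_COST / low:,.0f}x cheaper")
```

On these numbers the average comes to $30,000 to $50,000 per genome, though resequencing against a reference is of course a much easier job than producing the first assembly.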

I’ve asked Roald about the price for the first “Arab genome”, but I haven’t gotten an answer yet.  I guess he doesn’t work Saturday mornings ;-)

The genomics age

Some people say we are in the “post genomic” age, but really we are just in the middle of the genomics age if anything.  We are seeing an explosion in new genomes sequenced.

From GOLD you can download some statistics on genome projects. Plotting the total number of genomes published against years, you clearly see the explosive increase in data:

It is even more impressive when you consider all genome projects and not just the published genomes so far:

Statistics at NCBI say we have 22 complete eukaryote genomes, 161 with a draft assembly and 176 in progress. For prokaryotes the numbers are 749 complete, 540 draft and 676 in progress.
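As a quick tally of those NCBI numbers, here is a small sketch that sums each group and works out the share counted as complete. The counts are the ones quoted above. The dictionary layout and the `summarise` helper are assumptions made for illustration, not anything NCBI provides.

```python
# Tally of the NCBI genome project counts quoted above.
# The structure and names here are illustrative, not an NCBI format.

counts = {
    "Eukaryotes": {"complete": 22, "draft": 161, "in progress": 176},
    "Prokaryotes": {"complete": 749, "draft": 540, "in progress": 676},
}


def summarise(status_counts):
    """Return (total projects, fraction complete) for one group."""
    total = sum(status_counts.values())
    return total, status_counts["complete"] / total


for group, status_counts in counts.items():
    total, fraction = summarise(status_counts)
    print(f"{group}: {total} projects, {fraction:.1%} complete")
```

That gives 359 eukaryote projects and 1965 prokaryote projects in total, with a much smaller share of the eukaryotes finished.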

It doesn’t say anything about the cost of sequencing genomes, though, so I don’t know how much the price has dropped over time.

I was a bit surprised to learn that the only mammals considered complete are mouse and man. There are 22 mammals with draft assemblies and another 26 in progress.  Will the draft genomes be completed any time soon?