Tenfold cost reduction in exon resequencing

Yesterday Roald Forsberg sent me this link: NHLBI, NHGRI Offer $12M to Cut Cost of Exon
Sequencing for Large-Scale Disease Studies

This is a research program that will offer four awards totalling $12M to help reduce the cost of resequencing all exons, making this a viable approach to whole genome association mapping.

[The project will] complement the 1,000 Genomes Project that the NHGRI, the Wellcome Trust Sanger Institute, and the Beijing Genomics Institute announced last week.

Unlike that project, which aims to catalog genetic variations in human populations unbiased for disease, the resequencing technology program is geared towards studies that will correlate sequence variations with disease phenotypes.

When writing the grant proposal for my new project (which started yesterday), I argued that it was time to consider whole genome resequencing for association mapping, and that we would see whole genome resequencing used in association mapping within five to ten years. I am beginning to think I was being pessimistic. Sure, this program only focuses on exons, but if we can sequence all exons, then the entire genome is not far behind.

Imputation based association mapping


A neat idea that has become quite popular over the last two years is to consider untyped genotypes as “missing data” and then impute them using panel data such as HapMap. Imputing all missing markers and then testing those is the ultimate multi-marker association mapping method: you directly test all markers, and if you get a hit you immediately know which marker to try to replicate.

Well, I might be overselling it a bit here — there are some situations where imputing markers and testing them individually won’t actually help you and where other multi-marker methods will — but it is a very nice idea and the output is very easy to interpret.
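To make the idea concrete, here is a toy sketch of imputation-based single-marker testing. The data, the marker layout, and the naive "copy from the nearest panel row" imputation rule are all invented for illustration; real methods such as fastPHASE use proper statistical models of haplotype structure instead.

```python
def impute_marker(typed, panel, typed_idx, target_idx):
    """Impute one untyped marker by copying it from the reference-panel
    row whose genotypes best match this individual's typed markers
    (a deliberately naive nearest-neighbour rule)."""
    best = min(panel, key=lambda row: sum(abs(row[i] - g)
                                          for i, g in zip(typed_idx, typed)))
    return best[target_idx]

def allele_chi2(case_geno, control_geno):
    """1-df chi-square on allele counts for one biallelic marker,
    genotypes coded as 0/1/2 copies of the minor allele."""
    a1 = sum(case_geno); a0 = 2 * len(case_geno) - a1
    b1 = sum(control_geno); b0 = 2 * len(control_geno) - b1
    n = a1 + a0 + b1 + b0
    stat = 0.0
    for obs, row, col in ((a1, a1 + a0, a1 + b1), (a0, a1 + a0, a0 + b0),
                          (b1, b1 + b0, a1 + b1), (b0, b1 + b0, a0 + b0)):
        exp = row * col / n
        stat += (obs - exp) ** 2 / exp
    return stat

# Reference panel genotypes at three markers; marker 1 is untyped in the study.
panel = [(0, 0, 0), (0, 0, 0), (2, 2, 2), (2, 2, 2)]
typed_idx, target_idx = (0, 2), 1

cases = [(2, 2), (2, 2), (2, 0)]      # study genotypes at the typed markers only
controls = [(0, 0), (0, 0), (0, 2)]

imputed_cases = [impute_marker(g, panel, typed_idx, target_idx) for g in cases]
imputed_controls = [impute_marker(g, panel, typed_idx, target_idx) for g in controls]
print(allele_chi2(imputed_cases, imputed_controls))  # test the imputed marker directly
```

The point of the sketch is only the workflow: fill in the untyped marker from the panel, then run an ordinary single-marker test on it as if it had been genotyped.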

Unfortunately, imputation methods can be pretty slow. We’ve used FastPHASE in our projects, and while it works fine for smaller regions, it is too computationally intensive for whole genome imputations (at least with the computers we have access to).

In this issue of Bioinformatics there’s an application note describing a new tool for doing imputation based association mapping:

Association studies for untyped markers with TUNA

Xiaoquan Wen and Dan L. Nicolae
Bioinformatics 2008 24(3):435-437; doi:10.1093/bioinformatics/btm603

Rather than imputing the actual markers, they impute the frequencies of the missing markers in the cases and the controls, which significantly improves both the running time and the memory usage.

Getting only the frequencies will not let us use multi-marker methods on imputed data, but for single-marker tests (at least tests that use only the frequencies) I imagine it could be a very useful tool.

Citation for Research Blogging:
Wen, X., Nicolae, D.L. (2008). Association studies for untyped markers with TUNA. Bioinformatics, 24(3), 435-437. DOI: 10.1093/bioinformatics/btm603

My new project: Computational challenges in disease mapping

Today I officially start my new research project Computational Challenges in Disease Mapping.

The project is funded by the Danish Research Council (FNU — Forskningsrådet for Natur og Univers) and is running for three years. It is a direct continuation of the association mapping project I’ve been working on for the last two years.

The 1000 Genomes Project

I’m absolutely thrilled that we have reached the technological level where it is possible to sequence 1000 genomes just to learn more about human genetic variation.

We have learned a lot from the HapMap project about common variation, and this knowledge has led to an explosion in discoveries of genetic factors in several diseases. With actual sequencing of genomes we should also learn about less common genetic variation, and who knows where that will take us?

I’ve actually known about this project for a while from some of the people involved, but this is the first time I’ve seen it mentioned online, so I thought I would link to it today :)