The expanding world of small RNAs

The regulatory machinery built from small RNA genes is a fairly recent discovery and a very exciting one. There’s a short overview paper about it in Nature:

Helge Großhans & Witold Filipowicz
Molecular biology: The expanding world of small RNAs
Nature 451, 414 (2008). doi:10.1038/451414a

Automated programming for bioinformatics algorithm deployment

The idea is to automatically provide a GUI for bioinformatics algorithms by analysing the source code (and maybe some documentation comments). That is a neat idea.

Automated programming for bioinformatics algorithm deployment
Alterovitz et al.

Bioinformatics 2008 24(3):450-451; doi:10.1093/bioinformatics/btm602

I don’t have any Java code around to test this on, so I don’t know how well it works, but if anyone does, please let me know.

Tenfold cost reduction in exon resequencing

Yesterday Roald Forsberg sent me this link: NHLBI, NHGRI Offer $12M to Cut Cost of Exon Sequencing for Large-Scale Disease Studies

This is a research program that will offer four awards totalling $12M to help reduce the cost of resequencing all exons, making this a viable approach to whole genome association mapping.

[The project will] complement the 1,000 Genomes Project that the NHGRI, the Wellcome Trust Sanger Institute, and the Beijing Genomics Institute announced last week.

Unlike that project, which aims to catalog genetic variations in human populations unbiased for disease, the resequencing technology program is geared towards studies that will correlate sequence variations with disease phenotypes.

When writing the grant proposal for my new project (which started yesterday), I argued that it was time to consider whole genome resequencing for association mapping and that we would see it used in practice within five to ten years. I am beginning to think I was being pessimistic here. Sure, this program only focuses on exons, but if we can sequence all exons, then the entire genome is not far behind.

Imputation based association mapping

A neat idea that has become quite popular over the last two years is to treat untyped markers as “missing data” and then impute their genotypes using panel data such as HapMap. Imputing all missing markers and then testing them is the ultimate multi-marker association mapping method: you directly test all markers, and if you get a hit you immediately know which marker to try to replicate.

Well, I might be overselling it a bit here — there are some situations where imputing markers and testing them individually won’t actually help you and where other multi-marker methods will — but it is a very nice idea and the output is very easy to interpret.
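To make the idea concrete, here is a toy sketch of panel-based imputation. It is not how FastPHASE or any other real tool works (those fit proper haplotype models); it just fills in an untyped marker with the allele frequency among the reference haplotypes that agree with the study haplotype at the typed markers. The function name, the 0/1 allele coding, and the use of None for untyped markers are all my own assumptions for illustration.

```python
from typing import List, Optional, Sequence


def impute_haplotype(
    observed: Sequence[Optional[int]],
    panel: Sequence[Sequence[int]],
) -> List[float]:
    """Toy imputation of untyped markers on a single haplotype.

    observed: alleles coded 0/1, with None at untyped markers (assumed coding).
    panel: fully typed reference haplotypes (e.g. from a HapMap-like panel).
    Returns the expected allele (a value in [0, 1]) at every marker.
    """
    typed = [i for i, allele in enumerate(observed) if allele is not None]

    # Reference haplotypes that agree with the study haplotype at all typed markers.
    matches = [h for h in panel if all(h[i] == observed[i] for i in typed)]
    if not matches:
        # No exact match: fall back to the overall panel frequencies.
        matches = list(panel)

    imputed = []
    for i, allele in enumerate(observed):
        if allele is not None:
            imputed.append(float(allele))
        else:
            imputed.append(sum(h[i] for h in matches) / len(matches))
    return imputed


if __name__ == "__main__":
    reference_panel = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 1, 0)]
    print(impute_haplotype((0, None, 1), reference_panel))
```

Exact matching like this falls apart as soon as the region gets long or the panel is small, which is one reason the real methods are both more sophisticated and more expensive.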

Unfortunately, imputation methods can be pretty slow. We’ve used FastPHASE in our projects, and while it works fine for smaller regions, it is too computationally intensive for whole genome imputations (at least with the computers we have access to).

In this issue of Bioinformatics there’s an application note describing a new tool for doing imputation based association mapping:

Association studies for untyped markers with TUNA

Xiaoquan Wen and Dan L. Nicolae
Bioinformatics 2008 24(3):435-437; doi:10.1093/bioinformatics/btm603

Rather than imputing the actual markers, they impute the frequencies of the missing markers in the cases and the controls, and that significantly improves both the running time and the memory usage.

Getting only the frequencies will not help us use multi-marker methods on imputed data, but for single-marker tests (at least tests that only use the frequencies) I imagine it could be a very useful tool.
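As an example of a test that only needs the frequencies, here is a minimal sketch of the standard allelic chi-square test (1 degree of freedom) computed from an allele frequency in cases and in controls plus the two sample sizes. This is a textbook test, not necessarily the statistic TUNA uses, and it treats the estimated frequencies as if they were observed counts, so it ignores the uncertainty from the imputation. The function name and arguments are my own.

```python
import math
from typing import Tuple


def allelic_chi_square(
    freq_cases: float,
    freq_controls: float,
    n_cases: int,
    n_controls: int,
) -> Tuple[float, float]:
    """Allelic chi-square test (1 df) from allele frequencies.

    Each diploid individual contributes two chromosomes, so the 2x2 table
    of allele counts has 2 * n_cases and 2 * n_controls as row totals.
    Returns (chi-square statistic, p-value).
    """
    a = freq_cases * 2 * n_cases          # test allele, cases
    b = 2 * n_cases - a                   # other allele, cases
    c = freq_controls * 2 * n_controls    # test allele, controls
    d = 2 * n_controls - c                # other allele, controls

    total = a + b + c + d
    col_test, col_other = a + c, b + d
    if col_test == 0 or col_other == 0:
        # Monomorphic marker: no information, so no evidence of association.
        return 0.0, 1.0

    chi2 = total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * col_test * col_other)
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = allelic_chi_square(0.5, 0.3, n_cases=100, n_controls=100)
    print(f"chi2 = {chi2:.3f}, p = {p:.2e}")
```

A test like this needs nothing but four numbers per marker, which is why frequency-only imputation is enough for it, while haplotype-based multi-marker methods still need the individual genotypes.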

Citation for Research Blogging:
Wen, X., Nicolae, D.L. (2008). Association studies for untyped markers with TUNA. Bioinformatics, 24(3), 435-437. DOI: 10.1093/bioinformatics/btm603