For the morning paper presentation session, I attended the sequence assembly track.
The papers here all concerned the new algorithmic problems you need to tackle to handle next-generation sequencing technologies, with vastly more data and much shorter reads.
Parallel short sequence assembly of transcriptomes BG Jackson, PS Schnable and S Aluru
The first presentation was about a distributed graph algorithm for de novo assembly.
Graph algorithms are a nice approach to sequence assembly, but they are potentially very expensive in both time and memory. The method here distributes both the memory usage and the computations across multiple CPUs, thus alleviating this problem.
Finding optimal threshold for correction error reads in DNA assembling FYL Cin, HCM Leung, W-L Li and S-M Y
The second presentation was on error correction.
With NGS you get a very high number of reads, but a few percent of the nucleotides in the reads are called incorrectly. This is corrected for by requiring that each K-mer (for a given K) should occur at least M times (where M is some threshold) before it is believed to be correct.
The problem addressed here was how to choose M given a data set. The approach was to model the sequences as generated by a stochastic process, then estimate the expected number of false positives and false negatives for each M, and finally pick the M that minimises the sum of FP and FN.
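To make the idea concrete, here is a minimal sketch of K-mer counting with a threshold M. It is not the method from the paper: instead of estimating FP and FN from a stochastic model, it assumes the set of true K-mers is already known (as it would be in a simulation) and simply counts the errors for each M. The function names `count_kmers`, `trusted_kmers` and `best_threshold` are made up for this illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every K-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def trusted_kmers(counts, m):
    """Keep only the K-mers seen at least m times."""
    return {kmer for kmer, c in counts.items() if c >= m}


def best_threshold(counts, true_kmers, max_m):
    """Pick the M in 1..max_m minimising FP + FN.

    Assumption: true_kmers is known. The paper instead estimates
    the expected FP and FN from a model of the reads.
    """
    best_m, best_cost = None, None
    for m in range(1, max_m + 1):
        accepted = trusted_kmers(counts, m)
        fp = len(accepted - true_kmers)  # erroneous K-mers accepted
        fn = len(true_kmers - accepted)  # correct K-mers rejected
        cost = fp + fn
        if best_cost is None or cost < best_cost:
            best_m, best_cost = m, cost
    return best_m, best_cost


# Toy data: three clean reads and one read with a miscalled base.
reads = ["ACGTA", "ACGTA", "ACGTA", "ACGAA"]
counts = count_kmers(reads, k=3)
print(best_threshold(counts, {"ACG", "CGT", "GTA"}, max_m=4))
```

With real data the true K-mers are of course unknown, which is exactly why an estimate of the expected FP and FN is needed.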
Crystallizing short-read assembly around seeds MS Hossain, N Azimi and S Skiena
The third presentation was on a new de novo assembly algorithm tailored to the paired-end reads you get from the SOLiD platform.
The first half of the presentation, though, was an overview of various platforms, so I’ll need to read the paper before I have any idea about the specifics of the algorithm.
Short read DNA fragment anchoring algorithm W Wang, P Zhang and X Liu
The last presentation was not on de novo assembly but on assembly against a reference genome, and concerned finding anchors (sub-strings of a larger string that approximately match a query string).
This time around I didn’t get any of the details. Perhaps because it was getting close to lunch and I was fading out…
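Just to pin down what an anchor is, here is a naive sketch that scans a reference for sub-strings approximately matching a query. It assumes "approximately" means a bounded number of mismatches (Hamming distance), which is only one possible reading, and it is a brute-force scan rather than the algorithm from the paper. The names `hamming` and `find_anchors` are made up for this illustration.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_anchors(reference, query, max_mismatches):
    """Return (position, sub-string) pairs in the reference that match
    the query with at most max_mismatches mismatches.

    Assumption: approximate matching is taken to mean Hamming distance;
    insertions and deletions are ignored.
    """
    n = len(query)
    return [
        (i, reference[i:i + n])
        for i in range(len(reference) - n + 1)
        if hamming(reference[i:i + n], query) <= max_mismatches
    ]


print(find_anchors("ACGTTCGA", "CGT", max_mismatches=1))
```

This takes time proportional to the reference length times the query length for each query, which is far too slow for millions of short reads against a whole genome.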