The Gene Sherpa asks: what would you do if we had 1000 USD genomes by next year?
This is a very interesting question, and one I would love to answer. In the proposal for the grant that is currently funding me, I predicted that we would get such data within the decade (it looks like I was being very pessimistic here) and that I would spend the last third of the grant period working on this problem.
I think I should get started right away.
My goal is to figure out ways to analyse full sequence data for disease mapping. With full sequences, a few things change compared to SNP chip data.
First, of course, there is the matter of scale. Now you get 6 billion nucleotides per individual instead of 2x500K or 2x1M as with SNP chips.
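To put that jump in scale into numbers, here is a quick back-of-the-envelope sketch. It only uses the figures quoted above (6 billion nucleotides versus 2x500K or 2x1M alleles); the names `GENOME_NUCLEOTIDES`, `chip_alleles` and `fold_increase` are made up for this illustration and do not come from any real tool.

```python
# Back-of-the-envelope comparison of data volume per individual.
# All figures are the rough ones quoted in the text, not measured values.
GENOME_NUCLEOTIDES = 6_000_000_000  # full diploid sequence, per individual

# SNP chips: 500K or 1M markers, two alleles per marker.
chip_alleles = {
    "500K chip": 2 * 500_000,
    "1M chip": 2 * 1_000_000,
}


def fold_increase(chip_size: int) -> float:
    """How many times more data points a full sequence has than a chip."""
    return GENOME_NUCLEOTIDES / chip_size


ratios = {name: fold_increase(size) for name, size in chip_alleles.items()}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:,.0f} times more data per individual")
```

So, roughly speaking, we are looking at three to four orders of magnitude more data per individual than with today's chips.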
Second, you are no longer looking for indirect signals, so there is no tagging, and multi-marker methods will not be needed to boost the power of indirect signals. You have all the variation observed (but the types of variation are much more complicated).
Third (and perhaps most interesting), the kind of signals we are looking for will change. With SNP chips and tagging SNPs, we are looking for high-frequency variants with modest effect. High-frequency variants are all we are tagging (and these must have a modest effect if we are still looking for them; if they didn't, we would have found them ages ago). With full sequencing, we will be able to look for low-frequency variants as well.