Posts Tagged ‘Sequencing’

Everyone is digging for gold, but I want to sell them shovels

Tuesday, June 17th, 2008

Years back, when I was studying computer science, I took a course on virtual machine design taught by Lars Bak. At the time he had just returned to Denmark but was still working at SUN, and he managed to get a VP from SUN to give one of his lectures for him (I forget who it was; it was many years ago).

That particular lecture wasn’t about building object-oriented virtual machines but about building successful software companies. (No snide remarks about someone from SUN talking about that, please.)

This was during the .com bubble, or just as it was bursting, and the advice he gave was: “when everyone is digging for gold, you get rich by selling shovels”.

If you build the basic infrastructure that everyone needs, it might not be as glamorous, and if you are selling commodity products you won’t get rich overnight. But if you are selling something that everyone needs, you won’t lose your market overnight either.

Personal genomics and medicine shovels, anyone?

I’m telling this story because I just read this post at Genetics Future. It concerns genetic testing and how it will soon change with complete re-sequencing, which will be cost-effective Real Soon Now(tm).

The post ends:

There are ruthless economies of scale in the human disease genomics business, both in terms of sequencing infrastructure and the costs of assembling reliable knowledge bases for interpretation, so it will be increasingly difficult for smaller companies to stay competitive.

The personal genomics and genetic testing field is another gold rush (although one where small garage companies aren’t quite in on the game yet). Right now there are plenty of testing labs, but with re-sequencing we’ll probably end up with only a few large companies, at least until the price of re-sequencing drops significantly.

I don’t want to compete here.  I’m sure I’ll lose.  I would absolutely love to be selling shovels to the gold diggers!

What will all these companies need?

Of course they will need IT infrastructure to manage their data and statistical methods to correlate genotypes with phenotypes.

The question is, of course, whether it will be possible to sell bioinformatics to such companies, or whether they will want to build all their informatics in-house. Some of it they will want to build themselves, of course, as that gives them a competitive advantage, but surely there will be some commodity software they would rather buy from someone else.

They won’t build their own OS or database system, but they probably will build their own specialised statistical models. Somewhere in between, there is money to be made, if I can only figure out how…

CLC Genomics Workbench

Thursday, June 12th, 2008

My friends at CLC Bio have just released their Genomics Workbench. When I talked to them last Friday, I couldn’t quite figure out what the market for this software is, but Next Generation Sequencing is a hot topic right now, so there probably is one.

Anyway, I wish them luck with it!

It must be hell, sequencing the Neanderthal

Monday, June 9th, 2008

Reading through a page at Nature about metagenomics (it probably requires a subscription…), I saw this sentence:

“The biggest metagenomic project on Earth might be our Neanderthal genome project,” says Egholm. They are using 454 to sequence the complete genome of a Neanderthal, which Egholm says they hope to release by the end of the year. But 95–98% of the DNA in the Neanderthal sample comes from the environment rather than from a Neanderthal. This means that to get the 1× coverage, or roughly 3 billion base pairs, of the genome, the team must sequence somewhere between 70 billion to 100 billion base pairs of these environmental samples.
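The numbers in the quote follow from a simple proportion: if only a fraction f of the sequenced DNA is Neanderthal, you need roughly coverage × genome size / f bases in total. Here is a back-of-the-envelope sketch, assuming reads are drawn uniformly from the sample and ignoring duplicates, mapping losses and human contamination:

```python
def total_bases_needed(genome_size_bp, endogenous_fraction, coverage=1.0):
    """Bases to sequence so that the endogenous share reaches `coverage`.

    Naive model: each sequenced base is endogenous with probability
    `endogenous_fraction`, so total = coverage * genome_size / fraction.
    """
    return coverage * genome_size_bp / endogenous_fraction


GENOME_BP = 3e9  # roughly 3 billion base pairs, as in the quote

for contamination in (0.95, 0.98):
    endogenous = 1.0 - contamination
    needed = total_bases_needed(GENOME_BP, endogenous)
    print(f"{contamination:.0%} environmental -> {needed / 1e9:.0f} billion bp")
```

Under this naive model, 95–98% environmental DNA means sequencing somewhere around 60 to 150 billion base pairs, a range that contains the 70–100 billion quoted.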

Sequencing the Neanderthal must be quite a challenge! Contamination by bacteria should be fairly easy to detect and remove compared to contamination by the humans doing the sequencing. We are just too closely related to the Neanderthal for that to be a simple task.

Of course, the Neanderthal specimens are handled carefully, but some contamination is unavoidable.  How much of a problem it is, I do not know, though.  I tried googling for it, but didn’t really find any consistent answers.

I look forward to getting my hands on the Neanderthal sequence, though.  I would love running it through our CoalHMM analysis!

New long read sequencing technology

Tuesday, May 20th, 2008

At Next Generation Sequencing I saw a review of a new sequencing technology that allows for long reads, unlike all the other new sequencing methods. While the other technologies read sequences of up to a hundred base pairs, this one can potentially read thousands. This is exciting, since longer reads are needed to deal with repetitive regions, and since they will let us sequence phased chromosomes rather than the mix of two chromosomes we usually get.
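Why read length matters for repeats can be shown with a toy example: a read that lies entirely inside a repeated element matches several places in the genome, while a read long enough to reach unique flanking sequence matches only one. The sequences below are made up, and exact string matching stands in for real read alignment:

```python
def placements(genome, read):
    """Return every start position where `read` matches `genome` exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


REPEAT = "GATTACACGT"  # hypothetical repeated element, present twice below
genome = "CCGTA" + REPEAT + "AAACT" + REPEAT + "TTGCA"

short_read = REPEAT[:8]       # lies entirely inside the repeat
long_read = REPEAT + "AAACT"  # spans the repeat plus unique flanking sequence

print(placements(genome, short_read))  # [5, 20]  -> ambiguous placement
print(placements(genome, long_read))   # [5]      -> unique placement
```

The same logic applies to phasing: a read has to be long enough to cover two heterozygous sites before it tells you which alleles sit on the same chromosome.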

The review is silent on when we can expect this technology to be in use, how fast it will be and so on, but I hope it is not just vapourware.

What would you do with $1000 re-sequencing?

Sunday, March 16th, 2008

The Gene Sherpa asks: what would you do if we had 1000 USD genomes by next year?

This is a very interesting question, and one I would love to answer. In the proposal for the grant that is currently funding me, I predicted that we would get such data within the decade (it looks like I was being very pessimistic here) and that I would spend the last third of the grant period working on this problem.

I think I should get started right away.

My goal is to figure out ways to analyse full sequence data for disease mapping. With full sequences, a few things change compared to SNP chip data.

First, of course, there is the matter of scale. Now you get 6 billion nucleotides per individual (two copies of a roughly 3 billion base pair genome) instead of 2×500K or 2×1M as with SNP chips.
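A quick sketch of the arithmetic behind the change in scale, assuming a hypothetical packed encoding of 2 bits per nucleotide and ignoring quality scores, ambiguous bases and compression:

```python
BITS_PER_BASE = 2  # A, C, G, T fit in 2 bits; quality scores are ignored


def raw_size_bytes(n_bases, bits_per_base=BITS_PER_BASE):
    """Bytes needed to store n_bases at a fixed number of bits per base."""
    return n_bases * bits_per_base / 8


full_sequence = 6_000_000_000  # two copies of a ~3 Gbp genome
chip_500k = 2 * 500_000        # diploid genotypes from a 500K SNP chip
chip_1m = 2 * 1_000_000        # diploid genotypes from a 1M SNP chip

print(full_sequence / chip_500k)            # 6000.0
print(full_sequence / chip_1m)              # 3000.0
print(raw_size_bytes(full_sequence) / 1e9)  # 1.5 (GB per individual)
```

So each individual carries thousands of times more data points than with a SNP chip, on the order of gigabytes of raw sequence even before quality information is added.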

Second, you are no longer looking for indirect signals, so there is no tagging, and multi-marker methods will not be needed to boost the power of indirect signals. You observe all the variation (though the types of variation are much more complicated).

Third (and perhaps most interesting), the kind of signals we are looking for will change. With SNP chips and tagging SNPs, we are looking for high-frequency variants with modest effects. High-frequency variants are all we are tagging, and if we are still looking for them their effects must be modest; otherwise we would have found them ages ago. With full sequencing, we will be able to look for low-frequency variants as well.
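Low-frequency variants are, by definition, rare in any sample, which matters when planning a sequencing study. Here is a rough sketch of how many copies of a variant a cohort contains and how likely it is to be seen at all. It assumes random mating, so each of the 2n chromosomes independently carries the variant with probability equal to its frequency. The cohort size and frequencies are illustrative, not taken from any study:

```python
def expected_copies(freq, n_individuals):
    """Expected number of copies of an allele among n diploid individuals."""
    return 2 * n_individuals * freq


def prob_observed(freq, n_individuals):
    """Probability that at least one of the 2n chromosomes carries the allele,
    assuming each chromosome carries it independently with probability freq."""
    return 1 - (1 - freq) ** (2 * n_individuals)


COHORT = 1000  # hypothetical number of sequenced individuals

for freq in (0.2, 0.01, 0.001):
    copies = expected_copies(freq, COHORT)
    seen = prob_observed(freq, COHORT)
    print(f"freq {freq:<6} expected copies {copies:>6.1f}  P(seen) {seen:.3f}")
```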