Archive for June 11th, 2009

SciPy at last!

Thursday, June 11th, 2009

Scipy is probably the software I have missed the most since I moved to OS X.  I have used it a lot for data analysis whenever the data wasn’t in a simple table form idea for R.  It really is a wonderful module if you do any kind of scientific computing in Python.

I have tried several times to install it on my Mac(s) but with limited success.  On my Mac at the office I got most of it installed, but never managed to get matplotlib working.  On my iMac at home and on my MacBook I could get numpy up and working but not scipy proper.

Until now, that is.  I found this wonderful package: Scipy Superpack.  Well, it is really just a shell script, but it just installs scipy and a bunch of other modules, so now I finally have my old Python toolbox up and running again.

162-167=-5

Pandemic

Thursday, June 11th, 2009

It’s official, the swine flu has been upgraded to pandemic.

WHO chief Dr. Margaret Chan made the announcement Thursday after the U.N. agency held an emergency meeting with flu experts. Chan said she was moving the world to phase 6 — the agency’s highest alert level — which means a pandemic, or global epidemic, is under way.
“The world is moving into the early days of its first influenza pandemic in the 21st century,” Chan told reporters. “The (swine flu) virus is now unstoppable.”

The last pandemic — the Hong Kong flu of 1968 — killed about 1 million people. Ordinary flu kills about 250,000 to 500,000 people each year.

162-166=-4

Multi-core HMMs

Thursday, June 11th, 2009

I’m working on a project where it looks like I could end up with hidden Markov models with hundreds if not thousands of states, and with transition matrices where all entries are non-zero.  Since, for such models, the running time for just calculating the likelihood (not even optimising it) is O(m\cdot n^2) where m is the length of the input and n is the number of states, this could be a problem.  Especially since I’m working on whole genome data, so m can get pretty large as well.

It is part of our CoalHMM work, but I’ll tell you more about that later.  For this post, the actual application isn’t all that important, only the size of the model.

So, anyway, needless to say I need a fast implementation if this is ever going to work.

HMM algorithms

The basic hidden Markov model algorithms are essentially just filling in dynamic programming tables.

The “Forward” algorithm, for example, looks something like

for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
        double sum = 0.0;
        for (int k = 0; k < n; ++k) {
            sum += T[k,j] * E[j,x[i]];
        }
        F[i,j] = sum;
    }
}

There’s a few more details in a real implementation, ’cause this version will have underflows in a few steps, but this is the essential idea.

SIMD speedups

We have developed a small library with these algorithms here at BiRC (mainly implemented by one of my student programmers, Andreas) that speeds up the algorithms using single instruction, multiple data (SIMD) instructions.

The idea here is that we can calculate the sums as above by adding several entries in parallel.  For doubles we can handle two numbers in parallel, for floats we can handle four, so ideally we can speed up the algorithms by a factor of two or four.

We don’t exactly achieve that – there is some overhead involved – but it is not too bad.  The figure below compares the running time with and without SIMD for varying number of states, n, with a fixed input length m=10,000.

The number type – float or double – doesn’t matter for the plain algorithm since the same number of operations is used, so I’ve only plotted the time for doubles.

This is all good and well, but with really large HMMs I think we will need even more of a boost.

Multi-core speedups

It is not hard to get your hands on a multi-core machine these days – if anything it is hard to get a single core machine.  Not massively multi-core machines, although I expect to see those in a few years, but at least dual or quad core machines. So I was hoping to get another factor of two or four from running two or four threads.

In general, parallelising the instructions in the program is harder than parallelising the data operations as we do with the SIMD.  For one thing, there is a lot of synchronisation issues when running multiple threads, and this leads to a lot of overhead.

When filling in the dynamic programming table as above, each column in the table depends on the previous column, but each cell in the column is independent from the other cells in the column.  Thus, we should be able to process each cell in a column independently, but synchronise the threads after each column so we ensure that the previous column is always fully processed.

Since the synchronisation isn’t free, we expect that we need a large number of rows in the table to make it worthwhile, but in my project I expect hundreds to thousands of states, so perhaps there is some gain to it.

I decided to try it out, anyway.

Setting up threads and synchronising them and all that used to require some programming effort, but now that we have OpenMP it is very easy to add parallelism to a program.  To parallelise the algorithm above, you simply add a pragma:

for (int i = 0; i < m; ++i) {
#pragma omp parallel for
    for (int j = 0; j < n; ++j) {
        double sum = 0.0;
        for (int k = 0; k < n; ++k) {
            sum += T[k,j] * E[j,x[i]];
        }
        F[i,j] = sum;
    }
}

and by magic the middle loop is parallelised so the body of it is run in different threads.  Setting up the threads, assigning ranges of the sum to the different threads, synchronising them after the loop, etc. is all taken care of by the compiler.

Neat.

The results are plotted below (on my dual core iMac):

As is evident, there is a large overhead in the threaded version, so for anything below a few hundred states it is slowing the algorithm down instead of speeding it up, but eventually the parallelism does kick in.

Well, for the version without SIMD at least, which is the one I’ve plotted above.

For double SIMD, it is just worth it for the large number of states

while for floats it isn’t really worthwhile for this range of number of states

Of course, this is just the very first attempt at parallelising the algorithms, so there might still be some tricks to try.  For one thing, I have a feeling I can get rid of some of the thread overhead by adding a pragma outside the outer loop. I recall seeing something like that before but I don’t remember quite how it worked.

Anyway, I’ll play with it some more later.

162-165=-3

$50K personal genome sequencing

Thursday, June 11th, 2009

If you want your very own genome sequenced, you know have the chance for just $48,000.  Not only will Illumina sequence the genome for you, at 30X coverage, they’ll throw in a Mac for you to store it on.

Read much more about it here:

This is both exciting, and then also not so much…

Personal genomics with actual sequencing is exciting.  Companies like DeCodeMe and 23AndMe only genotypes selected common variants.  About a million of them, so covering the genome quite well, but far from the (two times) three billion nucleotides in the full genome, so not likely to capture the rare variants in your genome.

Providing the full sequence is also what Knome does, but while Knome will charge you $99,500, Illumina will only charge you $48,000 (and throw in a Mac).

Of course, the price is what makes it less exciting.  While Illumina’s offer is half the price of Knome, it is still far from DeCodeMe (1M markers for $1000) or 23AndMe (500K markers for $400) and not really in the range of the average citizens.

162-164=-2