Archive for June, 2009

$50K personal genome sequencing

Thursday, June 11th, 2009

If you want your very own genome sequenced, you know have the chance for just $48,000.  Not only will Illumina sequence the genome for you, at 30X coverage, they’ll throw in a Mac for you to store it on.

Read much more about it here:

This is both exciting, and then also not so much…

Personal genomics with actual sequencing is exciting.  Companies like DeCodeMe and 23AndMe only genotypes selected common variants.  About a million of them, so covering the genome quite well, but far from the (two times) three billion nucleotides in the full genome, so not likely to capture the rare variants in your genome.

Providing the full sequence is also what Knome does, but while Knome will charge you $99,500, Illumina will only charge you $48,000 (and throw in a Mac).

Of course, the price is what makes it less exciting.  While Illumina’s offer is half the price of Knome, it is still far from DeCodeMe (1M markers for $1000) or 23AndMe (500K markers for $400) and not really in the range of the average citizens.

162-164=-2

Terminator / Wired for War

Wednesday, June 10th, 2009

I just came back from the movies, watching Terminator Salvation, and the saw this: Wired for War.

Should I be worried?

161-163=-2

Last two weeks in the blogs…

Monday, June 8th, 2009

Ok, with the RSI I didn’t collect a list for last week, so here’s a list spanning the last two weeks instead…

Fun

Genetics

Geology

Programming

Public health

Research life

159-162=-3

Minimisation

Sunday, June 7th, 2009

Just for something to do on a lazy Sunday afternoon – where the only “excitement” was voting for the EU parliament (and the law concerning the succession to the Danish throne that won’t be relevant for the next two generations) – I played with R a little bit.

I should be reading three Master’s theses (I’m censor on three defenses later this month) but I got to bed too late yesterday and feel too tired and lazy to do a proper job on that…

Anyway, I decided to play with the gradient descent method for function minimisation. In my machine learning class I told my students that they should find a library for minimising functions rather than the simple gradient descent method, ’cause that is what I would normally do, but never being too keen on practising what I preach, I decided to experiment a bit with it.

Just for a function of a single value, to keep it simple, but still.

I picked the function f(x) = -x^3 + 10x^2 + 3x – that’s as good as any and I can analytically find the minima so I can easily check the results.  Well, one minimum is minus infinity, but I am not interested in that.

To get that minima, just set the derivative f'(x) = -3x^2 + 20x + 3 to zero and pick the negative solution (the positive is the local maximum).

> f <- function(x) -x**3 + 10*x**2 + 3*x
> df <- function(x) -3*x**2 + 20*x + 3
> roots <- polyroot(c(3,20,-3))
> roots
[1] -0.1467688+0i  6.8134355+0i

So with that out of the way, we can get started.

The Newton-Raphson method

The gradient descent method is a simplification of the Newton-Raphson method. The idea behind the latter is to do a Taylor expansion of f(x) around some local minimum x', f(x') \approx f(x)+f'(x)(x'-x) + \frac{1}{2}f''(x)(x'-x)^2 where we cut the expansion short after the quadratic term.

Take the derivative, with respect to x’-x, on both sides 0 \approx f'(x) + f''(x)(x'-x) and we get x' \approx x - \frac{f'(x)}{f''(x)}.  The goal now is to replace \approx with =, that is find x=x', and we do that iteratively, starting in some candidate x_0 and then at iteration i set x_i = x_{i-1}-\frac{f'(x_{i-1})}{f''(x_{i-1})}.

Repeat until you reach a fixed point, or at least until x_i is reasonably close to x_{i-1} or f'(x_i) is reasonably close to zero.

In R:

NewtonRaphson <- function(x0, df, ddf, n=100) {
  steps <- matrix(0, nr=n, nc=3)
  steps[1,] <- c(x0, df(x0), ddf(x0))
  x <- x0
  for (i in 2:n) {
    x <- x - df(x)/ddf(x)
    steps[i,] <- c(x, df(x), ddf(x))
  }
  steps
}

where I just run for n iterations and keep track of the derivatives along the way so I can inspect the results later.

This method has the nice property that if you start close to a minimum, the distance to that minimum drops off quadratically as a function of iterations, so it converges pretty quickly.

> ddf <- function(x) -6*x + 20
> nr.steps <- NewtonRaphson(0, df, ddf)
> head(nr.steps)
           [,1]          [,2]     [,3]
[1,]  0.0000000  3.000000e+00 20.00000
[2,] -0.1500000 -6.750000e-02 20.90000
[3,] -0.1467703 -3.129221e-05 20.88062
[4,] -0.1467688 -6.737721e-12 20.88061
[5,] -0.1467688  4.440892e-16 20.88061
[6,] -0.1467688 -8.881784e-16 20.88061

In this case, we are converged after only four steps if we start in x_0=0.

The gradient descent method

The gradient descent method doesn’t use the second derivative, but replaces \frac{1}{f''(x)} with a constant \lambda. Useful if you don’t have an expression for the second derivative (or if it is costly to compute).

The iteration goes like x_i = x_{i-1}-\lambda f'(x_{i-1}) and the R code looks like this:

GradientDescent <- function(x0, df, lambda, n=100) {
  steps <- matrix(0, nr=n, nc=2)
  steps[1,] <- c(x0, df(x0))
  x <- x0
  for (i in 2:n) {
    x <- x - lambda*df(x)
    steps[i,] <- c(x, df(x))
  }
  steps
}

From the Newton-Raphson run above, we saw that f''(x) was pretty close to 20, so we can try running with \lambda=1/20:

> gd.steps <- GradientDescent(0, df, 1/20)
> head(gd.steps)
           [,1]          [,2]
[1,]  0.0000000  3.000000e+00
[2,] -0.1500000 -6.750000e-02
[3,] -0.1466250  3.003328e-03
[4,] -0.1467752 -1.321765e-04
[5,] -0.1467686  5.819939e-06
[6,] -0.1467688 -2.562555e-07

where again we converge pretty quickly.

Of course, if we don’t know f''(x) close to the minimum, or if it varies significantly around the minimum, we might not be so lucky that we can pick a good \lambda.

and if we take too large steps, say \lambda=1/10 we might not converge at all but keep over-shooting in the update.

One possible solution is to use a small \lambda but add a momentum to the iteration step, so it becomes x_i=x_{i-1}+m_i where m_i=-\lambda f'(x) + \mu m_{i-1} and where 0\leq\mu\leq 1.  For \mu=0 this is just the plain old gradient descent method, but for \mu>0 the idea is that if we take a large step, the next one should probably also be large so the momentum increases, and when we start making smaller steps it decreases again.

GradientDescentMomentum <- function(x0, df, lambda, mu, n=50) {
  steps <- matrix(0, nr=n, nc=3)
  steps[1,] <- c(x0, df(x0), 0)
  x <- x0
  m <- 0
  for (i in 2:n) {
    m <- -lambda*df(x) + mu*m
    x <- x + m
    steps[i,] <- c(x, df(x), m)
  }
  steps
}

Of course, now we are left with two rather than one parameter to the method that we need to come up with a guess for.

We still have the problem that we won’t converge if we make a guess for \lambda that is too large, and in my experiments we do not gain a whole lot from the momentum if we pick it too small either.

Plus the \mu term has to be chosen carefully as well

so I’m a bit underwhelmed by this idea.

Ok, enough computer time for me today, I need to rest my RSI, so I’ll stop here…

158-161=-3

Repetitive strain injury

Sunday, June 7th, 2009

I have suffered from repetitive strain injury for years.  Got it during my PhD studies where I did a lot of programming.  I have some exercises that helps, but other than that there is only resting from sitting at the computer that alleviates it.  Normally this works fine, and since I am not doing as much late night programming as I used to do, it is not a major handicap.

The last two weeks, though, it has been pretty bad, so I have been off blogging.  I’m slowly recovering, so I guess it is time to slowly get back…

158-160=-2