Archive for May 14th, 2009

Saturday morning physics

Thursday, May 14th, 2009

Here at AU we have something called the “physics show”.  We also have a “chemistry show” and I think even a “computer science show”.  Essentially, it is a group of students who give these “shows”, where they demonstrate various physics phenomena (or chemistry or whatnot) by doing experiments on stage and explaining the underlying theory.

It is mainly aimed at high school students, and the group goes out to schools around Denmark and does these shows to get students interested in science.

I think it is a great idea, and I have seen the show several times and always enjoyed it.

There was even a TV version of it, running late nights or early mornings on an obscure channel, that I watched with interest.  Partly because I know most of the people doing the show, but also because I love physics (I am just not good enough at it for it to be more than a hobby so I stick to bioinformatics).

The TV show ended years ago, but now on iTunes I found something even better to watch when I’m too lazy to do any work myself.

Saturday morning physics from the Uni of Michigan.

It is nothing like the physics show here, but a series of lectures on various topics (not all of them physics, though).

I would love to see more of this: lectures on iTunes.


Now how exactly was it I did that?

Thursday, May 14th, 2009

RRResearch has some thoughts about keeping records of computer work:

When I do benchwork I consistently keep pretty good notes.  I write down everything I do as I do it, on numbered and dated sheets of paper that go into looseleaf binders, organized by experiment.

But I don’t seem to be able to apply these good record-keeping habits when I’m working with computers.  Instead everything I do feels ‘exploratory’, as if everything I do is just a preliminary check to see what effect a modification will have, before I do something worth writing down.

I recognise this all too well.

It is not so much a problem when I do some exploratory data analysis.  I will have my R log to see what I actually did, and if I find an interesting pattern I know what I found and I don’t really need the history of how I got there so much.

When writing programs I don’t have the problem either.  There I have source control and bug trackers to help me.

My problem is with scripts.

I write a small script to format my data into something I can analyse.  Run a program or two on the data. Write another script to re-format the data.  A small script to pull out relevant data.  Look at that.  Then I need to just check a few things, and that is easy as another little script.

Very soon I have ten to twenty small scripts of five to ten lines each. None of them are really worth putting in version control or cleaning up or anything, ’cause it was all just exploratory anyway, but if I come back to the data a few weeks later, I have no way of reproducing what I did.

It is really horrible.

Ideally, once I know what I want to do with the data, I should clean up the pipeline, put it under version control and document it, but by then I am already done with the data analysis so I rarely bother.

Until I have to do it all again a few weeks or months later on some new data.

At that point I should really clean up the pipeline, but most likely I need to do something slightly different.  Not drastically different, but a few of the steps should be modified anyway, and depending on the results I need a few more scripts and it just spirals out of control.

I don’t really know how to solve this; I only know that what I am doing is quite sub-optimal.
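This is not a solution, but one partial mitigation can be sketched: run each throwaway script through a tiny wrapper that appends what was run, when, and where to a log file, so the sequence of steps can at least be read back later. The sketch below assumes Python; the function names `run_logged` and `read_log` and the file name `analysis_log.jsonl` are made up for illustration, and it records only the command line, not the contents of the scripts themselves.

```python
import json
import subprocess
import time
from pathlib import Path


def run_logged(cmd, log_path="analysis_log.jsonl"):
    """Run a command and append a record of it to a JSON-lines log.

    `cmd` is a list such as ["python", "reformat.py", "data.txt"].
    The log file name is an arbitrary choice for this sketch.
    """
    started = time.strftime("%Y-%m-%d %H:%M:%S")
    result = subprocess.run(cmd)
    record = {
        "started": started,
        "cwd": str(Path.cwd()),
        "cmd": list(cmd),
        "returncode": result.returncode,
    }
    # Append one JSON object per line so the log keeps the order of the steps.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return result.returncode


def read_log(log_path="analysis_log.jsonl"):
    """Return the logged records in the order the commands were run."""
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Calling `run_logged(["python", "reformat.py", "data.txt"])` instead of running the script directly (the script and data file names here are placeholders) would leave one line in the log per step. It does nothing about the scripts changing between runs, which is the part version control would still have to cover.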
