Posts Tagged ‘Python’

bx-python rules!

Friday, August 28th, 2009

I have downloaded a 44-way multiple whole-genome alignment from UCSC and want to extract the alignment of the great apes for an HMM analysis.

Sounds like an afternoon’s worth of script programming, but it isn’t, thanks to bx-python.

It’s no more than this little script:

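import sys
import gzip
from bx.align import maf
from bx.align.tools import fuse_list, thread

if len(sys.argv) != 3:
    sys.stderr.write('Usage: %s infile outfile\n' % sys.argv[0])
    sys.exit(2)

# MAF is a plain-text format, so open the gzipped files in text mode
# ('rt'/'wt'); the with-block also makes sure the output gets flushed.
with gzip.open(sys.argv[1], 'rt') as infile, \
        gzip.open(sys.argv[2], 'wt') as outfile:
    reader = maf.Reader(infile)
    writer = maf.Writer(outfile)

    # Keep only the segments with all four apes (and only their rows),
    # then merge segments that are contiguous in every species.
    apes = thread(reader, ['ponAbe2', 'hg18', 'panTro2', 'gorGor1'])
    for block in fuse_list(apes):
        writer.write(block)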

I just use the “thread()” and “fuse_list()” functions, which extract the segments containing the given species (and only those species’ rows) and fuse contiguous segments, respectively.
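To make concrete what “thread()” does here, the following is a minimal sketch of the same idea, not bx-python’s implementation. It models each MAF block as a plain dict from species name to aligned row text (an assumption made so the example is self-contained), keeps only blocks that contain every requested species, and restricts them to those rows. Dropping columns that have become gaps in every remaining row is an additional assumption about what the extraction should do; restrict_to_species is a hypothetical name.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical stand-in for a MAF block: species name -> aligned row text.
Block = Dict[str, str]


def restrict_to_species(blocks: Iterable[Block], species: List[str]) -> Iterator[Block]:
    """Yield each block that contains every species in `species` (non-empty),
    restricted to those rows, with all-gap columns removed."""
    for block in blocks:
        if not all(name in block for name in species):
            continue  # skip blocks missing any of the requested species
        rows = [block[name] for name in species]
        # Keep a column if at least one remaining row has a base there.
        keep = [i for i in range(len(rows[0])) if any(row[i] != "-" for row in rows)]
        yield {name: "".join(row[i] for i in keep) for name, row in zip(species, rows)}


blocks = [
    {"hg18": "AC-T", "panTro2": "AC-T", "mm9": "ACGT"},
    {"hg18": "AAAA", "mm9": "AAAA"},
]
print(list(restrict_to_species(blocks, ["hg18", "panTro2"])))
# [{'hg18': 'ACT', 'panTro2': 'ACT'}]
```

Because it yields one block at a time, the whole alignment never has to sit in memory at once.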

Out of the box, “thread()” and “fuse_list()” work on lists, so I ran out of RAM when running the script, but it was trivial to change them to work as generators, and now it runs like a charm.
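The list-to-generator change essentially means yielding each fused block as soon as it can no longer grow, instead of appending it to a result list. Below is a minimal sketch of that pattern, not the bx-python code or the actual patch. fuse_lazily and fuse_intervals are hypothetical names, and half-open (start, end) intervals stand in for alignment blocks so the example runs without any data.

```python
import itertools
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def fuse_lazily(blocks: Iterable[T], fuse: Callable[[T, T], Optional[T]]) -> Iterator[T]:
    """Merge each block into its predecessor while fuse() returns a merged
    block; yield the accumulated block as soon as fuse() returns None.
    Only one pending block is held in memory at a time."""
    pending: Optional[T] = None
    for block in blocks:
        if pending is None:
            pending = block
            continue
        merged = fuse(pending, block)
        if merged is not None:
            pending = merged
        else:
            yield pending
            pending = block
    if pending is not None:
        yield pending


def fuse_intervals(a: Tuple[int, int], b: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    """Toy fuse(): merge two half-open intervals if b starts where a ends."""
    return (a[0], b[1]) if a[1] == b[0] else None


print(list(fuse_lazily([(0, 10), (10, 20), (25, 30)], fuse_intervals)))
# [(0, 20), (25, 30)]

# Because it is a generator, it even works on an unbounded stream:
stream = ((i * 10, i * 10 + 5) for i in itertools.count())
print(next(fuse_lazily(stream, fuse_intervals)))
# (0, 5)
```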


SciPy at last!

Thursday, June 11th, 2009

SciPy is probably the software I have missed the most since I moved to OS X.  I have used it a lot for data analysis whenever the data wasn’t in a simple table form ideal for R.  It really is a wonderful module if you do any kind of scientific computing in Python.

I have tried several times to install it on my Mac(s), but with limited success.  On my Mac at the office I got most of it installed, but never managed to get matplotlib working.  On my iMac at home and on my MacBook I could get NumPy up and running, but not SciPy proper.

Until now, that is.  I found this wonderful package: Scipy Superpack.  Well, it is really just a shell script, but it installs SciPy and a bunch of other modules, so now I finally have my old Python toolbox up and running again.
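A quick way to confirm that an installer like this really left a working toolbox behind is to try importing each module and print its version. Here is a small sketch using only the standard library’s importlib. module_versions is a hypothetical helper name, and the module list is just the three packages mentioned above.

```python
import importlib


def module_versions(names):
    """Map each module name to its version string, 'unknown' if the module
    has no __version__ attribute, or None if it fails to import."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in module_versions(["numpy", "scipy", "matplotlib"]).items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```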


Not mistakes, just not idiomatic!

Sunday, May 24th, 2009

Here’s a list of “common Python mistakes”:

if a == None:
Really common. None is a singleton, so you can (well, actually, have to) compare to it with ‘if a is None’ or ‘if a is not None’.

if (condition):
This is not really Pythonic: in Python, if doesn’t take () around the condition, and the purpose of () is grouping. You can use them in an if, but only to clarify a really complex condition or similar. Please don’t clutter code with unneeded characters.

if len(list):
This is a misunderstanding of the Python way of things: any empty sequence (string, list, tuple) is false. So if you want to check whether a list is empty, just check whether it is false. Same for tuples, and on some occasions for strings.

if type(obj) is type(1):
This is killing kittens. There’s a builtin function for this purpose; it’s called, surprise, isinstance(object, type). For example, isinstance(123, int).

Or are they really mistakes?

They will all work exactly as intended, and essentially with the same runtime complexity.
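For ordinary values the claim is easy to check: each non-idiomatic form gives the same answer as its idiomatic twin. The pairs do part ways in corner cases, though. == calls a class’s own __eq__, which may claim equality with None, and isinstance() accepts subclasses (bool is a subclass of int) where the type() comparison does not. AlwaysEqual below is a deliberately contrived, hypothetical class.

```python
# Ordinary values: each non-idiomatic form agrees with the idiomatic one.
a = None
items = [1, 2, 3]
assert (a == None) == (a is None)  # noqa: E711
assert bool(len(items)) == bool(items)
assert bool(len([])) == bool([])
assert (type(123) is type(1)) == isinstance(123, int)


class AlwaysEqual:
    """Contrived example: __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


b = AlwaysEqual()
print(b == None, b is None)  # noqa: E711  prints: True False
print(type(True) is type(1), isinstance(True, int))  # prints: False True
```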

The only problem is that they are not the idiomatic way of doing things.

Of course, you should follow the language conventions if you want other people to be able to read your code without continuously wondering why you are doing this or that.  Anything that breaks conventions makes other programmers stop and wonder what is going on.

Still, I wouldn’t call these mistakes as such.  I’d prefer to reserve that word for constructions that do not have the desired effect (like writing if a = None: instead of if a == None:, which at least gives you a syntax error in Python but wouldn’t in C), or that do the right thing but at a significant runtime cost (like for (int i = 0; i < strlen(s); ++i), which gives you quadratic rather than linear running time because strlen() walks the whole string on every iteration).
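A small Python model makes the arithmetic behind that C loop explicit. If computing the length walks all n characters and the loop test runs n + 1 times, the loop steps over n(n + 1) characters in total, against n when the length is computed once up front. c_strlen and the step counter are hypothetical, and a C compiler may hoist the strlen() call itself when it can prove the string does not change, so treat this as a model of the naive loop rather than a benchmark.

```python
def c_strlen(s, counter):
    """Model of C's strlen(): walks the string one character at a time,
    adding the number of characters stepped over to counter[0]."""
    n = 0
    for _ in s:
        counter[0] += 1
        n += 1
    return n


def steps_condition_in_loop(s):
    """for (i = 0; i < strlen(s); ++i): length recomputed on every test."""
    counter = [0]
    i = 0
    while i < c_strlen(s, counter):
        i += 1
    return counter[0]


def steps_hoisted(s):
    """n = strlen(s); for (i = 0; i < n; ++i): length computed once."""
    counter = [0]
    n = c_strlen(s, counter)
    i = 0
    while i < n:
        i += 1
    return counter[0]


for n in (10, 100, 1000):
    s = "x" * n
    print(n, steps_condition_in_loop(s), steps_hoisted(s))
# 10 110 10
# 100 10100 100
# 1000 1001000 1000
```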


Unladen Swallow

Friday, March 27th, 2009

At Google they are trying to give Python a real boost.  See Ars Technica’s post on the project.

The project is called Unladen Swallow, a reference to Monty Python and the Holy Grail, and the goals are:

We want to make Python faster, but we also want to make it easy for large, well-established applications to switch to Unladen Swallow.

  1. Produce a version of Python at least 5x faster than CPython.
  2. Python application performance should be stable.
  3. Maintain source-level compatibility with CPython applications.
  4. Maintain source-level compatibility with CPython extension modules.
  5. We do not want to maintain a Python implementation forever; we view our work as a branch, not a fork.

The main approach is to use LLVM (an open source virtual machine) and JIT compilation to speed up the code.  This is probably a good idea.  JIT approaches (plus dynamic runtime optimisation) have done wonders for Java and are under the hood of Google’s Chrome browser, in its V8 JavaScript virtual machine.

More interesting, though, in my opinion, is that they want to support multi-core machines by getting rid of the global interpreter lock (GIL).  Because of global synchronisation issues, multi-threading in Python isn’t quite as parallel as you might think.  It is fine for overlapping blocking system calls such as I/O, since the lock is released while a thread waits, but not really for exploiting multiple cores.  But see Multiprocessing with Python (I wanted to write a separate post on that, but probably won’t have time, so now I’ll just link to it here).
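The split between blocking calls and CPU-bound work can be sketched with the standard library’s concurrent.futures module (added in Python 3.2, so newer than this post). blocking_call, cpu_bound and the two runner functions are hypothetical names, time.sleep() stands in for a blocking system call, and actual timings depend on the machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def blocking_call(seconds: float) -> float:
    """Stand-in for blocking I/O: time.sleep() releases the GIL while it waits."""
    time.sleep(seconds)
    return seconds


def cpu_bound(n: int) -> int:
    """Pure-Python arithmetic: the running thread holds the GIL throughout."""
    return sum(i * i for i in range(n))


def run_blocking_with_threads(count: int = 4, seconds: float = 0.2) -> float:
    """Return wall-clock time for `count` blocking calls on `count` threads.
    The waits overlap, so this is close to `seconds`, not count * seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=count) as pool:
        list(pool.map(blocking_call, [seconds] * count))
    return time.perf_counter() - start


def run_cpu_bound_with_processes(count: int = 4, n: int = 2_000_000) -> list:
    """Run cpu_bound in separate processes; each has its own interpreter and
    GIL, so the work can spread across cores."""
    with ProcessPoolExecutor(max_workers=count) as pool:
        return list(pool.map(cpu_bound, [n] * count))
```

Call run_cpu_bound_with_processes() from under an if __name__ == "__main__": guard, because on platforms that start workers in a fresh interpreter the workers re-import the main module. Running the same cpu_bound calls on a ThreadPoolExecutor instead would, in CPython with the GIL, take roughly as long as running them one after another.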

Moore’s law is dead, at least as far as clock speeds go.  Processors are not getting faster; they are just getting more cores.  See also Herb Sutter’s The Free Lunch is Over.  Multi-core software is going to be essential for high performance in the future, and by handling this in the VM for Python, rather than running separate processes, we might see runtime parallelisation optimisation.  That would be really exciting!


MySQLdb (Python module) on OSX

Wednesday, March 25th, 2009

I’m working on a database for our CoalHMM analysis results.  Right now, I’m experimenting with different database designs, so I want to work with the database on my local machine rather than on our shared server.

I used to just do this on Linux, but this is the first time I’m playing with MySQL on OS X.

Installing MySQL was pretty painless as you can download a binary package for it.

MySQLdb, the Python module I use to access the database, is a different matter, however.  I couldn’t find any binary packages and the source distribution doesn’t compile out of the box.

Luckily, I am not the first to want to use MySQLdb on a Mac, and a bit of googling found this page.

Worked like a charm, so now I am back to hacking database designs.
