bx-python rules!
Friday, August 28th, 2009I have downloaded a 44 way multiple whole genome alignment from UCSC and want to extract the alignment of the great apes for an HMM analysis.
Sounds like an afternoon’s worth of script programming, but it isn’t thanks to bx-python.
It’s no more than this little script:
import sys import gzip from bx.align import maf from bx.align.tools import fuse_list, thread if len(sys.argv) != 3: print 'Usage:', sys.argv[0], 'infile outfile' sys.exit(2) reader = maf.Reader(gzip.open(sys.argv[1])) writer = maf.Writer(gzip.open(sys.argv[2],'w')) apes = thread(reader, ['ponAbe2','hg18','panTro2','gorGor1']) fused = fuse_list(apes) for rec in fused: writer.write(rec)
I just use the “thread()” and “fuse_list()” functions that extracts the segments with the given species (and only the rows in those) and that fuses contiguous segments, respectively.
Out of the box, they work on lists so I ran out of RAM when running the script, but it was trivial to change them to work as generators so now it runs like a charm.
–
240-246=-6