New paper on computing the quartet distance
Saturday, April 4th, 2009We just got another paper accepted:
A quadratic time algorithm for computing the quartet distance between two general trees T. Mailund, J. Nielsen, and C.N.S. Pedersen
To be presented at the Workshop on Bioinformatics Algorithms at IJCBS 2009 August 2009, Shanghai, China
AbstractWe derive a quadratic time and space algorithm for computing the quartet distance between a pair of general trees, i.e. trees where inner nodes can have any degree ≥ 3. The time and space complexity of our algorithm is quadratic in the number of leaves and does not depend on the degree of the inner nodes. This makes it the fastest algorithm for computing the quartet distance between general trees independent of the degree of the inner nodes.
The quartet distance
The quartet distance is a metric for comparing two phylogenetic trees over the same set of taxa.
It works as follows: For each sub set of taxa of size four (a quartet),
you consider the tree topology in the two trees. The tree topology for
can be one of the four shown below. Now, for each such subset you count how often the two phylogenetic trees agree on the quartet topology and how often they do not, and the number of topologies where they differ is the quartet distance between the two trees.

There are
quartets in a tree with
leaves, so explicitly enumerating the topologies and comparing them this way would take at least time in
but through various tricks this can be reduced.
Algorithms for computing the quartet distance
The first algorithms for computing the quartet distance focused on binary trees. For binary trees, the star topologi, topology (d) above, is not possible, and that simplifies the problem a bit.
The fastest algorithm for binary trees so far is time
. Incidentally, my very first bioinformatics paper was an application note about an implementation of pretty much that algorithm. Not quite, my implementation was of an
algorithm, but the last
comes at a considerable code complexity, so I didn’t bother with it.
People didn’t seem to bother with non-binary trees for some reason. I guess we are just used to trees being binary in bioinformatics that we don’t think about it much (even when we should, and when we shouldn’t resolve inner nodes when we are really not sure how to resolve them).
Anyway, here at BiRC we got interested in it because we had three Master’s students who wanted to do some algorithmics in bioinformatics and we asked them to look at the quartet distance and see what they could do we general trees.
That turned out extremely successfully. They worked very well together, and in short time produced several algorithms and got them written down in papers:
- Algorithms for Computing the Quartet Distance between Trees of Arbitrary Degree C. Christiansen, T. Mailund, C.N.S. Pedersen, and M. Randers Proceedings of the Workshop on Algorithms in Bioinformatics (WABI), 2005, LNBI 3692, pp. 77-88
- Computing the Quartet Distance between Evolutionary Trees of Bounded Degree M. Stissing, C.N.S. Pedersen, T. Mailund, G.S. Brodal, and R. Fagerberg Proceedings of the Asia-Pacific Bioinformatics Conference (APBC) 2007, pp. 101–110
- Fast Calculation of the Quartet Distance Between Trees of Arbitrary Degrees C. Christiansen, T. Mailund, C.N.S. Pedersen, M. Randers, and M. Stissing Algorithms for Molecular Biology 2006, 1(16); doi:10.1186/1748-7188-1-16.
The first paper describes two algorithms for computing the quartet distance between general degree trees. One independent of the maximal degree, running in time
, the other dependant on it, running in time
where
is the maximal degree. Both algorithms are dynamic programming algorithms.
The second paper extends the
algorithm for binary trees to general trees, but at a running time of
— so sub quadratic in
but with a rather high complexity in
(to say nothing of the “mental complexity” of the algorithm. I would certainly not like to have to implement it).
The third paper uses a lot of combinatorical tricks to reduce the
algorithm from the first paper to
.
Our new algorithm
Our new algorithm isn’t really that new at all. It is an algorithm I came up with on the train back from a workshop where I had presented an earlier version of the algorithms in the first paper above.
I couldn’t get the complexity analysis right, though, and concluded that it was probably
so no improvement on what we already had. Thus I just put it in a drawer and forgot about it.
At least until Christian forwarded a call-for-papers to me and asked me if I had any old results lying around. I digged through my old abandoned ideas and found this one. Since it is some rather simple combinatorics, similar to the dynamic programming algorithms we had in the very first paper, it might be possible come up with a simpler
, algorithm than the one we had, I thought, or maybe just consider a sub-set of trees, like all d-ary (not necessarily binary) trees.
I didn’t have much time to put into it, but Jesper, a PhD student here at BiRC did, so I gave it to him. He basically just re-did the complexity analysis and concluded that the running time was
so without further ado we sent in the paper, and now we (well, most likely Jesper) will present it in Shanghai this summer.
–
94-116=-22

