We just got another paper accepted:

A quadratic time algorithm for computing the quartet distance between two general treesT. Mailund, J. Nielsen, and C.N.S. PedersenTo be presented at the Workshop on Bioinformatics Algorithms at IJCBS 2009 August 2009, Shanghai, China

AbstractWe derive a quadratic time and space algorithm for computing the quartet distance between a pair of general trees, i.e. trees where inner nodes can have any degree ≥ 3. The time and space complexity of our algorithm is quadratic in the number of leaves and does not depend on the degree of the inner nodes. This makes it the fastest algorithm for computing the quartet distance between general trees independent of the degree of the inner nodes.

### The quartet distance

The quartet distance is a metric for comparing two phylogenetic trees over the same set of taxa.

It works as follows: For each sub set of taxa of size four (a quartet), $$\{a,b,c,d\}$$ you consider the tree topology in the two trees. The tree topology for $$\{a,b,c,d\}$$ can be one of the four shown below. Now, for each such subset you count how often the two phylogenetic trees agree on the quartet topology and how often they do not, and the number of topologies where they differ is the quartet distance between the two trees.

There are $$n \choose 4$$ quartets in a tree with $$n$$ leaves, so explicitly enumerating the topologies and comparing them this way would take at least time in $$O(n^4)$$ but through various tricks this can be reduced.

### Algorithms for computing the quartet distance

The first algorithms for computing the quartet distance focused on binary trees. For binary trees, the *star topologi, *topology (d) above, is not possible, and that simplifies the problem a bit.

The fastest algorithm for binary trees so far is time $$O(n\log n)$$. Incidentally, my very first bioinformatics paper was an application note about an implementation of pretty much that algorithm. Not quite, my implementation was of an $$O(n\log^2n)$$ algorithm, but the last $$\log n$$ comes at a considerable code complexity, so I didn’t bother with it.

People didn’t seem to bother with non-binary trees for some reason. I guess we are just used to trees being binary in bioinformatics that we don’t think about it much (even when we should, and when we shouldn’t resolve inner nodes when we are really not sure how to resolve them).

Anyway, here at BiRC we got interested in it because we had three Master’s students who wanted to do some algorithmics in bioinformatics and we asked them to look at the quartet distance and see what they could do we general trees.

That turned out extremely successfully. They worked very well together, and in short time produced several algorithms and got them written down in papers:

**Algorithms for Computing the Quartet Distance between Trees of Arbitrary Degree***C. Christiansen, T. Mailund, C.N.S. Pedersen, and M. Randers*Proceedings of the Workshop on Algorithms in Bioinformatics (WABI), 2005, LNBI 3692, pp. 77-88-
**Computing the Quartet Distance between Evolutionary Trees of Bounded Degree***M. Stissing, C.N.S. Pedersen, T. Mailund, G.S. Brodal, and R. Fagerberg**Proceedings of the Asia-Pacific Bioinformatics Conference (APBC) 2007, pp. 101–110* **Fast Calculation of the Quartet Distance Between Trees of Arbitrary Degrees***C. Christiansen, T. Mailund, C.N.S. Pedersen, M. Randers, and M. Stissing*Algorithms for Molecular Biology 2006,**1**(16); doi:10.1186/1748-7188-1-16.

The first paper describes two algorithms for computing the quartet distance between general degree trees. One independent of the maximal degree, running in time $$O(n^3)$$, the other dependant on it, running in time $$O(n^2d^2)$$ where $$d$$ is the maximal degree. Both algorithms are dynamic programming algorithms.

The second paper extends the $$O(n\log{}n)$$ algorithm for binary trees to general trees, but at a running time of $$O(d^9n\log n)$$ — so sub quadratic in $$n$$ but with a rather high complexity in $$d^9$$ (to say nothing of the “mental complexity” of the algorithm. I would certainly not like to have to implement it).

The third paper uses a lot of combinatorical tricks to reduce the $$O(n^2d^2)$$ algorithm from the first paper to $$O(n^2d)$$.

### Our new algorithm

Our new algorithm isn’t really that new at all. It is an algorithm I came up with on the train back from a workshop where I had presented an earlier version of the algorithms in the first paper above.

I couldn’t get the complexity analysis right, though, and concluded that it was probably $$O(n^2d^2)$$ so no improvement on what we already had. Thus I just put it in a drawer and forgot about it.

At least until Christian forwarded a call-for-papers to me and asked me if I had any old results lying around. I digged through my old abandoned ideas and found this one. Since it is some rather simple combinatorics, similar to the dynamic programming algorithms we had in the very first paper, it might be possible come up with a simpler $$O(n^2d)$$, algorithm than the one we had, I thought, or maybe just consider a sub-set of trees, like all d-ary (not necessarily binary) trees.

I didn’t have much time to put into it, but Jesper, a PhD student here at BiRC did, so I gave it to him. He basically just re-did the complexity analysis and concluded that the running time was $$O(n^2)$$ so without further ado we sent in the paper, and now we (well, most likely Jesper) will present it in Shanghai this summer.

—

94-116=-22