You know, people do use neighbour joining!

Over the last couple of years, I have done a little work on phylogeny inference, including a few papers on neighbour joining.  One thing that consistently happens when you submit a paper on this -- and I bring it up because I have just gotten back reviewer reports on such a paper -- is that at least one reviewer will tell you that neighbour joining is not interesting and that one should focus on maximum likelihood / Bayesian trees instead.

Sorry to say it, but people do use neighbour joining -- I am willing to bet that there are ten times as many people using neighbour joining to infer trees as there are people using the statistical approaches -- so algorithmic improvements here do matter!

The statistical approaches are usually more accurate, and they are better at capturing the uncertainty in the inference and such, but they are slow! Not slow as in "I'll go get a cup of coffee while the program finishes", but slow as in "I'll look at the tree when I am back from my vacation".

Sure, they are fast enough for tens of leaves, but some people infer trees with thousands of leaves.  I recently got an email from a guy who tried with tens of thousands of leaves and ran out of memory using one of my tools -- it needed more than 4 GB, so it choked on the problem (but a student in our lab has now come up with a new algorithm that uses less memory, so that should solve that problem).
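
For a rough sense of why tens of thousands of leaves hurts: neighbour joining starts from a matrix of pairwise distances, and that matrix grows quadratically in the number of leaves. The sketch below is a back-of-the-envelope estimate only. It assumes a dense matrix of 8-byte doubles, and the function name and the 30,000-leaf figure are made up for illustration; they are not taken from the tool mentioned above, which may well store things differently.

```python
def distance_matrix_bytes(n_leaves, bytes_per_entry=8, symmetric=False):
    """Estimate the memory needed for a pairwise distance matrix.

    Assumption: dense storage with fixed-size entries (8-byte doubles by
    default). With symmetric=True only the entries above the diagonal
    are counted, i.e. n * (n - 1) / 2 of them.
    """
    if symmetric:
        entries = n_leaves * (n_leaves - 1) // 2
    else:
        entries = n_leaves * n_leaves
    return entries * bytes_per_entry


if __name__ == "__main__":
    n = 30_000  # hypothetical "tens of thousands of leaves"
    full = distance_matrix_bytes(n)
    half = distance_matrix_bytes(n, symmetric=True)
    print(f"full matrix:      {full / 1e9:.1f} GB")
    print(f"upper triangle:   {half / 1e9:.1f} GB")
```

Under these assumptions the full matrix alone is already past 4 GB at 30,000 leaves, before any of the bookkeeping the algorithm needs on top of it.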

For large trees, forget about ML or Bayesian approaches.  They do not scale (yet).

People do use neighbour joining, so shut up and review the paper for what it is, not what you want it to be. Grrr!


3 Responses to “You know, people do use neighbour joining!”

  1. Bob O'H Says:

    Hear, hear!

    Of course this does raise questions about whether the trees are any good, but I guess you just have to bear that in mind when interpreting them.

  2. Thomas Mailund Says:

    You always need to treat inferred data with a bit of caution, of course, regardless of how you infer it. For an ML tree you would have a sound statistical framework for this, while for an NJ tree you would rely on bootstrapping or such.

    In any case, you always want to go for the most reliable method that is feasible to use, and my point is just that for large data sets, that is not ML trees. You run into a tradeoff between trusting the method and relying on more data, and usually you want to use as much data as you can get your hands on as well... Ah, I don't know... it is a tricky question ;-)

  3. Mailund on the Internet » It is not all bad news Says:

    [...] it is on neighbour-joining, we weren’t that optimistic.  We’ve had problems publishing on this before, but this time it was very well [...]
