
Identifying Rogue Taxa through Reduced Consensus: NP-Hardness and Exact Algorithms

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2012)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7292)

Abstract

A rogue taxon in a collection of phylogenetic trees is one whose position varies drastically from tree to tree. The presence of such taxa can greatly reduce the resolution of the consensus tree (e.g., the majority-rule or strict consensus) for a collection. The reduced consensus approach aims to identify and eliminate rogue taxa to produce more informative consensus trees. Given a collection of phylogenetic trees over the same leaf set, the goal is to find a set of taxa whose removal maximizes the number of internal edges in the consensus tree of the collection. We show that this problem is NP-hard for strict and majority-rule consensus. We give a polynomial-time algorithm for reduced strict consensus when the maximum degree of the strict consensus of the original trees is bounded. We describe exact integer linear programming formulations for computing reduced strict, majority and loose consensus trees. In experimental tests, our exact solutions improved over heuristic methods on several problem instances.
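The optimization problem stated in the abstract can be illustrated with a small brute-force sketch for the strict consensus case. This is not the authors' algorithm or their integer linear programming formulation; it only enumerates candidate removal sets, which is feasible for toy inputs. Several choices here are assumptions made for illustration: trees are rooted and written as nested Python tuples of leaf names, internal edges are counted as non-trivial clusters, and the function names (`clusters`, `strict_consensus_size`, `best_removal`) and the `max_removed` cap are hypothetical.

```python
from itertools import combinations


def clusters(tree, keep):
    """Non-trivial clusters of a rooted tree after restricting it to `keep`.

    A tree is a leaf name (str) or a tuple of subtrees (assumed encoding).
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node]) & keep
        below = frozenset().union(*(leaves_below(child) for child in node))
        # Skip clusters that collapse to a single leaf or equal the whole leaf set.
        if 1 < len(below) < len(keep):
            found.add(below)
        return below

    leaves_below(tree)
    return found


def strict_consensus_size(trees, keep):
    """Number of non-trivial clusters shared by every restricted tree."""
    return len(set.intersection(*(clusters(t, keep) for t in trees)))


def best_removal(trees, taxa, max_removed):
    """Try every removal set of size <= max_removed (exponential in general)."""
    all_taxa = frozenset(taxa)
    best = (strict_consensus_size(trees, all_taxa), ())
    for k in range(1, max_removed + 1):
        for removed in combinations(sorted(all_taxa), k):
            score = strict_consensus_size(trees, all_taxa - set(removed))
            if score > best[0]:
                best = (score, removed)
    return best


# Two trees that agree on a, b, c, d but place taxon x in different positions.
t1 = (((("a", "x"), "b"), "c"), "d")
t2 = ((("a", "b"), "c"), ("d", "x"))
print(best_removal([t1, t2], {"a", "b", "c", "d", "x"}, max_removed=1))
# -> (2, ('x',))
```

With x present the two trees share no non-trivial cluster, so the strict consensus is a star; removing x recovers the clusters {a, b} and {a, b, c}. The enumeration grows exponentially with the number of taxa, which is in keeping with the NP-hardness result reported in the abstract.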

Supported in part by National Science Foundation grants DEB-0830012 and CCF-106029.



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Deepak, A., Dong, J., Fernández-Baca, D. (2012). Identifying Rogue Taxa through Reduced Consensus: NP-Hardness and Exact Algorithms. In: Bleris, L., Măndoiu, I., Schwartz, R., Wang, J. (eds) Bioinformatics Research and Applications. ISBRA 2012. Lecture Notes in Computer Science, vol 7292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30191-9_9

  • DOI: https://doi.org/10.1007/978-3-642-30191-9_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30190-2

  • Online ISBN: 978-3-642-30191-9

  • eBook Packages: Computer Science, Computer Science (R0)
