Reducing Multi-state to Binary Perfect Phylogeny with Applications to Missing, Removable, Inserted, and Deleted Data

  • Kristian Stevens
  • Dan Gusfield
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6293)


Multi-State Perfect Phylogeny is an extension of Binary Perfect Phylogeny in which characters are allowed more than two states. In this paper we consider four problems that extend its utility. In the Missing Data (MD) Problem, some entries in the input are missing, and the question is whether (bounded) values can be imputed so that the resulting data has a multi-state Perfect Phylogeny. In the Character-Removal (CR) Problem, we want to minimize the number of characters removed from the data so that the resulting data has a multi-state Perfect Phylogeny. In the Missing-Data Character-Removal (MDCR) Problem, we want to impute values for the missing data so as to minimize the solution to the resulting Character-Removal Problem. In the Insertion and Deletion (ID) Problem, insertion and deletion mutational events spanning multiple characters are also allowed.

In this paper, we introduce a new general conceptual solution to these four problems. The method reduces k-state problems to binary problems with missing data. This gives a new conceptual solution to the multi-state Perfect Phylogeny problem, and conceptual solutions to the MD, CR, MDCR and ID problems for any k, significantly improving on previous work. Empirical evaluations of our implementations show that they are faster than, and effective on larger inputs than, previously established methods for general k.
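
As an illustration of the binary problem that the method targets, the following is a minimal sketch of the classical four-gamete test, which decides whether a complete binary character matrix admits a perfect phylogeny. It is not the reduction described in this paper, and it does not handle missing entries; the function name and the example matrices are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def has_binary_perfect_phylogeny(matrix):
    """Four-gamete test on a complete 0/1 matrix (rows = taxa, columns = characters).

    A binary matrix admits an (undirected) perfect phylogeny exactly when
    no pair of columns exhibits all four combinations 00, 01, 10 and 11.
    Assumes every entry is 0 or 1; missing data is NOT supported here.
    """
    if not matrix:
        return True
    n_chars = len(matrix[0])
    for i, j in combinations(range(n_chars), 2):
        # Collect the distinct state pairs seen in columns i and j.
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:
            return False  # columns i and j are incompatible
    return True


# Hypothetical example data.
compatible = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
conflicting = [
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
]

print(has_binary_perfect_phylogeny(compatible))
print(has_binary_perfect_phylogeny(conflicting))
```

With missing entries, the pairwise test alone is no longer sufficient, which is why the binary problems with missing data produced by the reduction need dedicated methods.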





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Kristian Stevens, Department of Computer Science, University of California, Davis, USA
  • Dan Gusfield, Department of Computer Science, University of California, Davis, USA
