Integer Programming Formulations and Computations Solving Phylogenetic and Population Genetic Problems with Missing or Genotypic Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4598)

Abstract

Several central and well-known combinatorial problems in phylogenetics and population genetics have efficient, elegant solutions when the input is complete or consists of haplotype data, but lack efficient solutions when the input is incomplete, consists of genotype data, or when the problem is generalized from a decision question to an optimization question. Unfortunately, these harder problems arise very often in biological applications. Previous research has shown that integer linear programming (ILP) can sometimes be used to solve hard problems in practice on a range of data that is realistic for current biological applications. Here, we describe a set of related ILP formulations for several additional problems, most of which are known to be NP-hard. These formulations either address the issue of missing data or solve haplotype inference problems with objective functions that model more complex biological phenomena than previous formulations. They solve efficiently on data whose composition reflects a range of data of current biological interest. We also assess the biological quality of the ILP solutions: some of the problems, although not all, solve with excellent quality. These results give a practical way to solve instances of some central, hard biological problems, and give practical ways to assess how well certain natural objective functions reflect complex biological phenomena. Perl code to generate the ILPs (for input to CPLEX) is on the web at wwwcsif.cs.ucdavis.edu/~gusfield.

Editor information

Guohui Lin

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gusfield, D., Frid, Y., Brown, D. (2007). Integer Programming Formulations and Computations Solving Phylogenetic and Population Genetic Problems with Missing or Genotypic Data. In: Lin, G. (eds) Computing and Combinatorics. COCOON 2007. Lecture Notes in Computer Science, vol 4598. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73545-8_8

  • DOI: https://doi.org/10.1007/978-3-540-73545-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73544-1

  • Online ISBN: 978-3-540-73545-8

  • eBook Packages: Computer Science, Computer Science (R0)
