Algorithms for Imperfect Phylogeny Haplotyping (IPPH) with a Single Homoplasy or Recombination Event

  • Yun S. Song
  • Yufeng Wu
  • Dan Gusfield
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3692)


The haplotype inference (HI) problem is the problem of inferring n haplotype pairs (2n haplotypes) from n observed genotype vectors. This key problem arises in studying genetic variation in populations, for example in the ongoing HapMap project [5]. To have any hope of finding the haplotypes that actually generated the observed genotypes, we must use some (implicit or explicit) genetic model of the evolution of the underlying haplotypes. The Perfect Phylogeny Haplotyping (PPH) model was introduced in 2002 [9] to reflect the “neutral coalescent” or “perfect phylogeny” model of haplotype evolution. The PPH problem, which can be solved in polynomial time, is to determine whether there is an HI solution in which the inferred haplotypes can be derived on a perfect phylogeny (tree).
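To make the HI and PPH definitions concrete, here is a minimal sketch under assumptions that are not stated in the abstract: sites are binary, a genotype is encoded with 0 or 1 at homozygous sites and 2 at heterozygous sites (a common convention), and "derivable on a perfect phylogeny" is checked with the standard four-gamete test for binary data with an unknown root. The function names are hypothetical, and the exhaustive search is exponential. It only illustrates the problem statement and is not the polynomial-time PPH algorithm referred to above.

```python
from itertools import product


def explains(h1, h2, g):
    """Return True if the haplotype pair (h1, h2) is consistent with genotype g.

    Assumed encoding: g[i] in {0, 1} means both haplotypes carry that value,
    g[i] == 2 means the two haplotypes differ at site i.
    """
    for a, b, s in zip(h1, h2, g):
        if s == 2:
            if a == b:
                return False
        elif not (a == b == s):
            return False
    return True


def resolutions(g):
    """Yield every unordered haplotype pair that explains genotype g."""
    het = [i for i, s in enumerate(g) if s == 2]
    if not het:
        yield tuple(g), tuple(g)
        return
    # Fix the first heterozygous site to 0 in h1 so that (h1, h2) and
    # (h2, h1) are not both produced.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(g), list(g)
        for i, b in zip(het, (0,) + bits):
            h1[i], h2[i] = b, 1 - b
        yield tuple(h1), tuple(h2)


def passes_four_gamete_test(haps):
    """Return True if no pair of sites shows all of 00, 01, 10 and 11."""
    haps = list(haps)
    if not haps:
        return True
    m = len(haps[0])
    for i in range(m):
        for j in range(i + 1, m):
            if len({(h[i], h[j]) for h in haps}) == 4:
                return False
    return True


def pph_bruteforce(genotypes):
    """Return one HI solution that passes the four-gamete test, else None.

    Exhaustive search over all resolutions; usable only on tiny inputs.
    """
    for choice in product(*(list(resolutions(g)) for g in genotypes)):
        haps = {h for pair in choice for h in pair}
        if passes_four_gamete_test(haps):
            return list(choice)
    return None
```

For example, `pph_bruteforce([(2, 2), (0, 2)])` returns one haplotype pair per genotype, while an input made of the four homozygous genotypes `(0, 0)`, `(0, 1)`, `(1, 0)`, `(1, 1)` has no solution because its only resolution already contains all four gametes.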

Since the introduction of the PPH model, several extensions and modifications of it have been examined. The most important modification, made to model biological reality better, is to allow a limited number of biological events that violate the perfect phylogeny model. This was accomplished implicitly in [7,12] by adding several heuristics to an algorithm for the PPH problem [8]. Those heuristics are invoked when the genotype data cannot be explained with haplotypes that fit the perfect phylogeny model. In this paper, we address the issue explicitly by allowing one recombination or homoplasy event in the model of haplotype evolution. We formalize the problems and provide a polynomial-time solution for one problem, using an additional, empirically supported assumption. We present a related framework for the second problem that yields a practical algorithm. We believe the second problem can also be solved in polynomial time.


Keywords (added by machine, not by the authors): Phylogenetic Network; Haplotype Inference; Ancestral Recombination Graph; Coalescent Model; Leaf Label




References

  1. Allen, B.L., Steel, M.: Subtree transfer operations and their induced metrics on evolutionary trees. Ann. Combin. 5, 1–13 (2001)
  2. Bafna, V., Gusfield, D., Lancia, G., Yooseph, S.: Haplotyping as perfect phylogeny: A direct approach. J. Comput. Biol. 10, 323–340 (2003)
  3. Barzuza, T., Beckman, J.S., Shamir, R., Pe’er, I.: Computational problems in perfect phylogeny haplotyping: XOR genotypes and tag SNPs. In: Proc. of CPM, pp. 14–31 (2004)
  4. Chung, R.H., Gusfield, D.: Empirical exploration of perfect phylogeny haplotyping and haplotypers. In: Warnow, T.J., Zhu, B. (eds.) COCOON 2003. LNCS, vol. 2697, pp. 5–19. Springer, Heidelberg (2003)
  5. International HapMap Consortium: The HapMap project. Nature 426, 789–796 (2003)
  6. Ding, Z., Filkov, V., Gusfield, D.: A linear-time algorithm for the perfect phylogeny haplotyping problem. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds.) RECOMB 2005. LNCS (LNBI), vol. 3500, pp. 585–600. Springer, Heidelberg (2005)
  7. Eskin, E., Halperin, E., Karp, R.: Large scale reconstruction of haplotypes from genotype data. In: Proc. of RECOMB, pp. 104–113 (2003)
  8. Eskin, E., Halperin, E., Karp, R.M.: Efficient reconstruction of haplotype structure via perfect phylogeny. J. Bioinf. Comput. Biol. 1, 1–20 (2003)
  9. Gusfield, D.: Haplotyping as perfect phylogeny: Conceptual framework and efficient solutions (Extended Abstract). In: Proc. of RECOMB, pp. 166–175 (2002)
  10. Gusfield, D.: Optimal, efficient reconstruction of Root-Unknown phylogenetic networks with constrained recombination. J. Comput. Sys. Sci. 70, 381–398 (2005)
  11. Gusfield, D., Eddhu, S., Langley, C.: Optimal, efficient reconstruction of phylogenetic networks with constrained recombination. J. Bioinf. Comput. Biol. 2(1), 173–213 (2004)
  12. Halperin, E., Eskin, E.: Haplotype reconstruction from genotype data using Imperfect Phylogeny. Bioinformatics 20, 1842–1849 (2004)
  13. Hein, J.: Reconstructing evolution of sequences subject to recombination using parsimony. Math. Biosci. 98, 185–200 (1990)
  14. Hudson, R.: Gene genealogies and the coalescent process. Oxford Survey of Evolutionary Biology 7, 1–44 (1990)
  15. Hudson, R.: Generating samples under the Wright-Fisher neutral model of genetic variation. Bioinformatics 18, 337–338 (2002)
  16. Lin, S., Cutler, D.J., Zwick, M.E., Chakravarti, A.: Haplotype inference in random population samples. Am. J. Hum. Genet. 71, 1129–1137 (2002)
  17. Semple, C., Steel, M.: Phylogenetics. Oxford University Press, Oxford (2003)
  18. Song, Y.S.: On the combinatorics of rooted binary phylogenetic trees. Ann. Combin. 7, 365–379 (2003)
  19. Song, Y.S., Hein, J.: Constructing minimal ancestral recombination graphs. J. Comput. Biol. 12, 147–169 (2005)
  20. Stephens, M., Smith, N.J., Donnelly, P.: A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68, 978–989 (2001)
  21. Tavaré, S.: Calibrating the clock: Using stochastic processes to measure the rate of evolution. In: Lander, E., Waterman, M. (eds.) Calculating the Secrets of Life. National Academy Press, Washington (1995)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Yun S. Song (1)
  • Yufeng Wu (1)
  • Dan Gusfield (1)
  1. Department of Computer Science, University of California, Davis, USA
