A Hidden Markov Technique for Haplotype Reconstruction

  • Pasi Rastas
  • Mikko Koivisto
  • Heikki Mannila
  • Esko Ukkonen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3692)


We give a new algorithm for the genotype phasing problem. Our solution is based on a hidden Markov model for haplotypes. Unlike most previously proposed solutions, which model recombinations using haplotype blocks, our model has a uniform structure. In our model, the haplotypes can be seen as the result of iterated recombinations applied to a few founder haplotypes. We find a maximum likelihood model of this type by using the EM algorithm, and we show how to handle the subtleties of the EM algorithm that arise when genotypes are generated using a haplotype model. We compare our method with well-known, currently available algorithms (phase, hap, gerbil) on both standard and new datasets. Our algorithm is relatively fast and gives results that are always the best or second best among the methods compared.
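The abstract describes haplotypes as mosaics of a few founder haplotypes, with recombinations acting as transitions of a hidden Markov chain, and genotypes as unordered pairs of haplotypes drawn from that model. The sketch below shows one way such a model can assign a probability to a single haplotype and to an unphased genotype, which is the likelihood an EM algorithm for this kind of model would maximise. It is a minimal sketch under stated assumptions, not the authors' implementation: the parameter names `init`, `trans` and `emit`, the 0/1 allele coding, and the 0/1/2 genotype coding are hypothetical choices for illustration, and the EM parameter updates themselves are not shown.

```python
from itertools import product
from typing import List, Sequence

# Hypothetical parameterisation (illustrative assumption, not the authors' code):
#   K founders, m biallelic markers with alleles coded 0/1.
#   init[k]          P(founder k at marker 0)
#   trans[j][k][k2]  P(founder k2 at marker j+1 | founder k at marker j); len(trans) == m - 1
#   emit[j][k]       P(allele 1 at marker j | founder k); allele 0 gets 1 - emit[j][k]
Matrix = List[List[float]]


def allele_prob(p_one: float, allele: int) -> float:
    """Probability of emitting `allele` when P(allele 1) is p_one."""
    return p_one if allele == 1 else 1.0 - p_one


def haplotype_prob(h: Sequence[int], init: List[float],
                   trans: List[Matrix], emit: Matrix) -> float:
    """Forward algorithm: P(h) summed over all founder paths (recombination patterns)."""
    K = len(init)
    alpha = [init[k] * allele_prob(emit[0][k], h[0]) for k in range(K)]
    for j in range(1, len(h)):
        alpha = [sum(alpha[prev] * trans[j - 1][prev][cur] for prev in range(K))
                 * allele_prob(emit[j][cur], h[j]) for cur in range(K)]
    return sum(alpha)


def genotype_prob(g: Sequence[int], init: List[float],
                  trans: List[Matrix], emit: Matrix) -> float:
    """P(g) for an unphased genotype g, coded as the number of 1-alleles per marker
    (0 or 2 = homozygous, 1 = heterozygous). Sums over all ordered haplotype pairs
    consistent with g by running two independent founder chains side by side."""
    K = len(init)

    def pair_emit(j: int, a: int, b: int) -> float:
        if g[j] == 1:  # heterozygous: either chain may carry allele 1
            return (allele_prob(emit[j][a], 0) * allele_prob(emit[j][b], 1)
                    + allele_prob(emit[j][a], 1) * allele_prob(emit[j][b], 0))
        x = g[j] // 2  # homozygous: both chains carry allele x
        return allele_prob(emit[j][a], x) * allele_prob(emit[j][b], x)

    pairs = list(product(range(K), repeat=2))
    alpha = {(a, b): init[a] * init[b] * pair_emit(0, a, b) for a, b in pairs}
    for j in range(1, len(g)):
        alpha = {(c, d): sum(alpha[a, b] * trans[j - 1][a][c] * trans[j - 1][b][d]
                             for a, b in pairs) * pair_emit(j, c, d)
                 for c, d in pairs}
    return sum(alpha.values())
```

As written, the pair-state forward pass costs O(mK⁴) for m markers and K founders; because the two chains are independent, the transition step can be factored to O(mK³). A quick sanity check is that, for each genotype, `genotype_prob` matches a brute-force sum of `haplotype_prob(h1) * haplotype_prob(h2)` over ordered haplotype pairs consistent with it.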


Keywords (added by machine, not by the authors): Hidden Markov Model · Haplotype Block · Emission Probability · Hidden Data · Haplotype Inference




References

  1. Clark, A.G.: Inference of haplotypes from PCR-amplified samples of diploid populations. Molecular Biology and Evolution 7, 111–122 (1990)
  2. Gusfield, D.: Haplotype inference by pure parsimony. Technical Report CSE-2003-2, Department of Computer Science, University of California (2003)
  3. Excoffier, L., Slatkin, M.: Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Molecular Biology and Evolution 12, 921–927 (1995)
  4. Long, J.C., Williams, R.C., Urbanek, M.: An E-M algorithm and testing strategy for multiple-locus haplotypes. American Journal of Human Genetics 56, 799–810 (1995)
  5. Stephens, M., Smith, N., Donnelly, P.: A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics 68, 978–989 (2001)
  6. Niu, T., Qin, Z., Xu, X., Liu, J.: Bayesian haplotype inference for multiple linked single nucleotide polymorphisms. American Journal of Human Genetics 70, 157–169 (2002)
  7. Gusfield, D.: Haplotyping as perfect phylogeny: conceptual framework and efficient solutions. In: Research in Computational Molecular Biology (RECOMB 2002), pp. 166–175. ACM Press, New York (2002)
  8. Greenspan, G., Geiger, D.: Model-based inference of haplotype block variation. In: Research in Computational Molecular Biology (RECOMB 2003), pp. 131–137. ACM Press, New York (2003)
  9. Kimmel, G., Shamir, R.: Maximum likelihood resolution of multi-block genotypes. In: Research in Computational Molecular Biology (RECOMB 2004), pp. 2–9. ACM Press, New York (2004)
  10. Kimmel, G., Shamir, R.: Genotype resolution and block identification using likelihood. Proceedings of the National Academy of Sciences of the United States of America (PNAS) 102, 158–162 (2005)
  11. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 257–285 (1989)
  12. Halperin, E., Eskin, E.: Haplotype reconstruction from genotype data using imperfect phylogeny. Bioinformatics 20, 104–113 (2004)
  13. Lin, S., Cutler, D.J., Zwick, M.E., Chakravarti, A.: Haplotype inference in random population samples. American Journal of Human Genetics 71, 1129–1137 (2002)
  14. Schwartz, R., Clark, A.G., Istrail, S.: Methods for inferring block-wise ancestral history from haploid sequences. In: Guigó, R., Gusfield, D. (eds.) WABI 2002. LNCS, vol. 2452, pp. 44–59. Springer, Heidelberg (2002)
  15. Jojic, N., Jojic, V., Heckerman, D.: Joint discovery of haplotype blocks and complex trait associations from SNP sequences. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI 2004), pp. 286–292. AUAI Press (2004)
  16. Ukkonen, E.: Finding founder sequences from a set of recombinants. In: Guigó, R., Gusfield, D. (eds.) WABI 2002. LNCS, vol. 2452, pp. 277–286. Springer, Heidelberg (2002)
  17. McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions. John Wiley and Sons, Chichester (1996)
  18. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York (1982)
  19. Daly, M.J., Rioux, J.D., Schaffner, S.F., et al.: High-resolution haplotype structure in the human genome. Nature Genetics 29, 229–232 (2001)
  20. Hinds, D.A., Stuve, L.L., Nilsen, G.B., et al.: Whole-genome patterns of common DNA variation in three human populations. Science 307, 1072–1079 (2005)
  21. Koivisto, M., Perola, M., Varilo, T., et al.: An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries. In: Pacific Symposium on Biocomputing (PSB 2003), pp. 502–513. World Scientific, Singapore (2003)
  22. Stephens, M., Scheet, P.: Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation. American Journal of Human Genetics 76, 449–462 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Pasi Rastas (1)
  • Mikko Koivisto (1)
  • Heikki Mannila (1)
  • Esko Ukkonen (1)

  1. Department of Computer Science & HIIT Basic Research Unit, University of Helsinki, Finland
