Hap-seq: An Optimal Algorithm for Haplotype Phasing with Imputation Using Sequencing Data

  • Dan He
  • Buhm Han
  • Eleazar Eskin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7262)


Inference of haplotypes, or the sequence of alleles along each chromosome, is a fundamental problem in genetics and is important for many analyses, including admixture mapping, identifying regions of identity by descent, and imputation. Traditionally, haplotypes were inferred from genotype data obtained from microarrays, using information on population haplotype frequencies estimated either from a large sample of genotyped individuals or from a reference dataset such as the HapMap. Since large reference datasets became available, modern haplotype phasing approaches along these lines have become closely related to imputation methods. When applied to data obtained from sequencing studies, a straightforward way to obtain haplotypes is to first infer genotypes from the sequence data and then apply an imputation method. However, this approach does not take into account that alleles on the same sequence read originate from the same chromosome. Haplotype assembly approaches take advantage of this insight and predict haplotypes by assigning the reads to chromosomes in a way that minimizes the number of conflicts between the reads and the predicted haplotypes. Unfortunately, assembly approaches require very high sequencing coverage and are usually unable to fully reconstruct the haplotypes. In this work, we present a novel approach, Hap-seq, which is simultaneously an imputation method and an assembly method: it combines information from a reference dataset with information from the reads in a likelihood framework. Our method applies a dynamic programming algorithm to identify the haplotype that maximizes the joint likelihood of the haplotype with respect to the reference dataset and with respect to the observed reads. We show that our method requires only low sequencing coverage and can reconstruct haplotypes containing both common and rare alleles with higher accuracy than state-of-the-art imputation methods.


Keywords: Haplotype Phasing; Imputation; Dynamic Programming; Hidden Markov Model; Genetic Variation


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Dan He (1)
  • Buhm Han (1)
  • Eleazar Eskin (1)
  1. Computer Science Dept., Univ. of California, Los Angeles, USA