Advertisement

Haplotyping a Diploid Single Individual with a Fast and Accurate Enumeration Algorithm

  • Xixi Chen
  • Jingli WuEmail author
  • Longyu Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9771)

Abstract

The minimum error correction (MEC) model is one of the important computational models for determining haplotype information from sequencing data, i.e., single individual single nucleotide polymorphism (SNP) haplotyping, haplotype reconstruction or haplotype assembly. Due to the NP-hardness of the model, a fast and accurate enumeration algorithm is proposed for solving it. The presented algorithm reconstructs the SNP sites of a pair of haplotypes one after another. It enumerates two kinds of SNP values, i.e., (0 1)T and (1 0)T, for the SNP site being reconstructed, and chooses the one with more support coming from the SNP fragments that are covering the corresponding SNP site. The experimental comparisons were conducted among the presented algorithm, the FAHR, the Fast Hare and the DGS algorithms. The results prove that our algorithm can get higher reconstruction rate than the other three algorithms.

Keywords

Sequence analysis Haplotype Minimum error correction (MEC) Algorithm Enumeration 

Notes

Acknowledgments

The authors are grateful to anonymous referees for their helpful comments. This research is supported by the National Natural Science Foundation of China under Grant No.61363035 and No.61502111, Guangxi Natural Science Foundation under Grant No. 2015GXNSFAA139288, No. 2013GXNSFBA019263 and No. 2012GXNSFAA053219, Research Fund of Guangxi Key Lab of Multisource Information Mining & Security No. 14-A-03-02 and No. 15-A-03-02, “Bagui Scholar” Project Special Funds, Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

References

  1. 1.
    Bafna, V., Istrail, S., Lancia, G., Rizzi, R.: Polynomial and APX-hard cases of the individual haplotyping problem. Theoret. Comput. Sci. 335, 109–125 (2005)MathSciNetCrossRefzbMATHGoogle Scholar
  2. 2.
    Geraci, F.: A comparison of several algorithms for the single individual SNP haplotyping reconstruction problem. Bioinformatics 26(18), 2217–2225 (2010)CrossRefGoogle Scholar
  3. 3.
    Stephens, J.C., Schneider, J.A., Tanguay, D.A., Choi, J., Acharya, T., Stanley, S.E., Jiang, R., Messer, C.J., Chew, A., Han, J.H., Duan, J., Carr, J.L., Lee, M.S., Koshy, B., Kumar, A.M., Zhang, G., Newell, W.R., Windemuth, A., Xu, C., Kalbfleisch, T.S., Shaner, S.L., Arnold, K., Schulz, V., Drysdale, C.M., Nandabalan, K., Judson, R.S., Ruano, G., Vovis, G.F.: Haplotype variation and linkage disequilibrium in 313 human genes. Science 293, 489–493 (2001)CrossRefGoogle Scholar
  4. 4.
    Wu, J.L., Liang, B.B.: A fast and accurate algorithm for diploid individual haplotype reconstruction. J. Bioinform. Comput. Biol. 11(4), 1350010 (2013)MathSciNetCrossRefGoogle Scholar
  5. 5.
    Clark, A.G.: Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol. 7(2), 111–122 (1990)Google Scholar
  6. 6.
    Gusfield, D.: Inference of haplotypes from samples of diploid populations: complexity and algorithms. J. Comput. Biol. 8(3), 305–324 (2001)MathSciNetCrossRefGoogle Scholar
  7. 7.
    O’Neil, S.T., Emrich, S.J.: Haplotype and minimum-chimerism consensus determination using short sequence data. BMC Genom. 13(Suppl. 2), S4 (2012)CrossRefGoogle Scholar
  8. 8.
    Lancia, G., Bafna, V., Istrail, S., Lippert, R., Schwartz, R.: SNPs problems, complexity, and algorithms. In: Meyer auf der Heide, F. (ed.) ESA 2001. LNCS, vol. 2161, pp. 182–193. Springer, Heidelberg (2001)CrossRefGoogle Scholar
  9. 9.
    Lippert, R., Schwartza, R., Lancia, G., Istrail, S.: Algorithmic strategies for the SNPs haplotype assembly problem. Brief. Bioinform. 3(1), 23–31 (2002)CrossRefGoogle Scholar
  10. 10.
    Xie, M.Z., Chen, J.E., Wang, J.X.: Research on parameterized algorithms of the individual haplotyping problem. J. Bioinform. Comput. Biol. 5(3), 795–816 (2007)MathSciNetCrossRefGoogle Scholar
  11. 11.
    Xie, M.Z., Wang, J.X.: An improved (and practical) parameterized algorithm for the individual haplotyping problem MFR with mate-pairs. Algorithmica 52, 250–266 (2008)MathSciNetCrossRefzbMATHGoogle Scholar
  12. 12.
    Cilibrasi, R., Iersel, L.V., Kelk, S., Tromp, J.: The complexity of the single individual SNP haplotyping problem. Algorithmica 49(1), 13–36 (2007)MathSciNetCrossRefzbMATHGoogle Scholar
  13. 13.
    Wang, R.S., Wu, L.Y., Li, Z.P., Zhang, X.S.: Haplotype reconstruction from SNP fragments by minimum error correction. Bioinformatics 21(10), 2456–2462 (2005)CrossRefGoogle Scholar
  14. 14.
    He, D., Choi, A., Pipatsrisawat, K., Darwiche, A., Eskin, E.: Optimal algorithms for haplotype assembly from whole-genome sequence data. Bioinformatics 26(12), i183 (2010)CrossRefGoogle Scholar
  15. 15.
    Panconesi, A., Sozio, M.: Fast Hare: a fast heuristic for single individual SNP haplotype reconstruction. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS (LNBI), vol. 3240, pp. 266–277. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  16. 16.
    Wang, Y., Wang, E., Wang, R.S.: A clustering algorithm based on two distance functions for MEC model. Comput. Biol. Chem. 31(2), 148–150 (2007)CrossRefzbMATHGoogle Scholar
  17. 17.
    Genovese, L.M., Geraci, F., Pellegrini, M.: SpeedHap: an accurate heuristic for the single individual SNP haplotyping problem with many gaps, high reading error rate and low coverage. IEEE/ACM Trans. Comput. Biol. Bioinform. 5(4), 492–502 (2008)CrossRefGoogle Scholar
  18. 18.
    Levy, S., Sutton, G., Ng, P.C., Feuk, L., Halpern, A.L., Walenz, B.P., Axelrod, N., Huang, J., Kirkness, E.F., Denisov, G., Lin, Y., MacDonald, J.R., Pang, A.W., Shago, M., Stockwell, T.B., Tsiamouri, A., Bafna, V., Bansal, V., Kravitz, S.A., Busam, D.A., Beeson, K.Y., McIntosh, T.C., Remington, K.A., Abril, J.F., Gill, J., Borman, J., Rogers, Y.H., Frazier, M.E., Scherer, S.W., Strausberg, R.L., Venter, J.C.: The diploid genome sequence of an individual human. PLoS Biol. 5(10), 2113–2144 (2007)CrossRefGoogle Scholar
  19. 19.
    Bansal, V., Bafna, V.: HapCUT: an efficient and accurate algorithm for the haplotype assembly problem. Bioinformatics 24(16), i153–i159 (2008)CrossRefGoogle Scholar
  20. 20.
    Chen, Z., Fu, B., Schweller, R., Yang, B., Zhao, Z., Zhu, B.: Linear time probabilistic algorithms for the singular haplotype reconstruction problem from SNP fragments. J. Comput. Biol. 15(5), 535–546 (2008)MathSciNetCrossRefGoogle Scholar
  21. 21.
    Aguiar, D., Istrail, S.: Haplotype assembly in polyploidy genomes and identical by descent shared tracts. Bioinformatics 29(13), i352–i360 (2013)CrossRefGoogle Scholar
  22. 22.
    Mazrouee, S., Wang, W.: FastHap: fast and accurate single individual haplotype reconstruction using fuzzy conflict graphs. Bioinformatics 30(17), i371–i378 (2014)CrossRefGoogle Scholar
  23. 23.
    Myers, G.: A dataset generator for whole genome shotgun sequencing. In: Lengauer, T., Schneider, R., Bork, P., et al. (eds.) ISMB 1999, pp. 202–210. AAAI Press, California (1999)Google Scholar
  24. 24.
    Richter, D.C., Ott, F., Auch, A.F., Schmid, R., Huson, D.H.: MetaSim—a sequencing simulator for genomics and metagenomics. PLoS ONE 3(10), e3373 (2008)CrossRefGoogle Scholar

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. 1.Guangxi Key Lab of Multi-source Information Mining and SecurityGuangxi Normal UniversityGuilinChina
  2. 2.College of Computer Science and Information TechnologyGuangxi Normal UniversityGuilinChina

Personalised recommendations