Abstract
In this paper we present insights into the problem of haplotype inference for large genotype datasets. Our observations are drawn from an extensive comparison of three haplotype inference methods on several datasets taken from HapMap. The methods chosen, PTG, Haplorec, and fastPHASE, are among the best known; they are based on different approaches and are able to handle large amounts of data. Our analysis evaluates execution time and the accuracy of the results, based on the Error Rate and the Switch Error, as well as sequence conservation patterns. The results show that (1) fastPHASE and Haplorec are both more accurate than PTG, (2) fastPHASE is computationally the most expensive of the three methods, while Haplorec may fail to resolve long sequences, and (3) all approaches perform better on more conserved sequences and tend to fail at distinct sequence sites.
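The abstract names the Error Rate and the Switch Error without defining them. As an illustration only, the sketch below uses definitions that are common in the haplotyping literature and are assumed here, not taken from the paper: the Switch Error is the fraction of consecutive heterozygous site pairs whose relative phase is wrong, and the Error Rate is the fraction of individuals whose inferred haplotype pair differs from the true pair. The function names and the 0/1 allele encoding are hypothetical.

```python
from typing import List, Sequence, Tuple

Haplotype = Sequence[int]                  # one allele (0 or 1) per SNP site
HapPair = Tuple[Haplotype, Haplotype]      # the two haplotypes of one individual


def switch_error(true_pair: HapPair, inferred_pair: HapPair) -> float:
    """Assumed definition: phase switches / (heterozygous sites - 1).

    Assumes the inferred pair is consistent with the true genotype.
    """
    t1, t2 = true_pair
    i1, _ = inferred_pair
    # Only heterozygous sites carry phase information.
    het_sites = [k for k in range(len(t1)) if t1[k] != t2[k]]
    if len(het_sites) < 2:
        return 0.0
    # For each heterozygous site, does the first inferred haplotype
    # carry the same allele as the first true haplotype?
    agrees = [i1[k] == t1[k] for k in het_sites]
    # A switch occurs wherever agreement flips between neighbouring sites.
    switches = sum(1 for a, b in zip(agrees, agrees[1:]) if a != b)
    return switches / (len(het_sites) - 1)


def error_rate(true_pairs: List[HapPair], inferred_pairs: List[HapPair]) -> float:
    """Assumed definition: fraction of individuals with a wrong haplotype pair."""
    wrong = 0
    for t, i in zip(true_pairs, inferred_pairs):
        # A haplotype pair is unordered, so compare the sorted pairs.
        if sorted(map(tuple, t)) != sorted(map(tuple, i)):
            wrong += 1
    return wrong / len(true_pairs)
```

Under these assumed definitions, an individual with true haplotypes `0000`/`1111` and inferred haplotypes `0011`/`1100` has one phase switch across three adjacent heterozygous pairs, so the two metrics penalise that individual differently: it counts as a single wrong individual for the Error Rate but as only one third for the Switch Error.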
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rosa, R.S., Guimarães, K.S. (2010). Insights on Haplotype Inference on Large Genotype Datasets. In: Ferreira, C.E., Miyano, S., Stadler, P.F. (eds) Advances in Bioinformatics and Computational Biology. BSB 2010. Lecture Notes in Computer Science, vol 6268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15060-9_5
DOI: https://doi.org/10.1007/978-3-642-15060-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15059-3
Online ISBN: 978-3-642-15060-9
eBook Packages: Computer Science, Computer Science (R0)