Abstract
The individual haplotyping problem Minimum Letter Flip (MLF) asks, given a set of aligned DNA sequence fragments from an individual, for the corresponding haplotypes obtained by flipping the minimum number of SNP values. No practical exact algorithm for the problem has been available. In DNA sequencing experiments, technical limits restrict the length of a directly sequenced fragment to about 1 kb. Consequently, with a genome-average SNP density of 1.84 SNPs per 1 kb of DNA sequence, the maximum number \(k_1\) of SNP sites that a fragment covers is usually small. Moreover, to save time and money, the maximum number \(k_2\) of fragments that cover a SNP site is usually no more than 19. Based on these properties of fragment data, this paper introduces a new parameterized algorithm with running time \(O(nk_2 2^{k_2} + m\log m + mk_1)\), where \(m\) is the number of fragments and \(n\) is the number of SNP sites. The algorithm solves the MLF problem efficiently even when \(m\) and \(n\) are large, and is therefore more practical in real biological applications.
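To make the problem statement and the two parameters concrete, the following minimal sketch computes \(k_1\) and \(k_2\) for a toy fragment set and finds the minimum number of flips by exhaustive search. It is not the parameterized algorithm of the paper: it enumerates all pairs of candidate haplotypes and so takes time exponential in \(n\). The encoding (strings over `0`, `1`, and `-` for an uncovered site), the function names, and the reading of "covers" as "has a non-gap entry at" are assumptions made for illustration only.

```python
from itertools import product

GAP = "-"  # assumed marker for a SNP site the fragment does not cover


def parameters(fragments):
    """Return (k1, k2) for equal-length fragment strings.

    k1: largest number of non-gap sites in any fragment (assumed reading).
    k2: largest number of fragments with a non-gap entry at any site.
    """
    n = len(fragments[0])
    k1 = max(sum(c != GAP for c in f) for f in fragments)
    k2 = max(sum(f[j] != GAP for f in fragments) for j in range(n))
    return k1, k2


def distance(fragment, haplotype):
    """Number of covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for a, b in zip(fragment, haplotype) if a != GAP and a != b)


def brute_force_mlf(fragments):
    """Exhaustive search: try every haplotype pair, assign each fragment to
    the closer haplotype, and keep the pair with the fewest total flips.
    Only feasible for very small n (4**n pairs)."""
    n = len(fragments[0])
    best = None
    for h1 in product("01", repeat=n):
        for h2 in product("01", repeat=n):
            cost = sum(min(distance(f, h1), distance(f, h2)) for f in fragments)
            if best is None or cost < best[0]:
                best = (cost, "".join(h1), "".join(h2))
    return best


# Hypothetical toy instance: 7 fragments over 4 SNP sites.
fragments = ["01--", "011-", "-110", "10--", "100-", "-001", "-011"]
print(parameters(fragments))          # (3, 7)
print(brute_force_mlf(fragments)[0])  # 1
```

In this toy instance the last fragment conflicts with both groups of mutually consistent fragments, so at least one entry has to be flipped, and one flip suffices.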
This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 60433020 and 60773111, the Program for New Century Excellent Talents in University No. NCET-05-0683, the Program for Changjiang Scholars and Innovative Research Team in University No. IRT0661, and the Scientific Research Fund of Hunan Provincial Education Department under Grant No. 06C526.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xie, M., Wang, J., Chen, J. (2008). A Practical Parameterized Algorithm for the Individual Haplotyping Problem MLF. In: Agrawal, M., Du, D., Duan, Z., Li, A. (eds) Theory and Applications of Models of Computation. TAMC 2008. Lecture Notes in Computer Science, vol 4978. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79228-4_38
DOI: https://doi.org/10.1007/978-3-540-79228-4_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-79227-7
Online ISBN: 978-3-540-79228-4
eBook Packages: Computer Science, Computer Science (R0)