Abstract
We propose a novel graph-theoretic method for estimating haplotype population size from genotype data. The method considers only the potential sharing of haplotypes between individuals and is based on transforming the graph of potential haplotype sharing into a line graph using a minimum number of edge and vertex deletions. We show that these deletion problems are NP-complete and provide exact integer programming solutions for them. We test our approach using extensive simulations of multiple population evolution and genotype sampling scenarios. Our computational experiments show that when most of the potential sharings are true sharings, the problem can be solved very quickly and the estimated size is very close to the true size; when many of the potential sharings do not stem from true haplotype sharing, our method gives reasonable lower bounds on the underlying number of haplotypes. In comparison, a naive approach of phasing the input genotypes provides only the trivial upper bound of twice the number of genotypes.
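To make the line-graph view concrete, the following minimal sketch builds a sharing graph from toy data in which the haplotypes are known. It only illustrates the underlying idea and is not the integer programming method of the paper: when every individual is treated as an edge joining its two haplotypes, two individuals are adjacent in the sharing graph exactly when their edges meet, so a graph of true sharings is a line graph whose root graph has one vertex per haplotype. The function name `sharing_graph`, the individual labels, and the haplotype labels are all hypothetical.

```python
from itertools import combinations


def sharing_graph(individuals):
    """Return the set of individual pairs that share at least one haplotype.

    `individuals` maps an individual's label to the pair of haplotype labels
    it carries. This toy model assumes the haplotypes are known, which is
    exactly what is NOT available in the real estimation problem.
    """
    edges = set()
    for (a, haps_a), (b, haps_b) in combinations(sorted(individuals.items()), 2):
        if set(haps_a) & set(haps_b):
            edges.add((a, b))
    return edges


# Hypothetical toy data: four individuals, each carrying two haplotypes.
# Viewed as a root graph, the haplotypes are vertices and each individual
# is an edge; here the root graph is a 4-cycle h1-h2-h3-h4-h1.
individuals = {
    "i1": ("h1", "h2"),
    "i2": ("h2", "h3"),
    "i3": ("h3", "h4"),
    "i4": ("h1", "h4"),
}

# The sharing graph is the line graph of the root graph (again a 4-cycle).
edges = sharing_graph(individuals)

# True number of distinct haplotypes = number of vertices of the root graph.
n_haplotypes = len({h for haps in individuals.values() for h in haps})

# Trivial upper bound mentioned in the abstract: two haplotypes per genotype.
naive_upper_bound = 2 * len(individuals)

print(sorted(edges))
print(n_haplotypes, naive_upper_bound)
```

In this toy case the sharing graph is already a line graph, so no deletions are needed and the root graph has 4 vertices, half the trivial bound of 8. With real genotype data some potential sharings are spurious, which is where the minimum edge and vertex deletion formulation comes in.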
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Halldórsson, B.V., Blokh, D., Sharan, R. (2012). Estimating Population Size via Line Graph Reconstruction. In: Raphael, B., Tang, J. (eds) Algorithms in Bioinformatics. WABI 2012. Lecture Notes in Computer Science, vol 7534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33122-0_6
DOI: https://doi.org/10.1007/978-3-642-33122-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33121-3
Online ISBN: 978-3-642-33122-0
eBook Packages: Computer Science, Computer Science (R0)