Abstract
Haplotype inference from genotype data is a key computational problem in bioinformatics, since directly retrieving haplotype information from DNA samples is not feasible with existing technology. One method for solving this problem uses the pure parsimony criterion, an approach known as Haplotype Inference by Pure Parsimony (HIPP). Initial work in this area was based on a number of different Integer Linear Programming (ILP) models and branch-and-bound algorithms. Recent work has shown that a Boolean Satisfiability (SAT) formulation combined with state-of-the-art SAT solvers is the most efficient approach for solving the HIPP problem.
Motivated by the promising results obtained with SAT techniques, this paper investigates the use of modern Pseudo-Boolean Optimization (PBO) algorithms for solving the HIPP problem. The paper starts by applying PBO to existing ILP models. The results are promising and motivate the development of a new PBO model (RPoly) for the HIPP problem, which has a compact representation and eliminates key symmetries. Experimental results indicate that RPoly outperforms the SAT-based approach on most problem instances and is, in general, significantly more efficient.
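To make the pure parsimony criterion concrete, the following is a minimal sketch of the HIPP problem statement as an exhaustive search. It is not the RPoly model, nor any of the ILP or SAT formulations mentioned above, and it is only practical for toy inputs. The encoding is an assumption (a common convention, not taken from this abstract): each genotype site is 0 or 1 when homozygous and 2 when heterozygous. The function names `explains` and `hipp_brute_force` are hypothetical.

```python
from itertools import combinations, product


def explains(h1, h2, g):
    """Return True if the haplotype pair (h1, h2) yields genotype g.

    Assumed encoding: a genotype site is 0 or 1 (homozygous, both
    haplotypes carry that value) or 2 (heterozygous, the haplotypes differ).
    """
    for a, b, site in zip(h1, h2, g):
        if site == 2:
            if a == b:
                return False
        elif not (a == b == site):
            return False
    return True


def hipp_brute_force(genotypes):
    """Return a smallest set of haplotypes that explains every genotype.

    Exhaustive search over all subsets of binary haplotypes, in order of
    increasing size. Exponential in the number of sites: illustration only.
    """
    n_sites = len(genotypes[0])
    candidates = list(product((0, 1), repeat=n_sites))
    # Two haplotypes per genotype always suffice, so 2 * len(genotypes)
    # is a safe upper bound on the size of an optimal set.
    for k in range(1, 2 * len(genotypes) + 1):
        for subset in combinations(candidates, k):
            if all(
                any(explains(h1, h2, g) for h1 in subset for h2 in subset)
                for g in genotypes
            ):
                return set(subset)
    return None
```

For example, `hipp_brute_force([(2, 0, 2), (1, 0, 2), (0, 0, 0)])` returns the three haplotypes `(0, 0, 0)`, `(1, 0, 0)` and `(1, 0, 1)`: the second genotype forces the last two, the third forces the first, and the first genotype is then explained by the pair `(0, 0, 0)` and `(1, 0, 1)` at no extra cost.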
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Graça, A., Marques-Silva, J., Lynce, I., Oliveira, A.L. (2007). Efficient Haplotype Inference with Pseudo-boolean Optimization. In: Anai, H., Horimoto, K., Kutsia, T. (eds) Algebraic Biology. AB 2007. Lecture Notes in Computer Science, vol 4545. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73433-8_10
DOI: https://doi.org/10.1007/978-3-540-73433-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73432-1
Online ISBN: 978-3-540-73433-8
eBook Packages: Computer Science, Computer Science (R0)