Multivariate Imputation of Genotype Data Using Short and Long Range Disequilibrium

Abad-Grau, María M.; Sebastiani, Paola

doi:10.1007/978-3-540-75867-9_24

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4739)

Included in the following conference series:

International Conference on Computer Aided Systems Theory

Abstract

Missing values are common in genetic data. In this paper we explore several machine learning techniques for building models that impute missing genotypes from multiple genetic markers. We map these techniques to different patterns of transmission and, in particular, contrast the effects of short and long range disequilibrium between markers. Under short range disequilibrium, only physically close genetic variants are informative for reconstructing missing genotypes; long range disequilibrium relaxes this assumption, so that physically distant variants also become informative for imputation. We evaluate the accuracy of a flexible feature selection model that fits both patterns of transmission on six real datasets of single nucleotide polymorphisms (SNPs). The results show increased accuracy compared with standard imputation models. Supplementary material: http://bios.ugr.es/missingGenotypes
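The contrast between the two assumptions can be illustrated with a small Python sketch. It is not the authors' model: the one-rule association score used to rank markers, the nearest-neighbour vote, the toy genotype matrix, and every function name are assumptions made for this example. Genotypes are coded 0/1/2 and a missing call is `None`. The short range variant uses only markers in a physical window around the target; the relaxed variant picks the best-scoring markers at any distance.

```python
from collections import Counter

MISSING = None  # hypothetical code for an untyped genotype


def predictive_score(rows, target, marker):
    """Accuracy of predicting `target` by the majority target genotype
    within each observed value of `marker` (a simple one-rule score)."""
    groups = {}
    for r in rows:
        if r[target] is MISSING or r[marker] is MISSING:
            continue
        groups.setdefault(r[marker], Counter())[r[target]] += 1
    total = sum(sum(c.values()) for c in groups.values())
    if total == 0:
        return 0.0
    return sum(max(c.values()) for c in groups.values()) / total


def window_markers(n_markers, target, width):
    """Short range assumption: `width` markers on each side of the target."""
    lo, hi = max(0, target - width), min(n_markers, target + width + 1)
    return [j for j in range(lo, hi) if j != target]


def selected_markers(rows, target, k):
    """Relaxed assumption: the k best-scoring markers at any distance."""
    others = [j for j in range(len(rows[0])) if j != target]
    others.sort(key=lambda j: predictive_score(rows, target, j), reverse=True)
    return others[:k]


def impute(rows, individual, target, markers, k_neighbors=3):
    """Majority vote among the rows most similar to `individual` on `markers`."""
    query = rows[individual]

    def distance(r):
        # A mismatch or a missing value on either side counts as one unit.
        return sum(
            1 for j in markers
            if r[j] is MISSING or query[j] is MISSING or r[j] != query[j]
        )

    donors = [r for i, r in enumerate(rows)
              if i != individual and r[target] is not MISSING]
    if not donors:
        raise ValueError("no individual has an observed target genotype")
    donors.sort(key=distance)
    votes = Counter(r[target] for r in donors[:k_neighbors])
    return votes.most_common(1)[0][0]


# Toy data (invented): marker 5 is physically far from marker 0 but tracks it.
genotypes = [
    [0, 1, 0, 2, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [2, 1, 0, 0, 2, 2],
    [2, 0, 1, 2, 1, 2],
    [1, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 2, 1],
    [MISSING, 1, 0, 2, 1, 2],  # genotype at marker 0 is to be imputed
]

nearby = window_markers(len(genotypes[0]), target=0, width=2)
chosen = selected_markers(genotypes, target=0, k=1)
print(nearby)                                  # [1, 2]
print(chosen)                                  # [5]
print(impute(genotypes, 6, 0, chosen))         # 2
```

In this constructed example the adjacent markers carry little information about marker 0, so the window-based imputation depends on near ties among neighbours, while the selected distant marker gives a clear vote. Real data would call for cross-validated selection rather than a single score computed on the full matrix.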

Author information

Authors and Affiliations

Software Engineering Department, University of Granada, Granada 18071, Spain
María M. Abad-Grau
Department of Biostatistics, Boston University, Boston MA 02118, USA
Paola Sebastiani

Editor information

Roberto Moreno Díaz, Franz Pichler, Alexis Quesada Arencibia

About this paper

Cite this paper

Abad-Grau, M.M., Sebastiani, P. (2007). Multivariate Imputation of Genotype Data Using Short and Long Range Disequilibrium. In: Moreno Díaz, R., Pichler, F., Quesada Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2007. EUROCAST 2007. Lecture Notes in Computer Science, vol 4739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75867-9_24

DOI: https://doi.org/10.1007/978-3-540-75867-9_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75866-2
Online ISBN: 978-3-540-75867-9
eBook Packages: Computer Science, Computer Science (R0)
