Multivariate Imputation of Genotype Data Using Short and Long Range Disequilibrium

  • Conference paper
Computer Aided Systems Theory – EUROCAST 2007 (EUROCAST 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4739)

Abstract

Missing values in genetic data are a common issue. In this paper we explore several machine learning techniques for creating models that can be used to impute the missing genotypes using multiple genetic markers. We map the machine learning techniques to different patterns of transmission and, in particular, we contrast the effect of short and long range disequilibrium between markers. The assumption of short range disequilibrium implies that only physically close genetic variants are informative for reconstructing missing genotypes, while this assumption is relaxed in long range disequilibrium and physically distant genetic variants become informative for imputation. We evaluate the accuracy of a flexible feature selection model that fits both patterns of transmission using six real datasets of single nucleotide polymorphisms (SNP). The results show an increased accuracy compared to standard imputation models. Supplementary material: http://bios.ugr.es/missingGenotypes
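The contrast between the two assumptions can be illustrated with a minimal sketch. It is not the model evaluated in the paper: the genotype coding (0/1/2, with `None` for missing), the nearest-neighbour majority vote, the mutual-information ranking, the toy data and every function name below are assumptions made only for illustration. Under the short range assumption the predictors are the markers in a fixed window around the target; under the relaxed assumption the predictors are chosen by a relevance score at any distance.

```python
from collections import Counter
from math import log

MISSING = None  # assumed encoding for a missing genotype


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def window_markers(n_markers, target, width):
    """Short range: only markers within `width` positions of the target."""
    lo, hi = max(0, target - width), min(n_markers, target + width + 1)
    return [j for j in range(lo, hi) if j != target]


def selected_markers(rows, target, k):
    """Long range: the k markers most informative about the target, at any distance."""
    scores = []
    for j in range(len(rows[0])):
        if j == target:
            continue
        # Score each marker only on individuals observed at both positions.
        pairs = [(r[j], r[target]) for r in rows
                 if r[j] is not MISSING and r[target] is not MISSING]
        mi = mutual_information([p[0] for p in pairs], [p[1] for p in pairs])
        scores.append((mi, j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by position
    return sorted(j for _, j in scores[:k])


def impute(rows, target, predictors):
    """Fill missing target genotypes by majority vote among the nearest complete rows."""
    train = [r for r in rows if r[target] is not MISSING]
    out = [list(r) for r in rows]
    for r in out:
        if r[target] is not MISSING:
            continue

        def dist(t):
            # Hamming distance on the predictors; a missing value counts as a mismatch.
            return sum(1 for j in predictors
                       if r[j] is MISSING or t[j] is MISSING or r[j] != t[j])

        best = min(dist(t) for t in train)
        votes = Counter(t[target] for t in train if dist(t) == best)
        # Most frequent genotype wins; ties go to the smaller genotype code.
        r[target] = min(votes, key=lambda g: (-votes[g], g))
    return out


# Toy data (hypothetical): marker 1 is the target and tracks distant marker 4,
# while its physical neighbours (markers 0 and 2) carry no information about it.
rows = [
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 2, 0, 1, 2],
    [1, 2, 1, 0, 2],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [0, MISSING, 1, 1, 2],
]

short = window_markers(len(rows[0]), target=1, width=1)
long_range = selected_markers(rows, target=1, k=1)
print("short range predictors:", short, "->", impute(rows, 1, short)[6][1])
print("selected predictors:   ", long_range, "->", impute(rows, 1, long_range)[6][1])
```

On this constructed example the two predictor sets lead to different imputed genotypes for the last individual, because the only informative marker lies outside the window. This says nothing about accuracy on real SNP data.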

Editor information

Roberto Moreno Díaz, Franz Pichler, Alexis Quesada Arencibia

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abad-Grau, M.M., Sebastiani, P. (2007). Multivariate Imputation of Genotype Data Using Short and Long Range Disequilibrium. In: Moreno Díaz, R., Pichler, F., Quesada Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2007. EUROCAST 2007. Lecture Notes in Computer Science, vol 4739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75867-9_24

  • DOI: https://doi.org/10.1007/978-3-540-75867-9_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75866-2

  • Online ISBN: 978-3-540-75867-9

  • eBook Packages: Computer Science (R0)
