Knowledge-Constrained K-Medoids Clustering of Regulatory Rare Alleles for Burden Tests

  • Conference paper
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2013)

Abstract

Rare genetic variants are hypothesized to influence human disease, but statistically associating them with disease is challenging because most feasibly sized datasets lack statistical power. Several statistical tests have been developed that either collapse multiple rare variants from a genomic region into a single presence/absence variable or tally the number of rare alleles within a region, relating the burden of rare alleles to disease risk. Both approaches, however, rely on the user to specify the genomic region, usually an entire gene, from which the collapsed or burden variables are generated. Recent studies indicate that most risk variants for common diseases lie within regulatory regions, not genes. To capture the effect of rare alleles within non-genic regulatory regions for burden tests, we contrast a simple sliding window approach with a knowledge-guided k-medoids clustering method that groups rare variants into statistically powerful, biologically meaningful windows. We apply these methods to detect genomic regions that alter expression of nearby genes.
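The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy positions and genotypes, and the evenly spaced medoid initialisation are all assumptions made for illustration. The clustering shown is plain k-medoids on base-pair positions only; the knowledge-guided constraints are not described in the abstract and are not modelled here.

```python
from typing import List, Sequence, Tuple


def sliding_windows(positions: Sequence[int], width: int, step: int) -> List[List[int]]:
    """Group variant indices into fixed-width windows (empty windows are skipped)."""
    windows: List[List[int]] = []
    if not positions:
        return windows
    start, end = min(positions), max(positions)
    while start <= end:
        idx = [j for j, p in enumerate(positions) if start <= p < start + width]
        if idx:
            windows.append(idx)
        start += step
    return windows


def k_medoids_1d(positions: Sequence[int], k: int, max_iter: int = 100) -> List[List[int]]:
    """Plain k-medoids on base-pair positions; returns clusters of variant indices."""
    order = sorted(range(len(positions)), key=lambda j: positions[j])
    n = len(order)
    # Assumed initialisation: k medoids spread evenly over the sorted positions.
    medoids = [positions[order[(2 * i + 1) * n // (2 * k)]] for i in range(k)]
    clusters: List[List[int]] = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each variant joins its nearest medoid.
        clusters = [[] for _ in range(k)]
        for j in order:
            nearest = min(range(k), key=lambda c: abs(positions[j] - medoids[c]))
            clusters[nearest].append(j)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        updated = []
        for c, members in enumerate(clusters):
            if not members:
                updated.append(medoids[c])
                continue
            best = min(members, key=lambda m: sum(abs(positions[m] - positions[q]) for q in members))
            updated.append(positions[best])
        if updated == medoids:
            break
        medoids = updated
    return [c for c in clusters if c]


def collapse_and_burden(genotypes: Sequence[Sequence[int]],
                        variant_idx: Sequence[int]) -> Tuple[List[int], List[int]]:
    """genotypes[i][j] is the rare-allele count (0, 1 or 2) of individual i at variant j.

    Returns the collapsed presence/absence variable and the burden count per individual.
    """
    burden = [sum(row[j] for j in variant_idx) for row in genotypes]
    collapsed = [1 if b > 0 else 0 for b in burden]
    return collapsed, burden


# Toy data (hypothetical): six rare variants in two tight groups, three individuals.
positions = [100, 120, 150, 5000, 5050, 5100]
genotypes = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 2],
    [0, 0, 0, 0, 0, 0],
]

for region in k_medoids_1d(positions, k=2):
    print(region, collapse_and_burden(genotypes, region))
```

On this toy data a 1000 bp window with a 1000 bp step separates the variant at position 5100 from its two neighbours, whereas the position-based clustering keeps the three together. This is one way a fixed window can dilute a burden signal that a data-driven grouping preserves.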

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sivley, R.M., Fish, A.E., Bush, W.S. (2013). Knowledge-Constrained K-Medoids Clustering of Regulatory Rare Alleles for Burden Tests. In: Vanneschi, L., Bush, W.S., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2013. Lecture Notes in Computer Science, vol 7833. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37189-9_4

  • DOI: https://doi.org/10.1007/978-3-642-37189-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37188-2

  • Online ISBN: 978-3-642-37189-9

  • eBook Packages: Computer Science, Computer Science (R0)
