High-Dimensional Data in Genomics

Biopharmaceutical Applied Statistics Symposium

Part of the book series: ICSA Book Series in Statistics ((ICSABSS))

Abstract

This paper provides a brief introduction to high-dimensional data as it arises in biopharmaceutical research, especially genomics, and offers an overview of several data analysis concepts and techniques that can be used to explore and analyze such data. An example is used to illustrate the methods.

Corresponding author

Correspondence to Dhammika Amaratunga.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

Cite this chapter

Amaratunga, D., Cabrera, J. (2018). High-Dimensional Data in Genomics. In: Peace, K., Chen, DG., Menon, S. (eds) Biopharmaceutical Applied Statistics Symposium. ICSA Book Series in Statistics. Springer, Singapore. https://doi.org/10.1007/978-981-10-7820-0_4
