High-Dimensional Data in Genomics

Amaratunga, Dhammika; Cabrera, Javier

doi:10.1007/978-981-10-7820-0_4

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

This paper provides a brief introduction to high-dimensional data as it arises in biopharmaceutical research, especially genomics, and offers an overview of several data analysis concepts and techniques that could be used to explore and analyze such data. An example is used to illustrate the methods.

Author information

Authors and Affiliations

Princeton Data Analytics LLC, Bridgewater, NJ, USA
Dhammika Amaratunga
Department of Statistics, Rutgers University, Piscataway, NJ, USA
Javier Cabrera

Corresponding author

Correspondence to Dhammika Amaratunga.

Editor information

Editors and Affiliations

Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA, USA
Karl E. Peace
School of Social Work and Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
Ding-Geng Chen
Boston University, Cambridge, MA, USA
Sandeep Menon

About this chapter

Cite this chapter

Amaratunga, D., Cabrera, J. (2018). High-Dimensional Data in Genomics. In: Peace, K., Chen, DG., Menon, S. (eds) Biopharmaceutical Applied Statistics Symposium. ICSA Book Series in Statistics. Springer, Singapore. https://doi.org/10.1007/978-981-10-7820-0_4

DOI: https://doi.org/10.1007/978-981-10-7820-0_4
Published: 01 September 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7819-4
Online ISBN: 978-981-10-7820-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)