Abstract
This paper provides a brief introduction to high-dimensional data as it arises in biopharmaceutical research, especially genomics, and gives an overview of several concepts and techniques that can be used to explore and analyze such data. An example illustrates the methods.
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Amaratunga, D., & Cabrera, J. (2018). High-Dimensional Data in Genomics. In: Peace, K., Chen, DG., Menon, S. (eds) Biopharmaceutical Applied Statistics Symposium. ICSA Book Series in Statistics. Springer, Singapore. https://doi.org/10.1007/978-981-10-7820-0_4
DOI: https://doi.org/10.1007/978-981-10-7820-0_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7819-4
Online ISBN: 978-981-10-7820-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)