Dimension reduction of gene expression data
DNA methylation of specific dinucleotides has been shown to be strongly linked with tissue age. The goal of this research is to explore different analysis techniques for microarray data in order to build a more effective predictor of age from DNA methylation levels. Specifically, this study compares elastic net regression models with four latent variable approaches (principal component regression, supervised principal component regression, Y-aware principal component regression, and partial least squares regression) in their ability to predict tissue age from DNA methylation levels. The elastic net model performs better than the latent variable models when fewer than ten principal components are considered for each method, but Y-aware principal component regression predicts more accurately (with a reasonably low testing RMSE) and captures more of the desired structure when the number of principal components increases to 20. Coding limitations prevented conclusive results about the performance of supervised principal component regression as the number of components increases.
Keywords: Principal component analysis; DNA methylation; elastic net regression; Y-aware PCR; supervised PCR; PLS regression
AMS Subject Classification: 62H25, 62J99, 62N86
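To illustrate why a Y-aware variant of principal component regression can capture more of the desired structure than ordinary PCA, the following is a minimal sketch on synthetic data, not the study's code or data. It assumes the common form of Y-aware scaling, in which each predictor is multiplied by the slope of its univariate least-squares regression on the response before PCA is run. The function names (`center`, `y_aware_scale`, `first_component`) and the two-column toy data set are hypothetical, introduced only for this example.

```python
import math
import random


def center(X):
    """Subtract the column mean from every column of X (list of rows)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def y_aware_scale(X, y):
    """Center each column and multiply it by the slope of y regressed on it.

    After this rescaling, a unit change in any column corresponds to a
    unit change in the fitted response, so variance reflects relevance to y.
    """
    n, p = len(X), len(X[0])
    Xc = center(X)
    y_mean = sum(y) / n
    slopes = []
    for j in range(p):
        sxx = sum(row[j] ** 2 for row in Xc)
        sxy = sum(row[j] * (yi - y_mean) for row, yi in zip(Xc, y))
        slopes.append(sxy / sxx if sxx > 0 else 0.0)
    return [[slopes[j] * row[j] for j in range(p)] for row in Xc]


def first_component(Z, iters=200):
    """Leading principal direction of centered data Z by power iteration."""
    p = len(Z[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        scores = [sum(z * vj for z, vj in zip(row, v)) for row in Z]
        w = [sum(s * row[j] for s, row in zip(scores, Z)) for j in range(p)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:
            return v
        v = [wj / norm for wj in w]
    return v


# Toy data (an assumption for illustration): column 0 drives the response
# but has small variance; column 1 is unrelated noise with large variance.
rng = random.Random(0)
n = 200
signal = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 10) for _ in range(n)]
X = [[s, e] for s, e in zip(signal, noise)]
y = [3.0 * s + rng.gauss(0, 0.5) for s in signal]

plain = first_component(center(X))           # ordinary PCA direction
aware = first_component(y_aware_scale(X, y))  # Y-aware PCA direction

print("plain PCA loading:  ", [round(abs(c), 3) for c in plain])
print("Y-aware PCA loading:", [round(abs(c), 3) for c in aware])
```

In this toy setting the ordinary first component loads almost entirely on the high-variance noise column, while the Y-aware first component loads on the predictive column. Regressing the response on the resulting scores would complete the principal component regression step.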