Estimating and Correcting Optimism Bias in Multivariate PLS Regression: Application to the Study of the Association Between Single Nucleotide Polymorphisms and Multivariate Traits in Attention Deficit Hyperactivity Disorder
In studies involving genetic data, the correlations between the X and Y scores obtained from PLS regression models can be used as measures of association between genome-level measurements, X, and phenotype-level measurements, Y. These correlations may be overestimated because of overfitting; that is, they may be subject to optimism bias. We evaluate the optimism bias through simulations and examine how it changes with sample size and with the strength of the underlying correlation. We assess the effectiveness of bootstrap-based and permutation-based bias correction methods. We also investigate how to select the appropriate number of PLS components. Finally, we analyze genetic data consisting of genotypes and phenotypes related to Attention Deficit Hyperactivity Disorder (ADHD).
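The abstract does not spell out how the score correlation or the two corrections are computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the first-component weights are the leading left and right singular vectors of the centered cross-product X^T Y (a PLS-SVD-style choice; the PLS variant used in the paper may differ). The bootstrap correction follows the standard optimism bootstrap. The permutation correction (subtracting the mean apparent correlation under permuted Y) is a hypothetical form chosen for illustration. All function names are illustrative, and the data are simulated standard normals rather than genotype codes.

```python
import numpy as np


def first_pls_weights(X, Y):
    # Assumption: first-component weights are the leading left/right singular
    # vectors of the centered cross-product X^T Y (PLS-SVD style).
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, 0], Vt[0, :]


def score_correlation(X, Y, w, c):
    # Pearson correlation between the X scores (X w) and the Y scores (Y c).
    return np.corrcoef(X @ w, Y @ c)[0, 1]


def apparent_correlation(X, Y):
    # Fit and evaluate on the same data: this is the optimistic estimate.
    w, c = first_pls_weights(X, Y)
    return score_correlation(X, Y, w, c)


def bootstrap_corrected(X, Y, n_boot=200, seed=None):
    # Optimism bootstrap: average (resample fit scored on the resample) minus
    # (resample fit scored on the original data), then subtract from apparent.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        w, c = first_pls_weights(X[idx], Y[idx])
        optimism.append(score_correlation(X[idx], Y[idx], w, c)
                        - score_correlation(X, Y, w, c))
    return apparent_correlation(X, Y) - np.mean(optimism)


def permutation_corrected(X, Y, n_perm=200, seed=None):
    # Hypothetical variant: subtract the mean apparent correlation obtained
    # after permuting the rows of Y, which removes any true X-Y association.
    rng = np.random.default_rng(seed)
    null = [apparent_correlation(X, Y[rng.permutation(X.shape[0])])
            for _ in range(n_perm)]
    return apparent_correlation(X, Y) - np.mean(null)


# Simulated null data: 50 subjects, 200 SNP-like predictors, 5 traits,
# with Y generated independently of X.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
Y = rng.standard_normal((50, 5))
r_app = apparent_correlation(X, Y)
r_boot = bootstrap_corrected(X, Y, seed=1)
r_perm = permutation_corrected(X, Y, seed=2)
print(f"apparent={r_app:.3f} bootstrap={r_boot:.3f} permutation={r_perm:.3f}")
```

Because X and Y are independent in this simulation, any large apparent correlation reflects overfitting rather than association. The printed values let one compare how far each correction pulls the estimate back. Using more predictors than subjects mirrors the high-dimensional SNP setting. Only the first component is used to keep the sketch short; choosing the number of components is a separate question.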
Keywords: Partial least squares regression (PLSR); Optimism bias; Overfitting; SNPs; Bootstrap