Estimating and Correcting Optimism Bias in Multivariate PLS Regression: Application to the Study of the Association Between Single Nucleotide Polymorphisms and Multivariate Traits in Attention Deficit Hyperactivity Disorder

  • Erica Cunningham
  • Antonio Ciampi
  • Ridha Joober
  • Aurélie Labbe
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 173)


In studies involving genetic data, the correlations between X and Y scores obtained from PLS regression models can be used as measures of association between genome-level measurements, X, and phenotype-level measurements, Y. These correlations may be overestimated due to potential overfitting (i.e., they may be vulnerable to optimism bias). We evaluate the optimism bias through simulations and examine the effect of increasing sample size and strength of correlation. We assess the effectiveness of bootstrap-based and permutation-based bias correction methods. We also investigate the selection of the appropriate number of components for PLS regression. We include an analysis of genetic data consisting of genotypes and phenotypes related to Attention Deficit Hyperactivity Disorder (ADHD).
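
To make the idea of optimism bias concrete, the following is a minimal sketch of one common bootstrap optimism correction applied to the correlation between first-component X and Y scores. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: the helper names (`fit_first_component`, `optimism_corrected_correlation`), the restriction to a single component, the simulated data, and the choice of the "refit on a resample, re-evaluate on the original data" estimator are all hypothetical choices made here.

```python
import math
import random


def centered(M):
    """Return a column-centered copy of a matrix stored as a list of rows."""
    means = [sum(col) / len(M) for col in zip(*M)]
    return [[v - m for v, m in zip(row, means)] for row in M]


def unit(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def scores(M, weights):
    """Project each row of M onto a weight vector."""
    return [sum(a * b for a, b in zip(row, weights)) for row in M]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def fit_first_component(X, Y, n_iter=50):
    """Weights (w for X, c for Y) of the first PLS component, found by
    power iteration on the cross-product of the centered X and Y."""
    Xc, Yc = centered(X), centered(Y)
    c = unit([1.0] * len(Yc[0]))
    for _ in range(n_iter):
        u = scores(Yc, c)  # Y scores for the current c
        w = unit([sum(x * ui for x, ui in zip(col, u)) for col in zip(*Xc)])
        t = scores(Xc, w)  # X scores for the current w
        c = unit([sum(y * ti for y, ti in zip(col, t)) for col in zip(*Yc)])
    return w, c


def score_correlation(X, Y, w, c):
    """Correlation between X scores and Y scores for given weights."""
    return pearson(scores(X, w), scores(Y, c))


def optimism_corrected_correlation(X, Y, n_boot=100, seed=0):
    """Return (apparent, corrected) score correlations.

    Assumed estimator: optimism is the average, over bootstrap resamples,
    of (correlation on the resample used for fitting) minus (correlation
    of the same weights applied to the original data).
    """
    rng = random.Random(seed)
    n = len(X)
    w, c = fit_first_component(X, Y)
    apparent = score_correlation(X, Y, w, c)
    gaps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        Xb, Yb = [X[i] for i in idx], [Y[i] for i in idx]
        wb, cb = fit_first_component(Xb, Yb)
        gaps.append(score_correlation(Xb, Yb, wb, cb)
                    - score_correlation(X, Y, wb, cb))
    optimism = sum(gaps) / n_boot
    return apparent, apparent - optimism


# Simulated data with NO true association between X and Y (illustrative only).
rng = random.Random(1)
n, p, q = 40, 10, 3
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
Y = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(n)]
apparent, corrected = optimism_corrected_correlation(X, Y, n_boot=100, seed=2)
print(f"apparent r = {apparent:.2f}, corrected r = {corrected:.2f}")
```

Because X and Y are generated independently here, any apparent score correlation is due to overfitting alone, which is the situation in which the optimism is easiest to see. A permutation-based alternative would instead refit after shuffling the rows of Y to estimate the correlation expected under no association.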


Partial least square regression (PLSR) · Optimism bias · Overfitting · SNPs · Bootstrap



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Erica Cunningham (1; corresponding author)
  • Antonio Ciampi (1)
  • Ridha Joober (2)
  • Aurélie Labbe (1)
  1. Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Canada
  2. Douglas Mental Health University Institute, Verdun, Canada
