The wavelet transforms and statistical models for near infrared spectra analysis
Extensive spectral data are often collected on multiple samples with the goal of predicting one or more properties of each sample. For example, measurements may be made at hundreds of wavelengths alongside more expensive assay values. The predictor variables are typically highly correlated, and only small regions of the spectrum are expected to be pertinent to the measured analytes. It is therefore useful to simplify or compress the predictors, both to save data storage and possibly to de-noise the data before building predictive models. We use a factorial design (a two-step framework) that combines two wavelet transformations, Haar wavelets and Daubechies wavelets, each giving progressively better approximations to the raw data curves, with several statistical prediction methods: stepwise regression, principal component regression (PCR), ridge regression, and partial least squares (PLS) regression. We compare the prediction quality of the eight resulting combinations: Haar-Step, Haar-PCR, Haar-PLS, Haar-Ridge, Daubechies-Step, Daubechies-PCR, Daubechies-PLS, and Daubechies-Ridge. PLS and stepwise regression often predict substance concentrations equally well; in such situations, the simplest method should be preferred. From our studies, we conclude that the type of wavelet is unimportant, that the number of wavelets should be large enough to capture most of the variability in the waveforms, and that the choice of statistical method depends on the analyte.
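To make the two-step framework concrete, the sketch below compresses each spectrum with a Haar transform, keeping only the approximation (scaling) coefficients at a chosen level, and then fits ridge regression, scoring it by k-fold cross-validated RMSE. It is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `haar_approx`, `ridge_fit` and `kfold_rmse` are hypothetical, the ridge penalty `lam` is fixed rather than tuned, only the Haar branch is shown, and the spectra are synthetic (a single Gaussian absorption peak whose height tracks a simulated concentration, plus a random baseline slope and noise).

```python
import numpy as np


def haar_approx(x, levels):
    """Haar approximation coefficients after `levels` decomposition steps.

    Each step maps adjacent pairs (a, b) to (a + b) / sqrt(2) and discards the
    detail coefficients (a - b) / sqrt(2). Fewer levels keep more coefficients
    and therefore a closer approximation to the raw curve. Works along the last
    axis, so a (samples x wavelengths) matrix is transformed row by row.
    """
    a = np.asarray(x, dtype=float)
    if a.shape[-1] % (2 ** levels):
        raise ValueError("number of wavelengths must be divisible by 2**levels")
    for _ in range(levels):
        a = (a[..., 0::2] + a[..., 1::2]) / np.sqrt(2.0)
    return a


def ridge_fit(X, y, lam):
    """Ridge regression on centred data, so the intercept is not penalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def kfold_rmse(X, y, lam, k=5, seed=0):
    """Root mean squared prediction error over k random folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    sq_err = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        b0, beta = ridge_fit(X[train], y[train], lam)
        pred = b0 + X[test] @ beta
        sq_err.append((y[test] - pred) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_err))))


if __name__ == "__main__":
    # Synthetic stand-in for NIR data (assumption): 60 samples x 256 wavelengths.
    rng = np.random.default_rng(1)
    n, p = 60, 256
    wl = np.linspace(0.0, 1.0, p)
    conc = rng.uniform(0.0, 1.0, n)
    peak = np.exp(-(((wl - 0.4) / 0.03) ** 2))
    baseline = rng.normal(0.0, 0.2, (n, 1)) * wl  # sample-specific sloped baseline
    X_raw = conc[:, None] * peak + baseline + rng.normal(0.0, 0.01, (n, p))

    # Coarser levels give fewer, smoother predictors.
    for levels in (1, 3, 5, 7):
        Xw = haar_approx(X_raw, levels)
        print(levels, Xw.shape[1], round(kfold_rmse(Xw, conc, lam=1e-3), 4))
```

Swapping `ridge_fit` for stepwise, PCR or PLS fitting would give the other cells of the factorial design. For Daubechies wavelets one would more likely use a library such as PyWavelets (`pywt.wavedec`) than a hand-written transform. Note that the orthonormal Haar scaling grows coefficient magnitudes by a factor of sqrt(2) per level, so a fixed ridge penalty is not equivalent across levels.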
Keywords: Wavelet transformation; Spectra data; NIR prediction; k-Fold cross-validation; Statistical models
We acknowledge the support of National Center for Theoretical Sciences (South), Taiwan.
Conflict of interest
The authors declare no competing financial interest.