# The wavelet transforms and statistical models for near infrared spectra analysis

## Abstract

Extensive spectral data are often collected on multiple samples with the goal of predicting one or more properties of each sample. For example, measurements may be made at hundreds of wavelengths, alongside the more expensive assay values. The predictor variables are often highly correlated, and only small sections of the spectrum are expected to be pertinent to the measured analytes. There is therefore a need to simplify or compress the predictors, both to save data storage and possibly to de-noise the data before building predictive models. We use a factorial design (a two-step framework) to explore two wavelet transformations, Haar wavelets and Daubechies wavelets, at progressively finer approximations of the raw data curves, in combination with several statistical prediction methods: stepwise regression, principal component regression (PCR), ridge regression and partial least squares (PLS) regression. We study prediction quality for the eight resulting combinations: Haar-Step, Haar-PCR, Haar-PLS, Haar-Ridge, Daubechies-Step, Daubechies-PCR, Daubechies-PLS and Daubechies-Ridge. PLS and stepwise regression can often predict substance concentrations equally well; in such situations, the simpler method should be preferred. From our studies, we conclude that the type of wavelet is unimportant, that the number of wavelet coefficients retained should be large enough to capture most of the variability in the waveforms, and that the choice of statistical method depends on the analyte.
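A minimal sketch of the two-step idea, assuming NumPy only and synthetic spectra (not the data analysed in the paper): each spectrum is compressed to its Haar approximation coefficients, and ridge regression and PCR are then compared by k-fold cross-validation across two compression levels. The helper names (`haar_approx`, `ridge_fit`, `pcr_fit`, `kfold_rmse`), the penalty `lam`, the number of components and the fold count are hypothetical illustrative choices, not the authors' settings. Daubechies wavelets would typically come from a library such as PyWavelets (e.g. `pywt.wavedec(x, 'db4')`) rather than the hand-written Haar step used here, and stepwise and PLS regression are omitted for brevity.

```python
import numpy as np

def haar_approx(spectra, levels):
    """Haar approximation coefficients after `levels` orthonormal decomposition steps.

    Operates along the last axis; that axis length must be divisible by 2**levels.
    """
    a = np.asarray(spectra, dtype=float)
    if a.shape[-1] % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    for _ in range(levels):
        # Low-pass Haar filter (scaled pairwise sum) followed by downsampling by 2.
        a = (a[..., 0::2] + a[..., 1::2]) / np.sqrt(2.0)
    return a

def ridge_fit(X, y, lam):
    """Ridge regression on centered data; returns (coefficients, intercept)."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - ym))
    return beta, ym - xm @ beta

def pcr_fit(X, y, n_comp):
    """Principal component regression: least squares on the first n_comp PC scores."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:n_comp].T                      # loadings of the leading components
    gamma = np.linalg.lstsq(Xc @ V, y - ym, rcond=None)[0]
    beta = V @ gamma                       # map back to the predictor space
    return beta, ym - xm @ beta

def kfold_rmse(X, y, fit, k=5, seed=0):
    """Root-mean-squared prediction error from k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        beta, b0 = fit(X[train], y[train])
        errors.append(y[test] - (X[test] @ beta + b0))
    return float(np.sqrt(np.mean(np.concatenate(errors) ** 2)))

# Hypothetical synthetic stand-in for NIR data (NOT the paper's data): 60 spectra
# at 256 wavelengths, each a sum of three Gaussian bands; the response is the
# height of the middle band plus noise.
rng = np.random.default_rng(1)
wl = np.linspace(0.0, 1.0, 256)
centers = np.array([0.25, 0.5, 0.75])
heights = rng.uniform(0.5, 2.0, size=(60, 3))
bands = np.exp(-((wl[None, :] - centers[:, None]) ** 2) / (2 * 0.03 ** 2))  # (3, 256)
spectra = heights @ bands + rng.normal(0.0, 0.01, size=(60, 256))
y = heights[:, 1] + rng.normal(0.0, 0.05, size=60)

# Small factorial comparison: compression level x prediction method.
methods = {
    "Ridge": lambda X, t: ridge_fit(X, t, lam=1e-2),
    "PCR": lambda X, t: pcr_fit(X, t, n_comp=3),
}
results = {}
for levels in (3, 5):                      # 256 -> 32 or 8 coefficients
    Z = haar_approx(spectra, levels)
    for name, fit in methods.items():
        rmse = kfold_rmse(Z, y, fit)
        results[(levels, name)] = rmse
        print(f"Haar level {levels} ({Z.shape[1]} coefs) + {name}: CV RMSE = {rmse:.3f}")
```

Keeping only the approximation coefficients discards the detail coefficients, which both compresses and smooths the spectra; in this sketch, choosing a smaller `levels` keeps more coefficients and hence more of the variability in the waveforms.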

## Keywords

Wavelet transformation · Spectra data · NIR prediction · k-Fold cross-validation · Statistical models

## Notes

### Acknowledgments

We acknowledge the support of the National Center for Theoretical Sciences (South), Taiwan.

### Conflict of interest

The authors declare no competing financial interest.
