
Journal of Mathematical Chemistry, Volume 53, Issue 2, pp 551–572

The wavelet transforms and statistical models for near infrared spectra analysis

  • Shu-Chuan Chen
  • Dan M. Hayden
  • Stanley S. Young
Original Paper

Abstract

Often extensive spectral data are collected on multiple samples with the goal of predicting one or more properties of the sample. For example, measurements can be made at hundreds of wavelengths along with the more expensive assay values. The predictor variables are often highly correlated, and it is expected that only small sections of the spectrum are pertinent to the measured analytes. There is a need to simplify or compress the predictors, both to save data storage and possibly to de-noise the data prior to building predictive models. Our idea is to use a factorial design (a two-step framework) to explore two wavelet transformations, Haar wavelets and Daubechies wavelets, with progressively better approximation to the raw data curves, in combination with several statistical prediction methods, including stepwise regression, principal component regression, ridge regression and partial least squares regression. The plan is to study prediction quality using Haar-Step, Haar-PCR, Haar-PLS, Haar-Ridge, Daubechies-Step, Daubechies-PCR, Daubechies-PLS and Daubechies-Ridge. Often PLS and stepwise regression can predict substance concentrations equally well. In such situations, the preferred statistical method should be the simplest one. From our studies, we conclude that the type of wavelet is unimportant, the number of wavelets should be large enough to capture most of the variability in the waveforms, and the choice of the statistical method depends on the analyte.
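
To make the two-step framework concrete, the sketch below is a minimal illustration, not the authors' procedure: the simulated spectra, the parameter values, and the coefficient-retention rule are all assumptions. It compresses each curve with a Haar or Daubechies discrete wavelet transform (via PyWavelets) and then compares partial least squares, principal component, and ridge regression on the retained coefficients using 5-fold cross-validation in scikit-learn; stepwise regression is omitted because scikit-learn has no direct equivalent.

    # Minimal sketch of the two-step framework on assumed, simulated data:
    # (1) compress each spectrum with a Haar or Daubechies DWT, keeping only
    #     the coarsest coefficients, then
    # (2) fit PLS, PCR, and ridge regression on the retained coefficients and
    #     compare them with 5-fold cross-validation.
    import numpy as np
    import pywt
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression, Ridge
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    n_samples, n_wavelengths = 60, 256                # hypothetical NIR data set
    # random walks stand in for smooth absorbance curves
    X = rng.normal(size=(n_samples, n_wavelengths)).cumsum(axis=1)
    # a made-up analyte that depends on two wavelength regions plus noise
    y = X[:, 100] - 0.5 * X[:, 180] + rng.normal(scale=0.1, size=n_samples)

    def wavelet_features(X, wavelet, n_keep=32):
        """DWT each spectrum and keep the n_keep coarsest coefficients
        (approximation first) as a simple compression / de-noising rule."""
        rows = []
        for spectrum in X:
            coeffs = np.concatenate(pywt.wavedec(spectrum, wavelet))
            rows.append(coeffs[:n_keep])
        return np.vstack(rows)

    models = {
        "PLS":   PLSRegression(n_components=5),
        "PCR":   make_pipeline(PCA(n_components=5), LinearRegression()),
        "Ridge": Ridge(alpha=1.0),
    }

    for wavelet in ("haar", "db4"):                   # Haar vs. Daubechies-4
        Xw = wavelet_features(X, wavelet)
        for name, model in models.items():
            scores = cross_val_score(model, Xw, y, cv=5, scoring="r2")
            print(f"{wavelet}-{name}: mean CV R^2 = {scores.mean():.3f}")

In this sketch the number of retained coefficients (n_keep) plays the role of "the number of wavelets" discussed in the abstract; increasing it gives a progressively better approximation to the raw curves at the cost of less compression.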

Keywords

Wavelet transformation · Spectra data · NIR prediction · k-Fold cross-validation · Statistical models

Notes

Acknowledgments

We acknowledge the support of the National Center for Theoretical Sciences (South), Taiwan.

Conflict of interest

The authors declare no competing financial interest.

Supplementary material

Supplementary material 1: 10910_2014_434_MOESM1_ESM.docx (56 KB)


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Shu-Chuan Chen, Department of Mathematics, Idaho State University, Pocatello, USA
  • Dan M. Hayden, School of Mathematical and Statistical Sciences, Arizona State University, Tempe, USA
  • Stanley S. Young, National Institute of Statistical Sciences, Research Triangle Park, USA
