A Machine Learning and Chemometrics Assisted Interpretation of Spectroscopic Data – A NMR-Based Metabolomics Platform for the Assessment of Brazilian Propolis

  • Marcelo Maraschin
  • Amélia Somensi-Zeggio
  • Simone K. Oliveira
  • Shirley Kuhnen
  • Maíra M. Tomazzoli
  • Ana C. M. Zeri
  • Rafael Carreira
  • Miguel Rocha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7632)

Abstract

In this work, a metabolomics dataset from ¹H nuclear magnetic resonance (NMR) spectroscopy of Brazilian propolis was analyzed using machine learning algorithms, including feature selection and classification methods. Partial least squares discriminant analysis (PLS-DA), random forest (RF), and wrapper methods combining decision trees and rules with evolutionary algorithms (EA) proved to be complementary approaches. Together they yielded relevant information on the importance of a given set of features, mostly related to the structural fingerprint of aliphatic and aromatic compounds typically found in propolis, e.g., fatty acids and phenolic compounds. The feature selection and decision tree-based algorithms appear to be suitable tools for building models that classify Brazilian propolis metabolomics data by geographic origin consistently and with high accuracy, while avoiding redundant information about the metabolic signature of relevant compounds.
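
To make the idea of a wrapper method concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes synthetic data in place of the propolis spectra, uses a one-level decision tree (a decision stump) as a stand-in for the decision trees and rules mentioned above, and scores each candidate feature subset on a held-out validation split with a small penalty per selected feature. All names, sizes, and parameter values (`make_data`, `fit_stump`, `evolve`, the penalty of 0.01, the informative bins 3 and 7) are hypothetical.

```python
import random

def make_data(n_per_class=40, n_features=12, informative=(3, 7), seed=0):
    """Synthetic 'spectral bins': only the informative bins differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 4.0 * label  # class shift in informative bins
            X.append(row)
            y.append(label)
    return X, y

def fit_stump(X, y, features):
    """Pick the (feature, threshold, polarity) with the best training accuracy."""
    best_acc, best_stump = -1.0, None
    for j in features:
        for t in sorted(set(row[j] for row in X)):
            # Count hits of the rule "value above threshold means class 1".
            correct = sum((row[j] > t) == bool(lab) for row, lab in zip(X, y))
            acc = max(correct, len(y) - correct) / len(y)
            if acc > best_acc:
                polarity = 1 if correct >= len(y) - correct else 0
                best_acc, best_stump = acc, (j, t, polarity)
    return best_stump

def predict(stump, row):
    j, t, polarity = stump
    above = row[j] > t
    return int(above) if polarity == 1 else int(not above)

def fitness(mask, train, valid, penalty=0.01):
    """Validation accuracy of a stump restricted to the masked features,
    minus a small cost per selected feature to discourage redundancy."""
    features = [j for j, bit in enumerate(mask) if bit]
    if not features:
        return 0.0
    stump = fit_stump(train[0], train[1], features)
    hits = sum(predict(stump, row) == lab for row, lab in zip(*valid))
    return hits / len(valid[1]) - penalty * len(features)

def evolve(train, valid, n_features, pop_size=20, generations=20, seed=1):
    """Simple evolutionary algorithm over binary feature masks."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, train, valid), reverse=True)
        elite = ranked[: pop_size // 2]          # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            k = rng.randrange(n_features)
            child[k] = 1 - child[k]              # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, train, valid))

X, y = make_data()
idx = list(range(len(y)))
random.Random(2).shuffle(idx)
half = len(idx) // 2
train = ([X[i] for i in idx[:half]], [y[i] for i in idx[:half]])
valid = ([X[i] for i in idx[half:]], [y[i] for i in idx[half:]])

best = evolve(train, valid, n_features=12)
selected = [j for j, bit in enumerate(best) if bit]
print("selected feature indices:", selected)
```

Because the validation split is used to guide the search, a real study would need a further independent test set (or nested cross-validation) to estimate accuracy without optimistic bias.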

Keywords

Supervised classification techniques; evolutionary algorithms; Random Forest; PLS-DA; wrapper methods; NMR-based metabolomics

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Marcelo Maraschin (1, 3)
  • Amélia Somensi-Zeggio (1)
  • Simone K. Oliveira (1)
  • Shirley Kuhnen (1)
  • Maíra M. Tomazzoli (1)
  • Ana C. M. Zeri (2)
  • Rafael Carreira (3)
  • Miguel Rocha (3)

  1. Plant Morphogenesis and Biochemistry Laboratory, Federal University of Santa Catarina, Florianópolis, Brazil
  2. National Laboratory of Bioscience, Campinas, Brazil
  3. CCTC, School of Engineering, University of Minho, Braga, Portugal
