Abstract
In this work, a metabolomics dataset obtained by 1H nuclear magnetic resonance (NMR) spectroscopy of Brazilian propolis was analyzed using machine learning algorithms, including feature selection and classification methods. Partial least squares-discriminant analysis (PLS-DA), random forest (RF), and wrapper methods combining decision trees and rules with evolutionary algorithms (EA) proved to be complementary approaches. Together they yielded relevant information on the importance of a given set of features, mostly related to the structural fingerprint of aliphatic and aromatic compounds typically found in propolis, e.g., fatty acids and phenolic compounds. The feature selection and decision tree-based algorithms used appear to be suitable tools for building models that classify Brazilian propolis metabolomics data by geographic origin consistently and with high accuracy, while avoiding redundant information on the metabolic signature of relevant compounds.
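The idea of a wrapper method driven by an evolutionary algorithm can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins for binned spectra, a nearest-centroid classifier replaces the decision trees and rules named in the abstract, and every function name and parameter (`make_data`, `centroid_accuracy`, `evolve`, `penalty`, and so on) is hypothetical. Each candidate feature subset is a bit mask, scored by hold-out accuracy minus a small penalty per selected feature.

```python
import random

def make_data(rng, n_per_class=30, n_features=12, informative=(2, 7)):
    """Synthetic two-class data; only the 'informative' bins differ by class."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 3.0 * label  # class shift in informative bins only
            X.append(row)
            y.append(label)
    return X, y

def centroid_accuracy(X, y, mask, train_idx, test_idx):
    """Hold-out accuracy of a nearest-centroid classifier on masked features."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in set(y):
        rows = [X[i] for i in train_idx if y[i] == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for i in test_idx:
        dist = {lab: sum((X[i][j] - c[k]) ** 2 for k, j in enumerate(cols))
                for lab, c in centroids.items()}
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(test_idx)

def evolve(X, y, rng, pop_size=20, generations=30, mutation_rate=0.1, penalty=0.01):
    """Evolve bit masks over features; return the best mask and its accuracy."""
    n_features = len(X[0])
    # Stratified 50/50 split so that both classes appear in train and test.
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        members = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(members)
        half = len(members) // 2
        train_idx += members[:half]
        test_idx += members[half:]

    def score(mask):
        # Wrapper fitness: classifier accuracy minus a cost per selected feature.
        return centroid_accuracy(X, y, mask, train_idx, test_idx) - penalty * sum(mask)

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the better half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return best, centroid_accuracy(X, y, best, train_idx, test_idx)

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    best, acc = evolve(X, y, rng)
    print("selected features:", [j for j, m in enumerate(best) if m])
    print("hold-out accuracy:", acc)
```

Because the same hold-out split both guides the search and reports the accuracy, the printed figure is optimistic; a real analysis would evaluate the selected subset on data the search never saw, for example with nested cross-validation.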
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maraschin, M. et al. (2012). A Machine Learning and Chemometrics Assisted Interpretation of Spectroscopic Data – A NMR-Based Metabolomics Platform for the Assessment of Brazilian Propolis. In: Shibuya, T., Kashima, H., Sese, J., Ahmad, S. (eds) Pattern Recognition in Bioinformatics. PRIB 2012. Lecture Notes in Computer Science, vol 7632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34123-6_12
DOI: https://doi.org/10.1007/978-3-642-34123-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34122-9
Online ISBN: 978-3-642-34123-6
eBook Packages: Computer Science (R0)