Functional prediction of unidentified lipids using supervised classifiers

Yetukuri, Laxman; Tikka, Jarkko; Hollmén, Jaakko; Orešič, Matej

doi:10.1007/s11306-009-0179-x

Original Article
Published: 10 September 2009

Volume 6, pages 18–26, (2010)

Metabolomics

Laxman Yetukuri¹,
Jarkko Tikka²,
Jaakko Hollmén² &
…
Matej Orešič¹

Abstract

Mass spectrometry (MS)-based metabolomics studies often require handling both identified and unidentified metabolite data. To avoid bias in data interpretation, the data analysis should include all available data. A practical challenge in exploratory metabolomics analysis is therefore how to interpret changes related to unidentified peaks. In this paper, we address this challenge by predicting the class membership of unknown peaks, applying and comparing multiple supervised classifiers on selected lipidomics datasets. The classifiers employed are k-nearest neighbours (k-NN), support vector machines (SVM), partial least squares discriminant analysis (PLS-DA) and Naive Bayes, all of which are known to be effective and efficient in predicting labels for unseen data. Here, class label predictions are sought for unidentified lipid profiles obtained from high-throughput global screening in an Ultra Performance Liquid Chromatography Mass Spectrometry (UPLC™/MS) experimental setup. Our investigation reveals that the k-NN and SVM classifiers outperform both the PLS-DA and Naive Bayes classifiers. The Naive Bayes classifier performs worst among all models; this observation seems logical, as lipids are highly co-regulated and thus violate the Naive Bayes assumption that features are conditionally independent given the class. Label predictions on which k-NN and SVM agree can serve as a good starting point for exploring the full data, thereby facilitating exploratory studies where label information is critical for data interpretation.
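The idea of assigning class labels to unidentified peaks from labelled ones can be illustrated with a minimal k-NN sketch. This is not the authors' implementation: the profiles, the labels `class_A` and `class_B`, the choice of Euclidean distance and k = 3, and all function names are assumptions made for illustration. Leave-one-out cross-validation stands in for whatever validation scheme the paper actually used.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length intensity profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k labelled profiles nearest to `query`."""
    # Sort by distance; the index breaks distance ties deterministically.
    order = sorted(range(len(train_X)),
                   key=lambda i: (euclidean(train_X[i], query), i))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy: predict each labelled profile from the rest."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

# Hypothetical toy data: each row is one lipid peak measured in four samples.
identified_profiles = [
    [1.0, 1.2, 0.9, 1.1],
    [1.1, 1.0, 1.0, 1.2],
    [0.9, 1.1, 1.0, 1.0],
    [3.0, 2.8, 3.2, 3.1],
    [2.9, 3.1, 3.0, 2.8],
    [3.2, 3.0, 2.9, 3.0],
]
identified_labels = ["class_A", "class_A", "class_A",
                     "class_B", "class_B", "class_B"]

# Unidentified peaks whose class membership we want to predict.
unidentified_profiles = [
    [1.05, 1.10, 0.95, 1.05],
    [3.10, 2.90, 3.00, 3.00],
]

accuracy = loo_accuracy(identified_profiles, identified_labels, k=3)
predicted = [knn_predict(identified_profiles, identified_labels, p, k=3)
             for p in unidentified_profiles]
print(accuracy, predicted)
```

In practice one would estimate accuracy on the identified lipids first, as `loo_accuracy` does here, and only then trust predictions for the unidentified peaks. A second classifier such as an SVM could be run the same way, keeping only the labels on which both agree.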

Acknowledgments

This project was supported by the Academy of Finland (Decision # 111338).

Author information

Authors and Affiliations

VTT Technical Research Centre of Finland, Tietotie 2, P.O.Box 1000, 02044, Espoo, Finland
Laxman Yetukuri & Matej Orešič
Department of Information and Computer Science, Helsinki University of Technology, TKK, P.O. Box 5400, 02015, Espoo, Finland
Jarkko Tikka & Jaakko Hollmén

Corresponding author

Correspondence to Laxman Yetukuri.

About this article

Cite this article

Yetukuri, L., Tikka, J., Hollmén, J. et al. Functional prediction of unidentified lipids using supervised classifiers. Metabolomics 6, 18–26 (2010). https://doi.org/10.1007/s11306-009-0179-x

Received: 02 July 2009
Accepted: 25 August 2009
Published: 10 September 2009
Issue Date: March 2010
DOI: https://doi.org/10.1007/s11306-009-0179-x
