Functional prediction of unidentified lipids using supervised classifiers

  • Original Article
Metabolomics

Abstract

Mass spectrometry (MS)-based metabolomics studies often require handling both identified and unidentified metabolite data. To avoid bias in data interpretation, the analysis should ideally include all available data. A practical challenge in exploratory metabolomics analysis is therefore how to interpret changes in unidentified peaks. In this paper, we address this challenge by predicting the class membership of unknown peaks, applying and comparing multiple supervised classifiers on selected lipidomics datasets. The classifiers employed are k-nearest neighbours (k-NN), support vector machines (SVM), partial least squares discriminant analysis (PLS-DA) and Naive Bayes, all of which are known to be effective and efficient at predicting labels for unseen data. Here, class label predictions are sought for unidentified lipid profiles from high-throughput global screening in an Ultra Performance Liquid Chromatography Mass Spectrometry (UPLC™/MS) experimental setup. Our investigation reveals that the k-NN and SVM classifiers outperform both the PLS-DA and Naive Bayes classifiers. The Naive Bayes classifier performs worst of all the models, which seems logical because lipids are highly co-regulated and so violate the Naive Bayes assumption that features are conditionally independent given the class. Label predictions on which k-NN and SVM agree can serve as a good starting point for exploring the full data, thereby facilitating exploratory studies in which label information is critical for data interpretation.
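
To make the idea of predicting a class label for an unidentified peak concrete, the following is a minimal sketch of a k-NN vote over identified lipid profiles. It is an illustration only, not the authors' implementation: the correlation-based distance, the value k = 3, the function names and the toy intensity profiles are all assumptions introduced here, and the class labels "TG" and "PC" are used purely as example lipid classes.

```python
import math
from collections import Counter


def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two intensity profiles.

    Assumption: a correlation-based distance is used because co-regulated
    lipids tend to have similarly shaped profiles across samples.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - cov / math.sqrt(var_a * var_b)


def knn_predict(labelled, query, k=3):
    """Predict a class label for `query` by majority vote of its k nearest
    labelled profiles. `labelled` is a list of (profile, label) pairs."""
    neighbours = sorted(
        labelled, key=lambda item: pearson_distance(item[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): each profile is one peak's
# intensity across four samples, paired with its known lipid class.
identified = [
    ([1.0, 2.0, 3.0, 4.0], "TG"),
    ([1.1, 2.1, 2.9, 4.2], "TG"),
    ([0.9, 1.8, 3.2, 3.9], "TG"),
    ([4.0, 3.0, 2.0, 1.0], "PC"),
    ([4.2, 2.9, 2.1, 0.8], "PC"),
    ([3.8, 3.1, 1.9, 1.1], "PC"),
]

# An unidentified peak whose intensity rises across samples.
unknown_peak = [2.0, 4.1, 6.0, 8.2]
predicted = knn_predict(identified, unknown_peak, k=3)
print(predicted)
```

In this toy setting the unknown peak follows the same rising pattern as the "TG" profiles, so its three nearest neighbours all carry that label. On real data, the distance measure and k would need to be chosen by cross-validation on the identified peaks.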

Acknowledgments

This project was supported by the Academy of Finland (Decision # 111338).

Author information

Corresponding author

Correspondence to Laxman Yetukuri.

About this article

Cite this article

Yetukuri, L., Tikka, J., Hollmén, J. et al. Functional prediction of unidentified lipids using supervised classifiers. Metabolomics 6, 18–26 (2010). https://doi.org/10.1007/s11306-009-0179-x

  • DOI: https://doi.org/10.1007/s11306-009-0179-x
