Data Treatment for LC-MS Untargeted Analysis

  • Samantha Riccadonna
  • Pietro Franceschi
Part of the Methods in Molecular Biology book series (MIMB, volume 1738)


Untargeted liquid chromatography-mass spectrometry (LC-MS) experiments require complex chemometric strategies to extract information from the experimental data. Here we discuss "data preprocessing", the set of procedures applied to the raw data to produce a data matrix that serves as the starting point for the subsequent statistical analysis. Data preprocessing is a crucial step on the path to knowledge extraction and should be carefully controlled and optimized to maximize the output of any untargeted metabolomics investigation.
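As a rough illustration of what "producing a data matrix" means, the toy Python sketch below assumes that peak picking has already produced, for each sample, a list of peaks (m/z, retention time, intensity). It then groups those peaks across samples into features using fixed m/z (ppm) and retention time tolerances. The `Peak` class, the `build_feature_matrix` function and the default tolerances are hypothetical choices made for this example. Dedicated tools such as XCMS use considerably more elaborate peak detection, grouping and retention time correction. Cells left as NaN correspond to the missing values that later steps have to handle.

```python
import math
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float         # m/z of the detected peak
    rt: float         # retention time (s)
    intensity: float  # peak area or height (arbitrary units)


def build_feature_matrix(samples, mz_ppm=10.0, rt_tol=5.0):
    """Greedily group peaks from all samples into features (toy example).

    Returns (features, matrix): features[i] is the (m/z, rt) of the first
    peak assigned to feature i, and matrix[i][j] is its intensity in sample j,
    or NaN when no peak in sample j matched (a missing value).
    """
    n_samples = len(samples)
    features = []
    matrix = []
    for j, peaks in enumerate(samples):
        for p in peaks:
            match = None
            for i, (f_mz, f_rt) in enumerate(features):
                # A peak matches a feature if it falls within both tolerances.
                if (abs(p.mz - f_mz) <= f_mz * mz_ppm * 1e-6
                        and abs(p.rt - f_rt) <= rt_tol):
                    match = i
                    break
            if match is None:
                # No existing feature fits: open a new row, empty for all samples.
                features.append((p.mz, p.rt))
                matrix.append([math.nan] * n_samples)
                match = len(features) - 1
            # If several peaks of one sample hit the same feature, keep the largest.
            current = matrix[match][j]
            if math.isnan(current) or p.intensity > current:
                matrix[match][j] = p.intensity
    return features, matrix


# Three hypothetical samples; the second one lacks the m/z 255.23 peak.
samples = [
    [Peak(180.0634, 120.0, 1.5e5), Peak(255.2324, 300.0, 8.0e4)],
    [Peak(180.0640, 122.5, 1.7e5)],
    [Peak(180.0629, 119.0, 1.4e5), Peak(255.2330, 303.0, 7.5e4)],
]
features, matrix = build_feature_matrix(samples)
for (mz, rt), row in zip(features, matrix):
    print(f"m/z {mz:.4f} rt {rt:.1f}s: {row}")
```

With these inputs the sketch yields two features (rows) by three samples (columns), and the second feature has a NaN in the second column. Using the first peak as the feature reference and taking the first matching feature keeps the code short, but makes the result depend on sample order and on how well retention times agree across runs; this is one reason why real workflows include retention time correction and more robust grouping strategies.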

Key words

Preprocessing; Peak picking; Retention time correction; Metadata; Quality check; Missing values

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Computational Biology Unit, Research and Innovation Centre, Fondazione E. Mach, Trento, Italy