Abstract
Liquid chromatography coupled with mass spectrometry (LC–MS) is a popular analytical platform for metabolomic studies. Accurate and sensitive feature detection is a key step before further analysis, but it remains challenging because LC–MS data sets are large and highly complex. A pure ion chromatogram (PIC) consists of the ions produced by a single metabolite, free of interference from other ions. In this study, hierarchical density-based spatial clustering of applications with noise (HDBSCAN) was applied to extract PICs from LC–MS data sets. Because a metabolite generates dense, continuous ions along both the m/z and elution-time axes, HDBSCAN can cluster the ions of the same metabolite into one group without requiring a predefined m/z tolerance. Compared with centWave and PITracer, the proposed method achieved higher recall and comparable precision for feature detection on the simulated, MM48, and Arabidopsis thaliana (L.) Heynh. data sets. The method was implemented in Python and is open-sourced at http://www.github.com/zmzhang/HPIC.
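The idea of grouping ions by their density in the (m/z, elution time) plane can be illustrated with a small sketch. It is not the HPIC implementation: for brevity it uses plain DBSCAN, the fixed-radius predecessor of HDBSCAN, written in pure Python. Plain DBSCAN still needs a neighbourhood radius (`eps`), which is the kind of fixed threshold the hierarchical variant is meant to avoid. The synthetic ions and the names `MZ_SCALE`, `RT_SCALE`, `EPS`, and `MIN_PTS` are assumptions chosen only for illustration.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: return one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (point i included).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels

# Synthetic ions as (m/z, retention time in s); values are illustrative only.
JITTER = [0.0003, -0.0002, 0.0001, -0.0004, 0.0]
trace_a = [(150.0500 + JITTER[k % 5], 10.0 + k) for k in range(10)]
trace_b = [(150.0800 + JITTER[k % 5], 12.0 + k) for k in range(10)]  # co-eluting
noise = [(150.0200, 5.0), (150.1200, 16.0), (150.0650, 40.0)]
ions = trace_a + trace_b + noise

# Hypothetical scaling so one unit is comparable on both axes.
MZ_SCALE = 0.005  # Da per unit
RT_SCALE = 1.0    # seconds per unit (one scan interval here)
EPS = 1.5         # neighbourhood radius in scaled units
MIN_PTS = 3       # minimum neighbours (self included) for a core point

scaled = [(mz / MZ_SCALE, rt / RT_SCALE) for mz, rt in ions]
labels = dbscan(scaled, EPS, MIN_PTS)

# Collect each cluster as a candidate PIC, ordered by retention time.
pics = {}
for ion, label in zip(ions, labels):
    if label != -1:
        pics.setdefault(label, []).append(ion)
for trace in pics.values():
    trace.sort(key=lambda ion: ion[1])

print(len(pics), "candidate PICs;", labels.count(-1), "ions flagged as noise")
```

In this toy data the two traces overlap in elution time and differ only in m/z, so clustering in both dimensions at once is what separates them, while the isolated ions are left unassigned.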
Acknowledgements
This work was financially supported by the National Natural Science Foundation of China (Grant Numbers 21305163, 21375151, 21675174, and 21873116) and the Yunnan Provincial Tobacco Monopoly Bureau China (Grant Number 2019530000241019).
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Zhu, H., Chen, Y., Liu, C. et al. Feature Extraction for LC–MS via Hierarchical Density Clustering. Chromatographia 82, 1449–1457 (2019). https://doi.org/10.1007/s10337-019-03766-1
DOI: https://doi.org/10.1007/s10337-019-03766-1