Volume 82, Issue 10, pp 1449–1457

Feature Extraction for LC–MS via Hierarchical Density Clustering

  • Huimin Zhu
  • Yi Chen
  • Cha Liu
  • Rong Wang
  • Gaokun Zhao
  • Binbin Hu
  • Hongchao Ji
  • Zhi-Min Zhang (corresponding author)
  • Hongmei Lu (corresponding author)


Liquid chromatography coupled with mass spectrometry (LC–MS) is a popular analytical platform for metabolomic studies. Accurate and sensitive feature detection is a key step before further analysis, but it remains challenging because of the large size and high complexity of LC–MS data sets. A pure ion chromatogram (PIC) consists of the ions produced by a single metabolite, free of interferences. In this study, hierarchical density-based spatial clustering of applications with noise (HDBSCAN) was therefore applied to extract PICs from LC–MS data sets. Because a metabolite generates dense, continuous ion signals along both the m/z and elution-time axes, HDBSCAN can cluster the ions of the same metabolite into one group without requiring a user-defined m/z tolerance. Compared with centWave and PITracer, the proposed method achieved higher recall and comparable precision for feature detection on simulated, MM48, and Arabidopsis thaliana (L.) Heynh. data sets. It was implemented in Python and open-sourced at


Keywords: LC–MS · Pure ion chromatogram · HDBSCAN · Feature extraction



This work was financially supported by the National Natural Science Foundation of China (Grant Nos. 21305163, 21375151, 21675174, and 21873116) and the Yunnan Provincial Tobacco Monopoly Bureau, China (Grant No. 2019530000241019).

Compliance with ethical standards

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

10337_2019_3766_MOESM1_ESM.docx (29 kb)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. College of Chemistry and Chemical Engineering, Central South University, Changsha, China
  2. Yunnan Academy of Tobacco Agricultural Sciences, Kunming, China
