Feature Extraction for LC–MS via Hierarchical Density Clustering
Liquid chromatography coupled with mass spectrometry (LC–MS) is a popular analytical platform for metabolomic studies. Accurate and sensitive feature detection is a key step before further analysis, but it remains challenging because LC–MS data sets are large and highly complex. A pure ion chromatogram (PIC) consists of the ions produced by a single metabolite, free of interference from other ions. In this study, hierarchical density-based spatial clustering of applications with noise (HDBSCAN) was applied to extract PICs from LC–MS data sets. Because a metabolite generates ions that are dense and continuous along both the m/z and elution time axes, HDBSCAN can cluster the ions of the same metabolite into one group without requiring a predefined m/z tolerance. Compared with centWave and PITracer, the proposed method achieved higher recall and comparable precision for feature detection on simulated, MM48, and Arabidopsis thaliana (L.) Heynh. data sets. The method was implemented in Python and is open source at http://www.github.com/zmzhang/HPIC.
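To make the clustering idea concrete, the following is a minimal sketch and not the authors' HPIC implementation. It uses plain DBSCAN, the simpler non-hierarchical precursor of HDBSCAN, written with the standard library only. The synthetic ions, the scaling factors `mz_scale` and `rt_scale`, and the values of `eps` and `min_pts` are all assumptions chosen for illustration. Note that the fixed radius `eps` acts as an implicit tolerance, which is the kind of hand-set parameter that HDBSCAN is meant to avoid by building a hierarchy over density levels.

```python
from math import exp, hypot

NOISE = -1  # label for ions that fall in no dense region


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on 2-D points; returns one integer label per point."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        xi, yi = points[i]
        return [j for j in range(n)
                if hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nb)
        cluster += 1
    return labels


def extract_pics(ions, mz_scale, rt_scale, eps, min_pts):
    """Cluster (mz, rt, intensity) ions; return labels and {id: [(rt, intensity)]}."""
    # Rescale both axes so that one unit is comparable in m/z and in time.
    points = [(mz / mz_scale, rt / rt_scale) for mz, rt, _ in ions]
    labels = dbscan(points, eps, min_pts)
    pics = {}
    for (_, rt, intensity), label in zip(ions, labels):
        if label != NOISE:
            pics.setdefault(label, []).append((rt, intensity))
    for trace in pics.values():
        trace.sort()  # order each chromatogram by elution time
    return labels, pics


# Hypothetical data: two co-eluting metabolites close in m/z, plus stray ions.
JITTER = [0.002, -0.001, 0.0, 0.001, -0.002]  # small m/z measurement error


def synthetic_peak(mz, rt_start):
    return [(mz + JITTER[k % 5], rt_start + 0.1 * k,
             1000.0 * exp(-((k - 4.5) ** 2) / 8.0)) for k in range(10)]


ions = (synthetic_peak(200.100, 10.0)
        + synthetic_peak(200.130, 10.2)
        + [(200.300, 5.0, 30.0), (199.800, 14.0, 25.0), (200.500, 10.5, 40.0)])

labels, pics = extract_pics(ions, mz_scale=0.01, rt_scale=0.1, eps=1.5, min_pts=3)
for cluster_id, trace in sorted(pics.items()):
    print(cluster_id, len(trace), "ions, rt", trace[0][0], "to", trace[-1][0])
```

In this toy case the two ion traces are separated even though they overlap in elution time, and the isolated ions are labeled as noise. On real data the ion density varies widely between abundant and trace metabolites, so a single `eps` is rarely adequate, which is the motivation for the hierarchical variant described above.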
Keywords: LC–MS; Pure ion chromatogram; HDBSCAN; Feature extraction
This work was financially supported by the National Natural Science Foundation of China (Grant Numbers 21305163, 21375151, 21675174, and 21873116) and the Yunnan Provincial Tobacco Monopoly Bureau, China (Grant Number 2019530000241019).
Compliance with ethical standards
Conflict of interest
All authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
- 22. Wang S-Y, Kuo C-H, Tseng YJ (2015) Ion trace detection algorithm to extract pure ion chromatograms to improve untargeted peak detection quality for liquid chromatography/time-of-flight mass spectrometry-based metabolomics data. Anal Chem 87:3048–3055. https://doi.org/10.1021/ac504711d
- 27. Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis E, Han J, Fayyad U (eds) Proceedings of the second international conference on knowledge discovery and data mining. AAAI Press, Portland, Oregon, pp 226–231