A Decomposition Based Multi-objective Genetic Programming Algorithm for Classification of Highly Imbalanced Tandem Mass Spectrometry
Preprocessing tandem mass spectra to separate signal peaks from noise peaks plays a crucial role in improving the accuracy of most peptide identification algorithms. Because a CID tandem mass spectra dataset is highly imbalanced, with a high proportion of noise peaks and only a small number of signal peaks (a low signal-to-noise ratio), peptide identification needs to be preceded by a classification strategy that maintains the performance trade-off between the minority (signal) and majority (noise) class accuracies. This paper therefore proposes a Multi-Objective Genetic Programming (MOGP) approach based on the idea of MOEA/D, named MOGP/D, to evolve a Pareto front of classifiers along the optimal trade-off surface that offers the best compromises between the objectives. Compared with an NSGA-II-based MOGP method, called NSGP, MOGP/D produces better solutions in the region of interest (the centre of the Pareto front) as the signal-to-noise ratio decreases, according to the hypervolume indicator on the training sets. Moreover, the best compromise solution achieved by the proposed method is compared with the best single-objective GP solution and the best NSGP solution, and the results show that MOGP/D retains a reasonable number of signal peaks and filters out more noise peaks than the other two methods. To further evaluate the effectiveness of MOGP/D, the preprocessed MS/MS data is submitted to the most widely used de novo sequencing software, PEAKS, to identify the peptides. The results show that the proposed multi-objective GP method improves the reliability of peptide identification compared to the single-objective GP.
Keywords: Genetic programming, Multi-objective optimisation, Imbalanced binary classification, Tandem mass spectrometry
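To make the evaluation terms in the abstract concrete, the following is a minimal Python sketch of how a two-objective hypervolume could be computed for a set of classifiers, where each classifier is scored by its signal (minority) accuracy and noise (majority) accuracy. It is an illustration only, not the paper's implementation: the function names, the label convention (1 = signal, 0 = noise), and the reference point (0, 0) are all assumptions made here.

```python
from typing import List, Tuple

# One classifier's score: (signal accuracy, noise accuracy), both maximised.
Point = Tuple[float, float]


def class_accuracies(y_true: List[int], y_pred: List[int]) -> Point:
    """Per-class accuracies; label 1 = signal, 0 = noise (assumed convention)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return (tp / n_pos if n_pos else 0.0, tn / n_neg if n_neg else 0.0)


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the non-dominated points, sorted by the first objective."""
    front = set()
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1] for q in points
        )
        if not dominated:
            front.add(p)
    return sorted(front)


def hypervolume_2d(points: List[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by the front and bounded below by `ref`.

    Assumes every point is at least as good as `ref` in both objectives.
    """
    hv = 0.0
    prev_x = ref[0]
    # On a sorted front, the second objective decreases as the first grows,
    # so each point contributes one vertical strip of height (y - ref[1]).
    for x, y in pareto_front(points):
        hv += (x - prev_x) * (y - ref[1])
        prev_x = x
    return hv


if __name__ == "__main__":
    scores = [(0.2, 0.9), (0.5, 0.6), (0.8, 0.3), (0.4, 0.5)]
    print(pareto_front(scores))
    print(hypervolume_2d(scores))
```

A larger hypervolume means the front covers more of the objective space, which is the sense in which two fronts (for example, those produced by two different algorithms) can be compared with a single number.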