A Decomposition Based Multi-objective Genetic Programming Algorithm for Classification of Highly Imbalanced Tandem Mass Spectrometry

  • Samaneh Azari
  • Bing Xue
  • Mengjie Zhang
  • Lifeng Peng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12047)


Preprocessing tandem mass spectra to classify peaks as signal or noise plays a crucial role in improving the accuracy of most peptide identification algorithms. CID tandem mass spectra datasets are highly imbalanced, with a high noise ratio and only a small number of signal peaks (a low signal-to-noise ratio). Peptide identification therefore needs a prior classification strategy that can maintain the trade-off between the minority (signal) and majority (noise) class accuracies. This paper proposes a Multi-Objective Genetic Programming (MOGP) approach based on the idea of MOEA/D, named MOGP/D. It evolves a Pareto front of classifiers along the optimal trade-off surface, offering the best compromises between the objectives. Compared with an NSGA-II-based MOGP method called NSGP, MOGP/D produces better solutions in the region of interest (the centre of the Pareto front) as the signal-to-noise ratio decreases, according to the hypervolume indicator on the training sets. Moreover, the best compromise solution achieved by the proposed method is compared with the best single-objective GP and the best NSGP solution. The results show that MOGP/D retains a reasonable number of signal peaks while filtering out more noise peaks than the other two methods. To further evaluate the effectiveness of MOGP/D, the preprocessed MS/MS data are submitted to PEAKS, a widely used de novo sequencing software, to identify the peptides. The results show that the proposed multi-objective GP method improves the reliability of peptide identification compared to the single-objective GP.
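To make the decomposition idea concrete, here is a minimal Python sketch. It assumes two minimised objectives (the error rate on the signal class and on the noise class), evenly spread weight vectors, and the Tchebycheff aggregation commonly used with MOEA/D. The abstract does not state the paper's exact fitness definitions, aggregation function, neighbourhood size or GP operators, so all function names here are hypothetical illustrations rather than the MOGP/D implementation. The 2-D hypervolume function likewise assumes a reference point of (1, 1), which is the worst value of both error rates.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumed objectives: (1 - signal accuracy, 1 - noise accuracy), both minimised.
Objectives = Tuple[float, float]


def objectives(predict: Callable[[Sequence[float]], int],
               X: Sequence[Sequence[float]], y: Sequence[int]) -> Objectives:
    """Per-class error rates; label 1 = signal (minority), 0 = noise (majority)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    signal_acc = sum(predict(x) == 1 for x in pos) / len(pos)
    noise_acc = sum(predict(x) == 0 for x in neg) / len(neg)
    return (1.0 - signal_acc, 1.0 - noise_acc)


def uniform_weights(n: int) -> List[Tuple[float, float]]:
    """n >= 2 evenly spread weight vectors, one per scalar subproblem."""
    return [(i / (n - 1), 1.0 - i / (n - 1)) for i in range(n)]


def neighbourhoods(weights, size: int) -> List[List[int]]:
    """For each weight vector, indices of the `size` closest ones (Euclidean)."""
    return [sorted(range(len(weights)), key=lambda j: math.dist(w, weights[j]))[:size]
            for w in weights]


def update_ideal(z: Objectives, f: Objectives) -> Objectives:
    """Ideal point z*: best value seen so far on each objective."""
    return tuple(min(zk, fk) for zk, fk in zip(z, f))


def tchebycheff(f: Objectives, w, z: Objectives) -> float:
    """Assumed aggregation g(f | w, z*) = max_k w_k * |f_k - z*_k|."""
    return max(wk * abs(fk - zk) for fk, wk, zk in zip(f, w, z))


def update_neighbours(pop, objs, child, child_obj, i, neigh, weights, z) -> None:
    """MOEA/D-style replacement: the child (bred for subproblem i) replaces
    every neighbour whose scalar subproblem it solves at least as well."""
    for j in neigh[i]:
        if tchebycheff(child_obj, weights[j], z) <= tchebycheff(objs[j], weights[j], z):
            pop[j], objs[j] = child, child_obj


def hypervolume_2d(points, ref=(1.0, 1.0)) -> float:
    """Area dominated by `points` (minimisation) and bounded by `ref`."""
    hv, prev_f2 = 0.0, ref[1]
    # Sweep in ascending f1; each non-dominated point adds one horizontal slab.
    for f1, f2 in sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1]):
        if f2 < prev_f2:
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv


if __name__ == "__main__":
    weights = uniform_weights(5)
    print(weights)  # [(0.0, 1.0), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1.0, 0.0)]
    print(round(hypervolume_2d([(0.2, 0.6), (0.5, 0.3), (0.6, 0.7)]), 6))  # 0.47
```

Each weight vector defines one scalar subproblem, so a population spread over the weights is pushed towards different parts of the trade-off between signal and noise accuracy. The weight (0.5, 0.5) corresponds roughly to the centre of the front that the abstract calls the region of interest. Neighbour-based replacement lets a good classifier for one trade-off also improve similar trade-offs.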


Genetic programming, Multi-objective optimisation, Imbalanced binary classification, Tandem mass spectrometry



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Samaneh Azari (1)
  • Bing Xue (1)
  • Mengjie Zhang (1)
  • Lifeng Peng (2)
  1. School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
  2. Centre for Biodiscovery and School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
