A Fragmentation Event Model for Peptide Identification by Mass Spectrometry

  • Yu Lin
  • Yantao Qiao
  • Shiwei Sun
  • Chungong Yu
  • Gongjin Dong
  • Dongbo Bu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4955)


We present in this paper a novel fragmentation event model for peptide identification by tandem mass spectrometry. Most current peptide identification techniques suffer from inaccuracies in the predicted theoretical spectrum, which stem from an insufficient understanding of the ion generation process, especially the b/y ratio puzzle.
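Some background on the terminology may help. Cleaving the peptide bond after residue i of an n-residue peptide yields a complementary pair of fragments: a b ion that carries the N-terminal i residues and a y ion that carries the remaining n − i residues plus a water molecule. A predicted theoretical spectrum lists the m/z values of these ions. The b/y ratio puzzle is the difficulty of predicting how the signal from a single cleavage divides between its b and y ions. The sketch below is a generic illustration, not the authors' code. It computes singly charged b/y m/z values from standard monoisotopic residue masses. The helper name `by_pairs` is hypothetical, and modifications and neutral losses are ignored.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def by_pairs(peptide):
    """Return (bond_index, b_mz, y_mz) for every peptide bond, singly charged.

    Cleaving the bond after residue i gives b_i (N-terminal fragment) and its
    complement y_(n-i) (C-terminal fragment, which keeps the water).
    Hypothetical helper for illustration: no PTMs, no neutral losses.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    pairs = []
    prefix = 0.0
    for i in range(1, len(peptide)):
        prefix += masses[i - 1]                    # residues 1..i
        b_mz = prefix + PROTON                     # b ion: prefix + H+
        y_mz = (total - prefix) + WATER + PROTON   # y ion: suffix + H2O + H+
        pairs.append((i, b_mz, y_mz))
    return pairs


if __name__ == "__main__":
    peptide = "PEPTIDE"
    for i, b, y in by_pairs(peptide):
        print(f"bond {i}: b{i} = {b:.4f}, y{len(peptide) - i} = {y:.4f}")
```

Each bond therefore contributes exactly one b/y pair whose m/z values sum to the neutral peptide mass plus two protons. This complementarity is why one cleavage can be treated as a single event, even when it is hard to predict which of its two ions will carry most of the intensity.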

To overcome this difficulty, we propose a fragmentation event model based on the abundance of fragmentation events rather than on ion intensities. Experimental results demonstrate that this model can improve database searching methods. On an LTQ data set, at a false-positive rate of 5%, our fragmentation event model achieves a significantly higher true-positive rate (0.83) than SEQUEST (0.73). A comparison with Mascot yields similar results, indicating that our model can effectively identify the false-positive peptide-spectrum pairs reported by SEQUEST and Mascot.
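The headline comparison reports the true-positive rate at a fixed 5% false-positive rate. Suppose each peptide-spectrum pair has a score and is known to be correct or incorrect. The abstract does not say how the LTQ pairs were labelled. That operating point can then be found in two steps. First, choose a score threshold that admits at most 5% of the incorrect pairs. Second, measure the fraction of correct pairs that score at or above that threshold. The sketch below is a generic illustration of this calculation. The function name `tpr_at_fpr` and the toy scores are hypothetical and do not reproduce the paper's data.

```python
def tpr_at_fpr(pos_scores, neg_scores, max_fpr=0.05):
    """Highest TPR reachable at a threshold whose FPR does not exceed max_fpr.

    A pair is called positive when its score is >= the threshold.
    pos_scores: scores of correct peptide-spectrum pairs (hypothetical input).
    neg_scores: scores of incorrect peptide-spectrum pairs (hypothetical input).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one correct and one incorrect pair")
    best = 0.0
    # Every distinct observed score is a candidate threshold.
    for t in sorted(set(pos_scores) | set(neg_scores)):
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
            best = max(best, tpr)
    return best


if __name__ == "__main__":
    correct = [0.9, 0.8, 0.7, 0.6, 0.3]        # toy scores, not real data
    incorrect = [0.1] * 18 + [0.65, 0.75]      # 20 toy incorrect pairs
    print(tpr_at_fpr(correct, incorrect))      # → 0.6
```

With 20 incorrect pairs, a 5% false-positive rate allows exactly one of them above the threshold. The lowest qualifying threshold is therefore 0.7, which keeps three of the five correct pairs. Scanning every observed score is quadratic but keeps the logic transparent. A production version would sort the scores once and sweep the threshold.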

This fragmentation event model can also be used to address the missing-peak problem encountered by de novo sequencing methods. To our knowledge, this is the first time that the fragmentation preference of peptide bonds has been used to overcome the missing-peak difficulty.



Keywords: True Positive Rate; Relative Entropy; Tandem Mass Spectrum; Theoretical Spectrum; Fragmentation Event
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Yu Lin (1)
  • Yantao Qiao (1)
  • Shiwei Sun (1)
  • Chungong Yu (1)
  • Gongjin Dong (1)
  • Dongbo Bu (1, 2)
  1. Bioinformatics Group, Key Lab of Intelligent Software Systems, Institute of Computing Technology, Chinese Academy of Sciences, China
  2. Bioinformatics Lab, University of Waterloo, Ontario, Canada
