A Systematic Statistical Analysis of Ion Trap Tandem Mass Spectra in View of Peptide Scoring

  • Jacques Colinge
  • Alexandre Masselot
  • Jérôme Magnin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2812)


Tandem mass spectrometry has become central in proteomics projects. In particular, it is of prime importance to design sensitive and selective score functions to reliably identify peptides in databases. By using a huge collection of 140 000+ peptide MS/MS spectra, we systematically study the importance of many characteristics of a match (peptide sequence/spectrum) to include in a score function. Besides classical match characteristics, we investigate the value of new characteristics such as amino acid dependence and consecutive fragment matches. We finally select a combination of promising characteristics and show that the corresponding score function achieves very low false positive rates while being very sensitive, thereby enabling highly automated peptide identification in large proteomics projects. We compare our results to widely used protein identification systems and show a significant reduction in false positives.


Matrix Assisted Laser Desorption Ionization; False Positive Rate; Score Function; True Positive Rate; Systematic Statistical Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Jacques Colinge
    • 1
  • Alexandre Masselot
    • 1
  • Jérôme Magnin
    • 1
  1. GeneProt Inc., Meyrin, Switzerland
