Summary
Accurate identification of peptides remains a challenge in mass spectrometry (MS)-based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics that describe how well the experimental spectrum matches the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are one approach to this false-positive problem and have led to significant improvements in peptide identification. We have shown that machine learning, specifically the support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms each peptide into a vector representation based on database search metrics, and these vectors are used to train and validate the SVM. In practice, following the database search routine, a peptide is expressed in its vector representation and the SVM generates a single statistical score that is then used to classify the peptide as present or absent in the sample.
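The workflow described in the summary (search metrics to vector, train the SVM, score, classify) can be illustrated with a minimal sketch. It is not the chapter's implementation. It assumes a linear SVM trained by stochastic subgradient descent on the hinge loss, whereas the protocol may use a kernel SVM from an established library. The feature names (`xcorr`, `delta_cn`, `frac_ions_matched`), the synthetic training data, and the decision threshold of zero are all hypothetical choices made for illustration.

```python
import random
from statistics import mean, pstdev

# Hypothetical search metrics; the actual feature vector may differ.
FEATURES = ("xcorr", "delta_cn", "frac_ions_matched")

def to_vector(hit):
    """Map one database-search hit (dict of metrics) to a fixed-order vector."""
    return [float(hit[name]) for name in FEATURES]

def fit_scaler(X):
    """Per-feature mean and standard deviation, used to standardize inputs."""
    cols = list(zip(*X))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]

def scale(x, mu, sd):
    return [(v - m) / s for v, m, s in zip(x, mu, sd)]

def train_linear_svm(X, y, lam=0.001, lr=0.01, epochs=100, seed=0):
    """Linear SVM via SGD on the L2-regularized hinge loss; labels are +1/-1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # L2 shrinkage is always applied; the hinge term only if margin < 1.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def fit(hits, labels):
    """Standardize the feature vectors and train; returns (mu, sd, w, b)."""
    X = [to_vector(h) for h in hits]
    mu, sd = fit_scaler(X)
    w, b = train_linear_svm([scale(x, mu, sd) for x in X], labels)
    return mu, sd, w, b

def svm_score(hit, model):
    """Single score per peptide hit: the signed SVM decision value."""
    mu, sd, w, b = model
    x = scale(to_vector(hit), mu, sd)
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def make_synthetic_hits(n_per_class, seed=1):
    """Synthetic, well-separated training data (illustrative only, not real spectra)."""
    rng = random.Random(seed)
    hits, labels = [], []
    for _ in range(n_per_class):
        hits.append({"xcorr": rng.uniform(3.0, 4.5),
                     "delta_cn": rng.uniform(0.25, 0.50),
                     "frac_ions_matched": rng.uniform(0.55, 0.90)})
        labels.append(1)   # stands in for a known-correct identification
        hits.append({"xcorr": rng.uniform(0.8, 2.0),
                     "delta_cn": rng.uniform(0.00, 0.15),
                     "frac_ions_matched": rng.uniform(0.10, 0.40)})
        labels.append(-1)  # stands in for a known-false identification
    return hits, labels

hits, labels = make_synthetic_hits(30)
model = fit(hits, labels)

good = {"xcorr": 4.0, "delta_cn": 0.40, "frac_ions_matched": 0.80}
bad = {"xcorr": 1.2, "delta_cn": 0.05, "frac_ions_matched": 0.20}
for name, hit in (("good", good), ("bad", bad)):
    score = svm_score(hit, model)
    print(name, round(score, 3), "present" if score > 0 else "absent")
```

In a real analysis the labeled training vectors would come from spectra of a sample with known composition rather than from a random generator, and the score threshold would be chosen on held-out validation data instead of being fixed at zero.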
Acknowledgments
This research was funded by the U.S. Department of Energy (DOE) Office of Advanced Scientific Computing Research under contract No. 47901. The Pacific Northwest National Laboratory is operated by Battelle for the U.S. DOE under contract DE-AC06-76RLO 1830.
Copyright information
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Webb-Robertson, BJ.M. (2009). Support Vector Machines for Improved Peptide Identification from Tandem Mass Spectrometry Database Search. In: Lipton, M.S., Paša-Tolic, L. (eds) Mass Spectrometry of Proteins and Peptides. Methods in Molecular Biology, vol 492. Humana Press. https://doi.org/10.1007/978-1-59745-493-3_28
DOI: https://doi.org/10.1007/978-1-59745-493-3_28
Publisher Name: Humana Press
Print ISBN: 978-1-934115-48-0
Online ISBN: 978-1-59745-493-3
eBook Packages: Springer Protocols