Support Vector Machines for Improved Peptide Identification from Tandem Mass Spectrometry Database Search

Webb-Robertson, Bobbie-Jo M.

doi:10.1007/978-1-59745-493-3_28

Part of the book series: Methods In Molecular Biology (MIMB, volume 492)

Summary

Accurate identification of peptides is a current challenge in mass spectrometry (MS)-based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are one approach to this false positive problem, and they have led to significant improvements in peptide identification. We have shown that machine learning, specifically the support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics, and these vectors are used to train and validate the SVM. In practice, following the database search routine, a peptide is converted to its vector representation and the SVM generates a single statistical score that is then used to classify the peptide as present in or absent from the sample.
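
As a rough illustration of the workflow described in the summary, the following Python sketch trains a linear SVM on peptide vectors and returns one score per peptide. It is not the chapter's implementation: the two features (named xcorr and delta_cn here), all numeric values, the hyperparameters, and the zero decision threshold are assumptions made for the example, and the subgradient-descent trainer is a simple stand-in for a full SVM solver.

```python
from statistics import mean, pstdev

# Hypothetical training set: each peptide is a vector of two database
# search metrics (xcorr, delta_cn). All values are made up for illustration.
# Label +1 marks a known-correct identification, -1 a known-false one.
TRAIN = [
    ((3.5, 0.40), 1), ((4.1, 0.35), 1), ((3.0, 0.30), 1), ((3.8, 0.45), 1),
    ((1.2, 0.05), -1), ((1.8, 0.10), -1), ((1.5, 0.02), -1), ((2.0, 0.08), -1),
]


def fit_scaler(rows):
    """Return per-feature means and population standard deviations."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) for c in cols]


def scale(row, means, stds):
    """Standardize one peptide vector with the fitted means and deviations."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=500):
    """Minimize the L2-regularized hinge loss by batch subgradient descent."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j, xj in enumerate(x):
                    grad_w[j] -= y * xj / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_score(metrics, model):
    """Map a peptide's search metrics to a single signed SVM decision value."""
    means, stds, w, b = model
    x = scale(metrics, means, stds)
    return sum(wj * xj for wj, xj in zip(w, x)) + b


rows = [r for r, _ in TRAIN]
labels = [y for _, y in TRAIN]
means, stds = fit_scaler(rows)
w, b = train_linear_svm([scale(r, means, stds) for r in rows], labels)
MODEL = (means, stds, w, b)

# Score two unseen (again hypothetical) peptide vectors.
for metrics in [(4.0, 0.42), (1.0, 0.03)]:
    s = svm_score(metrics, MODEL)
    print(metrics, round(s, 3), "present" if s > 0 else "absent")
```

Here the raw decision value serves as the score, with positive values read as "present". A real pipeline would likely use more search metrics, held-out validation data, and a calibrated threshold rather than zero.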

Acknowledgments

This research was funded by the U.S. Department of Energy (DOE) Office of Advanced Scientific Computing Research under contract No. 47901. The Pacific Northwest National Laboratory is operated by Battelle for the U.S. DOE under contract DE-AC06-76RLO 1830.

Author information

Authors and Affiliations

Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA, USA
Bobbie-Jo M. Webb-Robertson

Editor information

Editors and Affiliations

Fundamental & Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
Mary S. Lipton
Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
Ljiljana Paša-Tolic

About this protocol

Cite this protocol

Webb-Robertson, BJ.M. (2009). Support Vector Machines for Improved Peptide Identification from Tandem Mass Spectrometry Database Search. In: Lipton, M.S., Paša-Tolic, L. (eds) Mass Spectrometry of Proteins and Peptides. Methods In Molecular Biology, vol 492. Humana Press. https://doi.org/10.1007/978-1-59745-493-3_28

DOI: https://doi.org/10.1007/978-1-59745-493-3_28
Publisher Name: Humana Press
Print ISBN: 978-1-934115-48-0
Online ISBN: 978-1-59745-493-3
eBook Packages: Springer Protocols
