Support Vector Machines for Improved Peptide Identification from Tandem Mass Spectrometry Database Search

Part of the book series: Methods in Molecular Biology (MIMB, volume 492)

Summary

Accurate identification of peptides remains a challenge in mass spectrometry (MS)-based proteomics. The standard approach uses a search routine to compare tandem mass spectra against a database of peptides associated with the target organism. These database search routines yield multiple metrics that describe how well an experimental spectrum maps to the theoretical spectrum of a peptide. The structure of these results makes it difficult to separate correct identifications from false ones, which has created a false identification problem. Statistical confidence scores are one approach to this false-positive problem, and they have led to significant improvements in peptide identification. We have shown that machine learning, specifically the support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms each peptide into a vector representation built from database search metrics, and these vectors are used to train and validate the SVM. In practice, following the database search routine, a peptide is expressed in its vector representation and the SVM generates a single statistical score, which is then used to classify the peptide as present in or absent from the sample.
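
The scoring workflow summarized above can be illustrated with a minimal sketch. This is not the chapter's implementation: it assumes a linear SVM trained by batch subgradient descent on the regularized hinge loss, two hypothetical search metrics per peptide (a correlation-style score and a delta score), and invented toy values. All function names are illustrative only.

```python
from statistics import mean, pstdev

# Toy training data (invented for illustration, not real search output).
# Each row is a peptide vector of two hypothetical search metrics:
# [correlation-style score, delta score between the top two candidates]
TRAIN_X = [
    [3.5, 0.40], [4.0, 0.35], [3.0, 0.30], [3.8, 0.45],  # labeled correct
    [1.0, 0.05], [1.5, 0.10], [1.2, 0.02], [0.8, 0.08],  # labeled incorrect
]
TRAIN_Y = [1, 1, 1, 1, -1, -1, -1, -1]


def fit_scaler(rows):
    """Return per-feature (mean, std) so all metrics share a common scale."""
    return [(mean(col), pstdev(col)) for col in zip(*rows)]


def scale(row, scaler):
    """Standardize one peptide vector with a previously fitted scaler."""
    return [(value - mu) / sigma for value, (mu, sigma) in zip(row, scaler)]


def train_linear_svm(rows, labels, lam=0.01, step=0.05, iterations=2000):
    """Batch subgradient descent on the L2-regularized mean hinge loss."""
    n = len(rows)
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(iterations):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(rows, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j, xj in enumerate(x):
                    grad_w[j] -= y * xj / n
                grad_b -= y / n
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def svm_score(metrics, scaler, w, b):
    """Single SVM score for one peptide; positive means 'called present'."""
    z = scale(metrics, scaler)
    return sum(wj * zj for wj, zj in zip(w, z)) + b


SCALER = fit_scaler(TRAIN_X)
W, B = train_linear_svm([scale(row, SCALER) for row in TRAIN_X], TRAIN_Y)

strong_match = svm_score([3.9, 0.42], SCALER, W, B)
weak_match = svm_score([0.9, 0.04], SCALER, W, B)
print("strong match called present:", strong_match > 0)
print("weak match called present:", weak_match > 0)
```

In this sketch the sign of the score gives the present/absent call and its magnitude can be read as a rough measure of confidence. A real application would use the full set of metrics reported by the search routine and labeled training data, and would likely rely on an established SVM library rather than a hand-written optimizer.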

Acknowledgments

This research was funded by the U.S. Department of Energy (DOE) Office of Advanced Scientific Computing Research under contract No. 47901. The Pacific Northwest National Laboratory is operated by Battelle for the U.S. DOE under contract DE-AC06-76RLO 1830.

Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Webb-Robertson, BJ.M. (2009). Support Vector Machines for Improved Peptide Identification from Tandem Mass Spectrometry Database Search. In: Lipton, M.S., Paša-Tolic, L. (eds) Mass Spectrometry of Proteins and Peptides. Methods In Molecular Biology, vol 492. Humana Press. https://doi.org/10.1007/978-1-59745-493-3_28

  • DOI: https://doi.org/10.1007/978-1-59745-493-3_28

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-934115-48-0

  • Online ISBN: 978-1-59745-493-3

  • eBook Packages: Springer Protocols
