Protein Identification from Tandem Mass Spectra by Database Searching

  • Protocol
Protein Bioinformatics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1558)

Abstract

Protein identification from tandem mass spectra is one of the most versatile and widely used proteomics workflows, able to identify proteins, characterize post-translational modifications, and provide semiquantitative measurements of relative protein abundance. This manuscript describes the concepts, prerequisites, and methods required to identify the proteins in a tandem mass spectrometry dataset by searching protein sequence databases with a tandem mass spectrometry search engine. The discussion includes instructions for extracting, preparing, and formatting spectral datafiles, selecting appropriate search parameter settings, and performing basic interpretation of the results.

Notes

  1. See http://tools.proteomecenter.org/wiki/index.php?title=Formats:mzXML.

  2. See http://proteowizard.sourceforge.net.

Author information

Corresponding author

Correspondence to Nathan J. Edwards.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Edwards, N.J. (2017). Protein Identification from Tandem Mass Spectra by Database Searching. In: Wu, C., Arighi, C., Ross, K. (eds) Protein Bioinformatics. Methods in Molecular Biology, vol 1558. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6783-4_17

  • DOI: https://doi.org/10.1007/978-1-4939-6783-4_17

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6781-0

  • Online ISBN: 978-1-4939-6783-4

  • eBook Packages: Springer Protocols
