Skip to main content

Peptide Sequence Tags for Fast Database Search in Mass-Spectrometry

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2005)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 3500))

Abstract

Filtration techniques, in the form of rapid elimination of candidate sequences while retaining the true one, are key ingredients of database searches in genomics. Although SEQUEST and Mascot are sometimes referred to as “BLAST for mass-spectrometry”, the key algorithmic idea of BLAST (filtration) was never implemented in these tools. As a result MS/MS protein identification tools are becoming too time-consuming for many applications including search for post-translationally modified peptides. Moreover, matching millions of spectra against all known proteins will soon make these tools too slow in the same way that “genome vs. genome” comparisons instantly made BLAST too slow. We describe the development of filters for MS/MS database searches that dramatically reduce the running time and effectively remove the bottlenecks in searching the huge space of protein modifications. Our approach, based on a probability model for determining the accuracy of sequence tags, achieves superior results compared to GutenTag, a popular tag generation algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aebersold, R., Mann, M.: Mass spectrometry-based proteomics. Nature 422, 198–207 (2003)

    Article  Google Scholar 

  2. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Communications of the ACM 18, 333–340 (1975)

    Article  MATH  MathSciNet  Google Scholar 

  3. Bafna, V., Edwards, N.: SCOPE: a probabilistic model for scoring tandem mass spectra against a peptide database. Bioinformatics 17(suppl. 1), 13–21 (2001)

    Google Scholar 

  4. Bafna, V., Edwards, N.: On de-novo interpretation of tandem mass spectra for peptide identification. In: Proceedings of the Seventh Annual International Conference on Computational Molecular Biology, pp. 9–18 (2003)

    Google Scholar 

  5. Chen, T., Kao, M.Y., Tepel, M., Rush, J., Church, G.M.: A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol. 8, 325–337 (2001)

    Article  Google Scholar 

  6. Colinge, J., Masselot, A., Giron, M., Dessingy, T., Magnin, J.: OLAV: towards high-throughput tandem mass spectrometry data identification. Proteomics 3, 1454–1463 (2003)

    Article  Google Scholar 

  7. Cormen, T.H., Leiserson, C.H., Rivest, R.L., Stein, C.: Introduction to Algorrithms, 2nd edn. MIT Press, Cambridge (2001)

    Google Scholar 

  8. Creasy, D.M., Cottrell, J.S.: Error tolerant searching of uninterpreted tandem mass spectrometry data. Proteomics 2, 1426–1434 (2002)

    Article  Google Scholar 

  9. Dancík, V., Addona, T.A., Clauser, K.R., Vath, J.E., Pevzner, P.A.: De novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol. 6, 327–342 (1999)

    Article  Google Scholar 

  10. Day, R.M., Borziak, A., Gorin, A.: Ppm-chain de novo peptide identification program comparable in performance to sequest. In: Proceedings of 2004 IEEE Computational Systems in Bioinformatics (CSB 2004), pp. 505–508 (2004)

    Google Scholar 

  11. Elias, J.E., Gibbons, F.D., King, O.D., Roth, F.P., Gygi, S.P.: Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nat. Biotechnol. 22, 214–219 (2004)

    Article  Google Scholar 

  12. Eng, J.K., McCormack, A.L., Yates, J.R.: An Approach to Correlate Tandem Mass-Spectral Data of Peptides with Amino Acid Sequences in a Protein Database. Journal of The American Society For Mass Spectrometry 5, 976–989 (1994)

    Article  Google Scholar 

  13. Frank, A., Pevzner, P.: Pepnovo: De novo peptide sequencing via probabilistic network modeling. Anal. Chem. 77, 964–973 (2005)

    Article  Google Scholar 

  14. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, Heidelberg (2001)

    MATH  Google Scholar 

  15. Havilio, M., Haddad, Y., Smilansky, Z.: Intensity-based statistical scorer for tandem mass spectrometry. Anal. Chem. 75, 435–444 (2003)

    Article  Google Scholar 

  16. Hernandez, P., Gras, R., Frey, J., Appel, R.D.: Popitam: towards new heuristic strategies to improve protein identification from tandem mass spectrometry data. Proteomics 3, 870–878 (2003)

    Article  Google Scholar 

  17. Keller, A., Purvine, S., Nesvizhskii, A.I., Stolyar, S., Goodlett, D.R., Kolker, E.: Experimental protein mixture for validating tandem mass spectral analysis. OMICS 6, 207–212 (2002)

    Article  Google Scholar 

  18. Lu, B., Chen, T.: A suboptimal algorithm for de novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol. 10, 1–12 (2003)

    Article  MathSciNet  Google Scholar 

  19. Lu, B., Chen, T.: A suffix tree approach to the interpretation of tandem mass spectra: applications to peptides of non-specific digestion and post-translational modifications. Bioinformatics 19(suppl. 2), 113–121 (2003)

    Google Scholar 

  20. Lu, B., Chen, T.: Algorithms for de novo peptide sequencing via tandem mass spectrometry. Drug Discovery Today: BioSilico 2, 85–90 (2004)

    Article  MathSciNet  Google Scholar 

  21. Lubeck, O., Sewell, C., Gu, S., Chen, X., Cai, D.: New computational approaches for de novo peptide sequencing from MS/MS experiments. IEEE Proc. on Challenges in Biomedical Informatics 90, 1868–1874 (2002)

    Google Scholar 

  22. Ma, B., Zhang, K., Hendrie, C., Liang, C., Li, M., Doherty-Kirby, A., Lajoie, G.: PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid. Commun. Mass. Spectrom. 17, 2337–2342 (2003)

    Article  Google Scholar 

  23. MacCoss, M.J., Wu, C.C., Yates, J.R.: Probability-based validation of protein identifications using a modified SEQUEST algorithm. Anal. Chem. 74, 5593–5599 (2002)

    Article  Google Scholar 

  24. Mann, M., Jensen, O.N.: Proteomic analysis of post-translational modifications. Nat. Biotechnol. 21, 255–261 (2003)

    Article  Google Scholar 

  25. Mann, M., Wilm, M.: Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Analytical Chemistry 66, 4390–4399 (1994)

    Article  Google Scholar 

  26. Nesvizhskii, A.I., Keller, A., Kolker, E., Aebersold, R.: A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75, 4646–4658 (2003)

    Article  Google Scholar 

  27. Perkins, D.N., Pappin, D.J., Creasy, D.M., Cottrell, J.S.: Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567 (1999)

    Article  Google Scholar 

  28. Pevzner, P.A., Mulyukov, Z., Dancik, V., Tang, C.L.: Efficiency of database search for identification of mutated and modified proteins via mass spectrometry. Genome Res. 11, 290–299 (2001)

    Article  Google Scholar 

  29. Prince, J.T., Carlson, M.W., Wang, R., Lu, P., Marcotte, E.M.: The need for a public proteomics repository (commentary). Nature Biotechnology (April 2004)

    Google Scholar 

  30. Razumovskaya, J., Olman, V., Xu, D., Uberbacher, E., VerBerkmoes, N.C., Hettich, R.L., Xu, Y.: A computational method for assessing peptide-identification reliability in tandem mass spectrometry analysis with sequest. Proteomics 4, 961–969 (2004)

    Article  Google Scholar 

  31. Sadygov, R.G., Yates, J.R.: A hypergeometric probability model for protein identification and validation using tandem mass spectral data and protein sequence databases. Anal. Chem. 75, 3792–3798 (2003)

    Article  Google Scholar 

  32. Schutz, F., Kapp, E.A., Simpson, R.J., Speed, T.P.: Deriving statistical models for predicting peptide tandem ms product ion intensities. Biochem. Soc. Trans. 31, 1479–1483 (2003)

    Article  Google Scholar 

  33. Searle, B.C., Dasari, S., Turner, M., Reddy, A.P., Choi, D., Wilmarth, P.A., McCormack, A.L., David, L.L., Nagalla, S.R.: High-throughput identification of proteins and unanticipated sequence modifications using a mass-based alignment algorithm for MS/MS de novo sequencing results. Anal. Chem. 76, 2220–2230 (2004)

    Article  Google Scholar 

  34. Shevchenko, A., Sunyaev, S., Liska, A., Bork, P., Shevchenko, A.: Nanoelectrospray tandem mass spectrometry and sequence similarity searching for identification of proteins from organisms with unknown genomes. Methods Mol. Biol. 211, 221–234 (2003)

    Google Scholar 

  35. Shewchuk, J.R.: An introduction to the conjugate gradient method without the agonizing pain (1994), http://www-2.cs.cmu.edu/homedirjrs/~jrspapers.html

  36. Sunyaev, S., Liska, A.J., Golod, A., Shevchenko, A., Shevchenko, A.: MultiTag: multiple error-tolerant sequence tag search for the sequence-similarity identification of proteins by mass spectrometry. Anal. Chem. 75, 1307–1315 (2003)

    Article  Google Scholar 

  37. Tabb, D.L., Saraf, A., Yates, J.R.: GutenTag: High-throughput sequence tagging via an empirically derived fragmentation model. Anal. Chem. 75, 6415–6421 (2003)

    Article  Google Scholar 

  38. Tabb, D.L., Smith, L.L., Breci, L.A., Wysocki, V.H., Lin, D., Yates, J.R.: Statistical characterization of ion trap tandem mass spectra from doubly charged tryptic peptides. Anal. Chem. 75, 1155–1163 (2003)

    Article  Google Scholar 

  39. Tanner, S., Shu, H., Frank, A., Mumby, M., Pevzner, P., Bafna, V.: Inspect: Fast and accurate identification of post-translationally modified peptides from tandem mass spectra (2005) (submitted)

    Google Scholar 

  40. Taylor, J.A., Johnson, R.S.: Sequence database searches via de novo peptide sequencing by tandem mass spectrometry. Rapid. Commun. Mass. Spectrom. 11, 1067–1075 (1997)

    Article  Google Scholar 

  41. Taylor, J.A., Johnson, R.S.: Implementation and uses of automated de novo peptide sequencing by tandem mass spectrometry. Anal. Chem. 73, 2594–2604 (2001)

    Article  Google Scholar 

  42. Yates, J.R., Eng, J.K., McCormack, A.L.: Mining genomes: correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases. Anal. Chem. 67, 3202–3210 (1995)

    Article  Google Scholar 

  43. Yates, J.R., Eng, J.K., McCormack, A.L., Schieltz, D.: Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Anal. Chem. 67, 1426–1436 (1995)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Frank, A., Tanner, S., Pevzner, P. (2005). Peptide Sequence Tags for Fast Database Search in Mass-Spectrometry. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science(), vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_25

Download citation

  • DOI: https://doi.org/10.1007/11415770_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25866-7

  • Online ISBN: 978-3-540-31950-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics