Peptide Sequence Tags for Fast Database Search in Mass-Spectrometry

Frank, Ari; Tanner, Stephen; Pevzner, Pavel

doi:10.1007/11415770_25

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3500)

Included in the following conference series:

Annual International Conference on Research in Computational Molecular Biology

Abstract

Filtration techniques, in the form of rapid elimination of candidate sequences while retaining the true one, are key ingredients of database searches in genomics. Although SEQUEST and Mascot are sometimes referred to as “BLAST for mass-spectrometry”, the key algorithmic idea of BLAST (filtration) was never implemented in these tools. As a result, MS/MS protein identification tools are becoming too time-consuming for many applications, including searches for post-translationally modified peptides. Moreover, matching millions of spectra against all known proteins will soon make these tools too slow in the same way that “genome vs. genome” comparisons instantly made BLAST too slow. We describe the development of filters for MS/MS database searches that dramatically reduce the running time and effectively remove the bottlenecks in searching the huge space of protein modifications. Our approach, based on a probability model for determining the accuracy of sequence tags, achieves superior results compared to GutenTag, a popular tag generation algorithm.
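
The filtration idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filter_by_tags`, the toy database, and the tags are all hypothetical, and the sketch assumes that tags are plain amino-acid substrings already produced by some tag generator. It ignores the probabilistic tag scoring and the mass constraints that a real MS/MS filter would apply, and it uses a naive substring scan where a production filter would more likely use a multi-pattern matcher.

```python
from typing import Dict, Iterable, List, Tuple


def filter_by_tags(
    proteins: Dict[str, str], tags: Iterable[str]
) -> Dict[str, List[Tuple[str, int]]]:
    """Keep only proteins that contain at least one tag.

    Returns a mapping from protein name to a list of (tag, start) hits.
    Proteins without any hit are dropped, which is the filtration step.
    """
    # Deduplicate, drop empty tags, and sort for deterministic output.
    tag_list = sorted({t for t in tags if t})
    survivors: Dict[str, List[Tuple[str, int]]] = {}
    for name, sequence in proteins.items():
        hits: List[Tuple[str, int]] = []
        for tag in tag_list:
            # Record every (possibly overlapping) occurrence of the tag.
            start = sequence.find(tag)
            while start != -1:
                hits.append((tag, start))
                start = sequence.find(tag, start + 1)
        if hits:
            survivors[name] = hits
    return survivors


# Toy data, invented for illustration only.
database = {
    "protA": "MKWVTFISLLLLFSSAYSRGV",
    "protB": "GAVLIMFWPSTCYNQDEKRH",
    "protC": "MSSAYKLLFSSAY",
}
candidates = filter_by_tags(database, ["SAY", "LLF"])
```

Only the surviving candidates, together with the positions of their tag hits, would then be passed to a slower and more accurate scoring step; proteins with no hit are never scored at all.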

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0114, USA
Ari Frank & Pavel Pevzner
Department of Bioinformatics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0419, USA
Stephen Tanner

Editor information

Editors and Affiliations

Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, 108-8639, Minato-ku, Tokyo, Japan
Satoru Miyano
Broad Institute of MIT and Harvard, 320 Charles Street, 02141-2023, Cambridge, MA, USA
Jill Mesirov
Computational Genomics Laboratory, Department of Bioengineering, Boston University, 44 Cummington St., 02215, Boston, MA, USA
Simon Kasif
Center for Molecular Biology and Computer Science Department, Brown University, 115 Waterman St., 02912, Providence, RI, USA
Sorin Istrail
University of California, San Diego, USA
Pavel A. Pevzner
Department of Molecular and Computational Biology, University of Southern California, 1050 Childs Way, 90089-2910, Los Angeles, CA, USA
Michael Waterman

About this paper

Cite this paper

Frank, A., Tanner, S., Pevzner, P. (2005). Peptide Sequence Tags for Fast Database Search in Mass-Spectrometry. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science, vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_25

DOI: https://doi.org/10.1007/11415770_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25866-7
Online ISBN: 978-3-540-31950-4
eBook Packages: Computer Science, Computer Science (R0)
