Generating Peptide Candidates from Amino-Acid Sequence Databases for Protein Identification via Mass Spectrometry

Edwards, Nathan; Lippert, Ross

doi:10.1007/3-540-45784-4_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2452)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Protein identification via mass spectrometry forms the foundation of high-throughput proteomics. Tandem mass spectrometry, when applied to a complex mixture of peptides, selects and fragments each peptide to reveal its amino-acid sequence structure. The successful analysis of such an experiment typically relies on amino-acid sequence databases to provide a set of biologically relevant peptides to examine. A key sub-problem, then, for amino-acid sequence database search engines that analyze tandem mass spectra is to efficiently generate all the peptide candidates from a sequence database with mass equal to one of a large set of observed peptide masses. We demonstrate that to solve the problem efficiently, we must deal with substring redundancy in the amino-acid sequence database and focus our attention on looking up the observed peptide masses quickly. We show that it is possible, with some preprocessing and memory overhead, to solve the peptide candidate generation problem in time asymptotically proportional to the size of the sequence database and the number of peptide candidates output.
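The candidate generation problem stated in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a naive substring enumeration, shown only to make the problem statement concrete. The function name `peptide_candidates` is hypothetical, and the sketch assumes nominal integer residue masses, no terminal water mass, no mass tolerance, and no enzyme cleavage rules. Distinct candidates are collected in sets, so a substring that recurs in the database is reported once.

```python
# Nominal (integer) residue masses in daltons; an assumed simplification.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def peptide_candidates(sequences, query_masses):
    """Map each query mass to the sorted distinct substrings with that mass."""
    queries = set(query_masses)
    if not queries:
        return {}
    max_mass = max(queries)
    found = {mass: set() for mass in queries}
    for seq in sequences:
        for start in range(len(seq)):
            mass = 0
            for end in range(start, len(seq)):
                mass += RESIDUE_MASS[seq[end]]
                if mass > max_mass:
                    # Masses only grow as the substring extends, so stop here.
                    break
                if mass in queries:
                    # Constant-time lookup of the observed masses.
                    found[mass].add(seq[start:end + 1])
    return {mass: sorted(peptides) for mass, peptides in found.items()}


if __name__ == "__main__":
    print(peptide_candidates(["GAGA", "AGG"], [128, 185]))
```

In this sketch the inner loop still revisits every repeated substring of the database, which is the redundancy the abstract identifies as the obstacle to an efficient solution.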

Author information

Authors and Affiliations

Celera Genomics, 45 West Gude Drive, Rockville, MD
Nathan Edwards & Ross Lippert

Editor information

Editors and Affiliations

IMIM-UPF-CRG, Dr. Aiguader 80, 08003, Barcelona, Spain
Roderic Guigó
Department of Computer Science, University of California, Davis, CA 95616, USA
Dan Gusfield

About this paper

Cite this paper

Edwards, N., Lippert, R. (2002). Generating Peptide Candidates from Amino-Acid Sequence Databases for Protein Identification via Mass Spectrometry. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_6

DOI: https://doi.org/10.1007/3-540-45784-4_6
Published: 10 October 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44211-0
Online ISBN: 978-3-540-45784-8
eBook Packages: Springer Book Archive
