A Better Scoring Model for De Novo Peptide Sequencing: The Symmetric Difference Between Explained and Measured Masses

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9838)

Abstract

Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the ideal de novo peptide sequencing problem: Given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements by necessity are noisy. The (real, noisy) de novo peptide sequencing problem therefore has a noisy input: a few of the prefix and suffix masses of the peptide are missing and a few others are given in addition. For this setting we ask for an amino acid string that explains the given masses as accurately as possible. Past approaches interpreted accuracy by searching for a string that explains as many masses as possible. We feel, however, that it is not only bad to not explain a mass that appears, but also to explain a mass that does not appear. That is, we propose to minimize the symmetric difference between the set of given masses and the set of masses that the string explains. For this new optimization problem, we propose an efficient algorithm that computes both the best and the k best solutions. Experiments on measurements of 342 synthesized peptides show that our approach leads to better results compared to finding a string that explains as many given masses as possible.
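The two scoring ideas contrasted in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, only a toy evaluation of a single candidate string under stated assumptions: masses are nominal integer residue masses for four amino acids, ion-type offsets and measurement tolerance are ignored, and only non-empty proper prefixes and suffixes are counted. The names `RESIDUE_MASS`, `explained_masses`, `shared_count`, and `symmetric_difference_score` are hypothetical.

```python
# Nominal integer residue masses for a toy alphabet (assumption: real spectra
# need exact masses, ion-type offsets, and a matching tolerance).
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "N": 114}


def explained_masses(peptide):
    """Masses of all non-empty proper prefixes and suffixes, via one linear scan."""
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    masses = set()
    prefix = 0
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        masses.add(prefix)          # prefix mass
        masses.add(total - prefix)  # mass of the complementary suffix
    return masses


def shared_count(peptide, measured):
    """Earlier scoring idea: number of measured masses the string explains (maximize)."""
    return len(explained_masses(peptide) & set(measured))


def symmetric_difference_score(peptide, measured):
    """Proposed scoring idea: unexplained measured masses plus explained
    masses that were not measured (minimize)."""
    return len(explained_masses(peptide) ^ set(measured))


if __name__ == "__main__":
    measured = {114, 87}
    for candidate in ("NS", "GGS"):  # both have total mass 201
        print(candidate,
              shared_count(candidate, measured),
              symmetric_difference_score(candidate, measured))
```

In this toy case both candidates explain the two measured masses, so counting explained masses cannot separate them. The symmetric difference penalizes `GGS` for also explaining 57 and 144, which were not measured, and therefore prefers `NS`.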

Acknowledgments

We would like to thank Tomas Hruz, George Rosenberger, and Hannes Röst for helpful discussions.

Corresponding author

Correspondence to Thomas Tschager.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gillet, L., Rösch, S., Tschager, T., Widmayer, P. (2016). A Better Scoring Model for De Novo Peptide Sequencing: The Symmetric Difference Between Explained and Measured Masses. In: Frith, M., Storm Pedersen, C. (eds) Algorithms in Bioinformatics. WABI 2016. Lecture Notes in Computer Science, vol 9838. Springer, Cham. https://doi.org/10.1007/978-3-319-43681-4_15

  • DOI: https://doi.org/10.1007/978-3-319-43681-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43680-7

  • Online ISBN: 978-3-319-43681-4

  • eBook Packages: Computer Science, Computer Science (R0)
