PRIME: A Mass Spectrum Data Mining Tool for De Nova Sequencing and PTMs Identification

  • Regular Paper
Journal of Computer Science and Technology

Abstract

De novo sequencing is one of the most promising proteomics techniques for identifying protein post-translational modifications (PTMs) in studies of protein regulation and function. We have developed a computer tool, PRIME, for identifying b and y ions in tandem mass spectra, a key challenge in de novo sequencing. PRIME exploits the fact that mass differences between ions of the same type and between ions of different types follow different distributions, and uses this to separate b ions from y ions correctly. We have formulated the problem as a graph partition problem. An integer linear programming algorithm has been implemented to solve the graph partition problem rigorously and efficiently. The performance of PRIME has been demonstrated on a large number of simulated tandem mass spectra derived from the yeast genome, and its power to detect PTMs has been tested on 216 simulated phosphopeptides.
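The idea of separating b from y ions by graph partition can be illustrated with a toy sketch. It is not the PRIME formulation: the abstract does not give the integer program or the mass-difference distributions, so this example makes simplifying assumptions. It treats a mass difference equal to a single residue mass as evidence that two peaks are of the same type, treats a pair whose masses sum to the precursor-derived total as evidence of different types, and swaps in a brute-force search for the integer linear program. The peptide `GAVLS`, the tolerance, and all function names are hypothetical.

```python
from itertools import combinations, product

# Monoisotopic residue masses (Da) for a few amino acids.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
           "P": 97.05276, "V": 99.06841, "L": 113.08406}
PROTON, WATER = 1.00728, 18.01056

def b_y_ions(peptide):
    """Singly charged b and y ion m/z values for a peptide string."""
    masses = [RESIDUE[a] for a in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y

def build_edges(peaks, complement_sum, tol=0.01):
    """Assumed evidence rules (a simplification, not PRIME's distributions)."""
    same, diff = [], []
    for i, j in combinations(range(len(peaks)), 2):
        gap = abs(peaks[i] - peaks[j])
        # A gap of exactly one residue suggests two ions of the same type.
        if any(abs(gap - m) <= tol for m in RESIDUE.values()):
            same.append((i, j))
        # Complementary b/y ions sum to peptide mass + water + 2 protons.
        if abs(peaks[i] + peaks[j] - complement_sum) <= tol:
            diff.append((i, j))
    return same, diff

def partition(peaks, same, diff):
    """Brute-force 2-partition that violates the fewest edges.

    Stands in for the integer program; feasible only for tiny spectra.
    The first peak is fixed to group 0 to remove the mirror solution.
    """
    best, best_cost = None, None
    for rest in product((0, 1), repeat=len(peaks) - 1):
        labels = (0,) + rest
        cost = sum(labels[i] != labels[j] for i, j in same)
        cost += sum(labels[i] == labels[j] for i, j in diff)
        if best_cost is None or cost < best_cost:
            best, best_cost = labels, cost
    return best, best_cost

peptide = "GAVLS"  # hypothetical example peptide
b, y = b_y_ions(peptide)
peaks = sorted(b + y)  # the partitioner sees only unlabeled masses
complement_sum = sum(RESIDUE[a] for a in peptide) + WATER + 2 * PROTON
same, diff = build_edges(peaks, complement_sum)
labels, cost = partition(peaks, same, diff)
group0 = [p for p, g in zip(peaks, labels) if g == 0]
group1 = [p for p, g in zip(peaks, labels) if g == 1]
print(cost, group0, group1)
```

On this noise-free spectrum the two groups coincide with the b and y series, but the partition alone does not say which group is which. Real spectra contain noise peaks, missing ions, and modified residues, which is where a statistical treatment of mass differences and an exact optimization method become necessary.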

Author information

Corresponding author

Correspondence to Ying Xu.

Additional information

This research was supported in part by the U.S. National Science Foundation (Grant Nos. NSF/DBI-0354771 and NSF/ITR-IIS-0407204). It was also funded in part by the U.S. Department of Energy's Genomes to Life program (http://doegenomestolife.org/) under the project “Carbon Sequestration in Synechococcus sp.: From Molecular Machines to Hierarchical Modeling” (www.genomes2life.org).

Bo Yan received his Ph.D. degree in chemistry from Peking University. He is now working in the Computational Systems Biology Lab at the University of Georgia, USA. His research interests include Monte Carlo simulations, graph theory, computational biology/chemistry, and bioinformatics.

You-Xing Qu received his Ph.D. degree in biophysics from Peking University, China. Currently he is working in the Computational Systems Biology Lab at the University of Georgia, USA. His research interests include computational biology, protein folding, structural biology, and biophysics.

Feng-Lou Mao received his Ph.D. degree in computational chemistry from Peking University in 2001. He is now a postdoctoral researcher at the University of Georgia, USA. His current research interests include bioinformatics, systems biology, and computational biology.

Victor N. Olman is a Senior Research Scientist in the Biochemistry and Molecular Biology Department of UGA. He received his Ph.D. degree in mathematics from St. Petersburg University, Russia. His main interests are currently in mathematical applications in bioinformatics, including methods of mathematical statistics, graph theory, and the simulation and modeling of dynamic systems. He is a member of the American Statistical Association.

Ying Xu is a chair professor of bioinformatics and computational biology in the Biochemistry and Molecular Biology Department, and the director of the Institute of Bioinformatics, University of Georgia, USA. Before joining UGA in Sept 2003, he was a senior staff scientist and group leader at Oak Ridge National Laboratory, USA, where he still holds a joint position. He also holds guest or research professor positions at the University of Tennessee at Knoxville, USA, and at Jilin University and Zhejiang University, China, as well as an adjunct professor position in the Computer Science Department of UGA. Ying Xu received his undergraduate and graduate education in computer science from Jilin University, and his Ph.D. degree in theoretical computer science from the University of Colorado at Boulder, USA, in 1991. He is interested in both bioinformatics tool development and the study of biological problems using in silico approaches. His current research interests include (a) computational inference and modeling of biological pathways and networks, (b) protein structure prediction and modeling, (c) large-scale biological data mining, and (d) microbial and cancer bioinformatics.

About this article

Cite this article

Yan, B., Qu, YX., Mao, FL. et al. PRIME: A Mass Spectrum Data Mining Tool for De Nova Sequencing and PTMs Identification. J Comput Sci Technol 20, 483–490 (2005). https://doi.org/10.1007/s11390-005-0483-5

  • DOI: https://doi.org/10.1007/s11390-005-0483-5
