Computational Proteomics

  • Debasree Sarkar
  • Sudipto Saha


Mass spectrometry (MS)-based high-throughput proteomics generates large volumes of data, and interpreting the biological significance of these data requires computational tools and statistical software. Herein, we explore the application of computational proteomics to the bottom-up approach for MS-based protein identification and quantitation. We also document commonly used scoring systems for interaction proteomics and various tools used in metaproteomic analyses. Finally, we discuss community standards for proteomics data handling and publicly available proteomics data repositories.


Keywords: Proteomics Data; Protein Sequence Database; Spectral Count; Shotgun Proteomics; Human Protein Reference Database



Copyright information

© Springer India 2016

Authors and Affiliations

  1. Centre of Excellence in Bioinformatics, Bose Institute, Kolkata, India
