Quantitative Proteomics Data in the Public Domain: Challenges and Opportunities

  • Andrew F. Jarnuczak
  • Tobias Ternent
  • Juan Antonio Vizcaíno
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1977)

Abstract

Mass spectrometry-based proteomics is no longer only a qualitative discipline and can be successfully employed to obtain a truly multidimensional view of the proteome. In particular, systematic protein expression profiling is now a routine part of many studies in the field and beyond. The large growth in the number of quantitative studies is accompanied by a trend to publicly share the associated analysis results and the underlying raw data. This trend, established and strongly supported by public repositories such as the PRIDE database at the European Bioinformatics Institute, opens up enormous possibilities to explore the data beyond the original publications, for instance by reusing and reanalyzing the data and by performing different types of meta-analysis studies. To help researchers realize this potential, here we describe the mainstream public proteomics resources containing quantitative proteomics data, including the processed analysis results and/or the underlying raw data. We then present and discuss the most important points to consider when attempting to (re)use proteomics data in the public domain. We conclude by highlighting potential pitfalls of (re)using quantitative data and discussing some of our own experiences in this context.
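One practical issue when combining quantitative results taken from different public datasets is that protein intensities are often reported on different, instrument- and software-dependent scales. The following is a minimal sketch of one simple, commonly used mitigation (log2 transformation followed by median centering of each sample), applied to two hypothetical datasets. It is an illustrative assumption, not the procedure described in this chapter; the function name and input data are hypothetical, and real reanalyses usually need more careful normalization and explicit missing-value handling.

```python
import math
from statistics import median


def median_center_log2(intensities):
    """Log2-transform positive intensities and subtract the sample median.

    intensities: dict mapping protein ID -> raw intensity (hypothetical input).
    Missing (None) or non-positive values are skipped, not imputed.
    Returns an empty dict if no usable values remain.
    """
    logs = {
        protein: math.log2(value)
        for protein, value in intensities.items()
        if value is not None and value > 0
    }
    if not logs:
        return {}
    center = median(logs.values())
    return {protein: lv - center for protein, lv in logs.items()}


# Hypothetical intensities for the same proteins in two datasets whose
# values differ by a constant 10-fold scale factor.
dataset_a = {"P1": 1000.0, "P2": 4000.0, "P3": 16000.0}
dataset_b = {"P1": 10000.0, "P2": 40000.0, "P3": 160000.0}

norm_a = median_center_log2(dataset_a)
norm_b = median_center_log2(dataset_b)

# Compare only proteins quantified in both datasets.
for protein in sorted(norm_a.keys() & norm_b.keys()):
    print(protein, round(norm_a[protein], 3), round(norm_b[protein], 3))
```

After centering, both hypothetical datasets yield the same relative values (approximately -2, 0 and 2 on the log2 scale), so a constant global scale difference no longer dominates the comparison. Median centering only corrects such global shifts; it does not address batch effects, differing quantification strategies, or differing protein inference between studies.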

Key words

Mass spectrometry; Data (re)analysis; Quantitative proteomics; Data repository; PRIDE database

Abbreviations

CPTAC: Clinical Proteomic Tumor Analysis Consortium
DDA: Data-dependent acquisition
DIA: Data-independent acquisition
iTRAQ: Isobaric tags for relative and absolute quantification
MIAME: Minimum information about a microarray experiment
MIAPE: Minimum information about a proteomics experiment
MRM: Multiple reaction monitoring
MS: Mass spectrometry
PRM: Parallel reaction monitoring
PSM: Peptide-spectrum match
PTM: Posttranslational modification
PX: ProteomeXchange
SILAC: Stable isotope labeling by amino acids in cell culture
TMT: Tandem mass tag

Acknowledgements

The authors acknowledge financial support from the Wellcome Trust [grant numbers WT101477MA and 208391/Z/17/Z] and from EMBL core funds.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Andrew F. Jarnuczak (1)
  • Tobias Ternent (1)
  • Juan Antonio Vizcaíno (1)
  1. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK

Personalised recommendations