
Review of Issues and Solutions to Data Analysis Reproducibility and Data Quality in Clinical Proteomics

Mass Spectrometry Data Analysis in Proteomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 2051)

Abstract

In any analytical discipline, data analysis reproducibility is closely interlinked with data quality. In this chapter, which focuses on mass spectrometry (MS)-based proteomics approaches, we describe how data analysis reproducibility and data quality influence each other, and how data quality and data analysis design can be used to increase robustness and improve reproducibility. We first introduce methods and concepts for designing and maintaining robust data analysis pipelines, so that reproducibility increases in parallel. The technical aspects of data analysis reproducibility are challenging, and current approaches to increasing overall robustness are multifaceted; software containerization and cloud infrastructures play an important part.
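One concrete ingredient of a reproducible pipeline is a provenance record that ties each result to the exact inputs, software environment, and parameters that produced it. The Python sketch below is illustrative only: it assumes an analysis step whose inputs are local files, and the manifest field names and the container image string are hypothetical, not a format defined in this chapter or by any workflow management system. Recording the container image by an immutable digest rather than a mutable tag is what allows the software environment to be recreated later.

```python
import hashlib
import json
import platform
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(inputs, container_image: str, parameters: dict) -> dict:
    """Collect what is needed to re-run and check one analysis step.

    Field names are hypothetical; `container_image` should be an immutable
    reference (e.g. pinned by digest) rather than a tag such as 'latest'.
    """
    return {
        "container_image": container_image,
        "parameters": dict(sorted(parameters.items())),
        "inputs": {p: sha256_of(Path(p)) for p in sorted(map(str, inputs))},
        "python_version": platform.python_version(),
    }


def save_manifest(manifest: dict, path: Path) -> None:
    """Write the manifest as stable, human-readable JSON next to the results."""
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def verify_inputs(manifest: dict) -> list:
    """Return input paths whose current checksum differs from the recorded one."""
    return [p for p, expected in manifest["inputs"].items()
            if sha256_of(Path(p)) != expected]
```

Calling `verify_inputs` before a reanalysis returns an empty list when every input is byte-identical to what was recorded; any path it returns signals that the data changed since the manifest was written, so results are no longer directly comparable.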

We also show how quality control (QC) and quality assessment (QA) approaches can be used to spot analytical issues, reduce experimental variability, and increase confidence in the analytical results of (clinical) proteomics studies, since experimental variability plays a substantial role in analysis reproducibility. We therefore give an overview of existing QC/QA solutions, including different quality metrics and methods for longitudinal monitoring. The efficient use of both types of approaches undoubtedly provides a way to improve the experimental reliability, reproducibility, and consistency of proteomics analytical measurements.
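Longitudinal monitoring is often done with statistical process control (SPC): a QC metric measured on a standard sample in every run is compared against control limits derived from earlier, in-control runs. The sketch below is a minimal illustration of a Shewhart individuals chart combined with two Westgard-style rules (1_3s and 2_2s). The metric, the baseline values, and the choice of rules are assumptions made for illustration; dedicated QC software typically tracks many metrics at once and supports richer rule sets.

```python
from statistics import mean, stdev


def control_limits(baseline, k: float = 3.0):
    """Return (LCL, centre line, UCL, SD) for a Shewhart individuals chart,
    with limits at mean +/- k*SD of an in-control baseline."""
    centre = mean(baseline)
    sd = stdev(baseline)
    return centre - k * sd, centre, centre + k * sd, sd


def flag_runs(values, baseline):
    """Return (index, rule) pairs for QC runs that violate two Westgard-style rules.

    1_3s: one value outside mean +/- 3 SD.
    2_2s: two consecutive values beyond mean + 2 SD, or both beyond mean - 2 SD.
    """
    lcl, centre, ucl, sd = control_limits(baseline)
    upper2, lower2 = centre + 2 * sd, centre - 2 * sd
    flags = []
    for i, value in enumerate(values):
        if value > ucl or value < lcl:
            flags.append((i, "1_3s"))
        if i > 0:
            prev = values[i - 1]
            if (value > upper2 and prev > upper2) or (value < lower2 and prev < lower2):
                flags.append((i, "2_2s"))
    return flags


# Hypothetical metric: identified peptides (thousands) per QC standard run.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
new_runs = [101, 105, 105, 93, 100]
lcl, _, ucl, _ = control_limits(baseline)
print(f"LCL={lcl:.1f} UCL={ucl:.1f}")  # LCL=94.0 UCL=106.0
print(flag_runs(new_runs, baseline))   # [(2, '2_2s'), (3, '1_3s')]
```

Deriving the limits from a fixed baseline, rather than from all runs including the new ones, keeps a slowly drifting instrument from widening its own limits and thereby masking the drift.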


Abbreviations

BSA: Bovine serum albumin
CWL: Common Workflow Language
DAC: Data Access Compliance
DACO: Data Access Compliance Office
DDA: Data-dependent acquisition
DIA: Data-independent acquisition
FDR: False discovery rate
GUI: Graphical user interface
HPC: High-performance computing
HUPO: Human Proteome Organization
LC: Liquid chromatography
LCL: Lower control limit
MS: Mass spectrometry
PCAWG: Pan-Cancer Analysis of Whole Genomes
PSI: Proteomics Standards Initiative
QA: Quality assessment
QC: Quality control
SOP: Standard operating procedure
SPC: Statistical process control
SRM: Selected reaction monitoring
UCL: Upper control limit
WMS: Workflow management system


Acknowledgments

The authors wish to acknowledge funding from ELIXIR Implementation Studies, the BBSRC [grant number BB/P024599/1], the Wellcome Trust [grant number 208391/Z/17/Z], and EMBL core funding.

Author information


Correspondence to Juan Antonio Vizcaíno.


Copyright information

© 2020 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol


Cite this protocol

Walzer, M., Vizcaíno, J.A. (2020). Review of Issues and Solutions to Data Analysis Reproducibility and Data Quality in Clinical Proteomics. In: Matthiesen, R. (eds) Mass Spectrometry Data Analysis in Proteomics. Methods in Molecular Biology, vol 2051. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9744-2_15


  • DOI: https://doi.org/10.1007/978-1-4939-9744-2_15


  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9743-5

  • Online ISBN: 978-1-4939-9744-2

  • eBook Packages: Springer Protocols
