Abstract
In any analytical discipline, the reproducibility of data analysis is closely interlinked with data quality. In this chapter, which focuses on mass spectrometry (MS)-based proteomics approaches, we describe how data analysis reproducibility and data quality influence each other. We also describe how attention to data quality and careful data analysis design can increase robustness and improve reproducibility. We first introduce methods and concepts for designing and maintaining robust data analysis pipelines, so that reproducibility increases along with robustness. The technical aspects of data analysis reproducibility are challenging, and current approaches to increasing overall robustness are multifaceted. Software containerization and cloud infrastructures play an important part.
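The snippet below is a minimal sketch of one ingredient of a reproducible pipeline. It records what a rerun would need to match: checksums of the input files, the analysis parameters, the tool versions, and the container image used. The function names, the record layout, and the example tool, parameter, and image names are hypothetical illustrations. They are not the API of any particular workflow management system.

```python
import hashlib
import json
import platform
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_run_record(inputs, parameters, tool_versions, container_image=None):
    """Collect the (hypothetical) provenance needed to rerun one analysis step."""
    return {
        # Checksums detect silently changed or re-exported raw/converted files.
        "inputs": {str(p): sha256_of(Path(p)) for p in sorted(inputs)},
        "parameters": parameters,
        "tool_versions": tool_versions,
        # A pinned container image fixes the software environment.
        "container_image": container_image,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }


def same_analysis(record_a, record_b):
    """Runs are comparable only if inputs, parameters, tools and image all match."""
    keys = ("inputs", "parameters", "tool_versions", "container_image")
    return all(record_a[k] == record_b[k] for k in keys)


# The record is plain JSON, so it can be stored next to the results, e.g.:
# Path("run_record.json").write_text(json.dumps(record, indent=2))
```

Comparing two such records before comparing results helps show whether a discrepancy comes from the data, the parameters, or the software environment. Host details such as the platform string are kept for reference but are deliberately excluded from the comparison.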
Because experimental variability plays a substantial role in analysis reproducibility, we also show how quality control (QC) and quality assessment (QA) approaches can be used to spot analytical issues, reduce experimental variability, and increase confidence in the analytical results of (clinical) proteomics studies. To this end, we give an overview of existing QC/QA solutions, including different quality metrics and methods for longitudinal monitoring. Used efficiently, both types of approaches provide a way to improve the reliability, reproducibility, and consistency of proteomics analytical measurements.
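Longitudinal QC monitoring is commonly done with statistical process control (SPC). The sketch below makes three assumptions. There is one numeric QC metric per run, for example a hypothetical retention time or identification count for a standard sample. There is a baseline of runs judged to be in control. Control limits are Shewhart-style, set at mean ± 3 SD (the LCL and UCL). The code then flags runs with two of the Western Electric rules: a single point outside the control limits, or eight consecutive points on the same side of the centre line. The metric, the baseline values, and the thresholds are illustrative assumptions, not recommendations for any specific instrument.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Shewhart-style limits from in-control baseline runs: (LCL, centre, UCL)."""
    centre = mean(baseline)
    sd = stdev(baseline)  # sample standard deviation of the baseline
    return centre - k * sd, centre, centre + k * sd


def flag_runs(values, baseline, run_length=8):
    """Return {run index: [reasons]} for runs violating two Western Electric rules."""
    lcl, centre, ucl = control_limits(baseline)
    flags = {}
    side_run = 0  # signed count of consecutive points above (+) or below (-) centre
    for i, x in enumerate(values):
        reasons = []
        # Rule 1: a single point outside the 3-SD control limits.
        if x > ucl or x < lcl:
            reasons.append("beyond 3 SD")
        # Run rule: run_length consecutive points on one side of the centre line,
        # which indicates a systematic shift (drift) rather than random noise.
        if x > centre:
            side_run = side_run + 1 if side_run > 0 else 1
        elif x < centre:
            side_run = side_run - 1 if side_run < 0 else -1
        else:
            side_run = 0
        if abs(side_run) >= run_length:
            reasons.append(f"{run_length} in a row on one side")
        if reasons:
            flags[i] = reasons
    return flags


# Hypothetical baseline: mean 100, SD 2, so LCL = 94 and UCL = 106.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
print(flag_runs([101, 99, 120, 100], baseline))  # {2: ['beyond 3 SD']}
```

The limits are fixed from the baseline rather than recomputed from the monitored runs. Otherwise a slow drift would gradually widen the limits and hide itself.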
Abbreviations
- BSA: Bovine serum albumin
- CWL: Common Workflow Language
- DAC: Data Access Compliance
- DACO: Data Access Compliance Office
- DDA: Data-dependent acquisition
- DIA: Data-independent acquisition
- FDR: False discovery rate
- GUI: Graphical user interface
- HPC: High-performance computing
- HUPO: Human Proteome Organization
- LC: Liquid chromatography
- LCL: Lower control limit
- MS: Mass spectrometry
- PCAWG: Pan-Cancer Analysis of Whole Genomes
- PSI: Proteomics Standards Initiative
- QA: Quality assessment
- QC: Quality control
- SOP: Standard operating procedure
- SPC: Statistical process control
- SRM: Selected reaction monitoring
- UCL: Upper control limit
- WMS: Workflow management system
Acknowledgments
The authors wish to acknowledge funding from ELIXIR Implementation Studies, BBSRC [grant number BB/P024599/1], Wellcome Trust [grant number 208391/Z/17/Z], and EMBL core funding.
Copyright information
© 2020 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Walzer, M., Vizcaíno, J.A. (2020). Review of Issues and Solutions to Data Analysis Reproducibility and Data Quality in Clinical Proteomics. In: Matthiesen, R. (eds) Mass Spectrometry Data Analysis in Proteomics. Methods in Molecular Biology, vol 2051. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9744-2_15
DOI: https://doi.org/10.1007/978-1-4939-9744-2_15
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9743-5
Online ISBN: 978-1-4939-9744-2
eBook Packages: Springer Protocols