Abstract
Two-dimensional gel electrophoresis (2-DE) remains the most widely used technique for protein separation in plant proteomics experiments. Despite continuous technical advances and improvements in 2-DE protocols, adequate experimental design and statistical analysis of the data tend to be ignored or poorly documented in the current literature. Both proper experimental design and appropriate statistical analysis are required to discuss results with confidence and to draw sound conclusions from experimental data.
In this chapter, we describe a model procedure for sound experimental design and complete statistical analysis of a proteomic dataset. The procedure covers every step of data mining and processing, starting with data preprocessing (transformation, missing value imputation, definition of outliers), continuing with univariate statistics (parametric and nonparametric tests), and finishing with multivariate statistics (clustering, heat-mapping, PCA, ICA, PLS-DA).
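As a rough illustration of the first two stages named above (preprocessing, then a nonparametric univariate test), the following Python sketch processes a single spot measured in two groups. It is not the chapter's own procedure: the function names, the spot volumes, the choice of log2 as the transformation, and group-mean imputation are all assumptions made for the example, and the exact rank-sum test shown is only practical for small numbers of replicates.

```python
import math
from itertools import combinations
from statistics import mean


def log2_transform(values):
    """Log2-transform spot volumes; missing (None) or non-positive values stay None."""
    return [math.log2(v) if v is not None and v > 0 else None for v in values]


def impute_group_mean(values):
    """Fill missing values with the mean of the observed replicates of the same group.

    This is one simple imputation choice among many (an assumption of this sketch).
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a group with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def rank_sum_exact_p(a, b):
    """Two-sided exact p-value for the Wilcoxon-Mann-Whitney rank-sum statistic.

    Enumerates every way of assigning len(a) of the pooled ranks to group a,
    so it is only suitable for small samples. Ties receive mid-ranks.
    """
    pooled = list(a) + list(b)
    ordered = sorted(pooled)

    def midrank(x):
        first = ordered.index(x) + 1          # 1-based rank of first occurrence
        last = first + ordered.count(x) - 1   # rank of last occurrence
        return (first + last) / 2

    ranks = [midrank(x) for x in pooled]
    n_a = len(a)
    expected = n_a * (len(pooled) + 1) / 2    # rank sum expected under H0
    observed_dev = abs(sum(ranks[:n_a]) - expected)

    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed_dev - 1e-12:
            extreme += 1
    return extreme / total


# Invented spot volumes for one spot; None marks a replicate where the spot was not detected.
control_raw = [1200.0, 1350.0, None, 1280.0]
treated_raw = [2500.0, 2700.0, 2600.0, 2450.0]

control = impute_group_mean(log2_transform(control_raw))
treated = impute_group_mean(log2_transform(treated_raw))
p_value = rank_sum_exact_p(control, treated)
print(f"exact rank-sum p-value: {p_value:.4f}")
```

In a real experiment this would be repeated for every spot, and the resulting p-values would then need a multiple-testing correction before any spot is reported as differential.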
Acknowledgements
L.V.’s work was generously supported by a Marie Curie IEF Grant (FP7-PEOPLE-IEF, European Union).
Copyright information
© 2014 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Valledor, L., Romero-Rodríguez, M.C., Jorrin-Novo, J.V. (2014). Standardization of Data Processing and Statistical Analysis in Comparative Plant Proteomics Experiment. In: Jorrin-Novo, J., Komatsu, S., Weckwerth, W., Wienkoop, S. (eds) Plant Proteomics. Methods in Molecular Biology, vol 1072. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-631-3_5
DOI: https://doi.org/10.1007/978-1-62703-631-3_5
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-630-6
Online ISBN: 978-1-62703-631-3
eBook Packages: Springer Protocols