Statistical Methods for Proteomics

  • Klaus Jung
Part of the Methods in Molecular Biology book series (MIMB, volume 620)


During the last decade, analytical methods for the detection and quantification of proteins and peptides in biological samples have improved considerably. It is therefore now possible to compare simultaneously the expression levels of hundreds or thousands of proteins in different types of tissue (for example, normal and cancerous) or in different cell lines. In this chapter, we illustrate statistical designs for such proteomics experiments as well as methods for the analysis of the resulting data. In particular, we focus on the preprocessing and analysis of protein expression levels recorded by either two-dimensional gel electrophoresis or mass spectrometry.

Key words

Protein expression, data preprocessing, differential proteome analysis, disease classification, two-dimensional gel electrophoresis, mass spectrometry


Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Klaus Jung (1)
  1. Department of Medical Statistics, Georg-August-University Göttingen, Göttingen, Germany
