Statistical Analysis of Spectral Count Data Generated by Label-Free Tandem Mass Spectrometry-Based Proteomics
Label-free strategies for quantitative proteomics provide a versatile and economical alternative to labeling-based proteomics strategies. We have shown for different types of biological samples that spectral counting-based label-free quantitation is a promising avenue for biomarker discovery. Analyzing spectral count data generated from these studies is, however, not straightforward, as commonly used techniques for genomics data analysis are not suitable. In this book chapter, we describe three methods to analyze spectral count data, namely, cluster analysis, significance analysis of independent samples, and significance analysis of paired samples. For cluster analysis, we devise a novel distance measure between samples based on the Jeffrey divergence. This measure prevents highly abundant proteins from dominating others in contribution to the total sample difference. We employ the beta-binomial distribution for significance analysis of independent samples, which integrates both within-sample variation and between-sample variation into a single statistical model. Finally, the Mantel–Haenszel test is used for significance analysis of paired samples. We provide detailed illustrations of the steps involved in the analyses.
Key words: Beta-binomial distribution; Biomarker discovery; Cluster analysis; Comparative analysis; Label-free tandem mass spectrometry-based proteomics; Spectral counting
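The three statistical building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The function names, the pseudo-count of 0.5, the tuple layout of the paired input, and the continuity correction are all assumptions made for illustration. The first function computes only the plain symmetrized (Jeffrey) divergence between two normalized count profiles; the chapter's adjustment that keeps highly abundant proteins from dominating is not reproduced here. The beta-binomial function gives only the probability mass function, not a fitted model or a significance test.

```python
import math


def jeffreys_divergence(counts_a, counts_b, pseudo=0.5):
    """Symmetrized Kullback-Leibler divergence between two count profiles.

    counts_a, counts_b: spectral counts per protein, in the same protein order.
    pseudo: assumed pseudo-count that avoids log(0) for undetected proteins.
    """
    a = [c + pseudo for c in counts_a]
    b = [c + pseudo for c in counts_b]
    total_a, total_b = sum(a), sum(b)
    divergence = 0.0
    for x, y in zip(a, b):
        p, q = x / total_a, y / total_b  # normalize to relative abundances
        divergence += (p - q) * math.log(p / q)
    return divergence


def _log_beta(x, y):
    """Logarithm of the beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) for a binomial whose success probability is Beta(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + _log_beta(k + alpha, n - k + beta)
                    - _log_beta(alpha, beta))


def mantel_haenszel(pairs):
    """Mantel-Haenszel chi-square test over a set of 2x2 tables, one per pair.

    pairs: list of (x1, n1, x2, n2), where x1 and x2 are the counts of one
    protein in the two conditions of a matched pair, and n1 and n2 are the
    total counts of the two samples (assumed input layout).
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    diff = 0.0
    var = 0.0
    for x1, n1, x2, n2 in pairs:
        total = n1 + n2
        m1 = x1 + x2       # counts of this protein across the pair
        m2 = total - m1    # all other counts across the pair
        diff += x1 - n1 * m1 / total
        var += n1 * n2 * m1 * m2 / (total * total * (total - 1))
    if var == 0:
        raise ValueError("variance is zero; the protein has no counts")
    # Continuity correction of 0.5 (an assumption), floored at zero.
    chi2 = max(0.0, abs(diff) - 0.5) ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(jeffreys_divergence([10, 20, 30], [30, 20, 10]))
    print(beta_binomial_pmf(3, 10, 2.0, 5.0))
    print(mantel_haenszel([(30, 1000, 10, 1000), (25, 900, 8, 1100)]))
```

In a clustering setting, the divergence would be evaluated for every pair of samples to fill a distance matrix, while the Mantel-Haenszel function would be called once per protein with one tuple per matched pair.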
This work is supported by the VUmc Cancer Center, Amsterdam.