Abstract
The statistical analysis of robust biomarker candidates is a complex process involving several key steps in the overall biomarker development pipeline (see Fig. 22.1, Chap. 19). Initially, data visualization (Sect. 22.1, below) is important for identifying outliers and for getting a sense of the data's structure and of whether there appear to be differences among the groups under study. From there, the data must be pre-processed (Sect. 22.2): outliers are handled, missing values are dealt with, and normality is assessed. Once the data have been cleaned and are ready for downstream analysis, hypothesis tests (Sect. 22.3) are performed to identify differentially expressed proteins. Because the number of differentially expressed proteins is usually larger than warrants further investigation (50 or more proteins, versus the handful that will be considered for a biomarker panel), some form of feature reduction (Sect. 22.4) should be performed to narrow the list of candidate biomarkers to a more manageable number. Once the list has been reduced to the proteins most likely to be useful for downstream classification, unsupervised or supervised learning is performed (Sects. 22.5 and 22.6, respectively).
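The pipeline outlined in the abstract (pre-processing, hypothesis testing, feature reduction) can be sketched end to end. This is a minimal illustration on simulated data: the dataset, the 1st/99th-percentile winsorization, the FDR cutoff of 0.05, the panel size of 10, and the choice of two-sample t-tests with Benjamini-Hochberg correction are all illustrative assumptions, not the chapter's prescribed settings.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated log-intensities: 60 samples x 200 proteins, two groups of 30.
# The first 5 proteins are shifted in group 0 and so are truly differential.
n_per_group, n_prot = 30, 200
X = rng.normal(10.0, 1.0, size=(2 * n_per_group, n_prot))
X[:n_per_group, :5] += 1.5
groups = np.array([0] * n_per_group + [1] * n_per_group)

# Pre-processing (Sect. 22.2): winsorize extreme values per protein,
# then impute any missing values with the column median.
lo, hi = np.nanpercentile(X, [1, 99], axis=0)
X = np.clip(X, lo, hi)
col_med = np.nanmedian(X, axis=0)
X = np.where(np.isnan(X), col_med, X)

# Normality assessment (Shapiro-Wilk) on one illustrative protein.
w_stat, p_norm = stats.shapiro(X[groups == 0, 0])

# Hypothesis testing (Sect. 22.3): per-protein two-sample t-tests,
# followed by Benjamini-Hochberg FDR adjustment across the 200 tests.
t_stat, p = stats.ttest_ind(X[groups == 0], X[groups == 1], axis=0)
order = np.argsort(p)
raw = p[order] * n_prot / np.arange(1, n_prot + 1)
q_sorted = np.minimum.accumulate(raw[::-1])[::-1]   # enforce monotonicity
q_vals = np.empty(n_prot)
q_vals[order] = np.minimum(q_sorted, 1.0)

# Feature reduction (Sect. 22.4): keep proteins passing FDR < 0.05,
# then rank by absolute mean difference and cap the candidate panel at 10.
selected = np.flatnonzero(q_vals < 0.05)
effect = np.abs(X[groups == 0].mean(axis=0) - X[groups == 1].mean(axis=0))
panel = selected[np.argsort(-effect[selected])][:10]
```

The `panel` array would then feed the unsupervised or supervised learning step (Sects. 22.5 and 22.6), e.g. as the feature set for a classifier evaluated by cross-validation.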
© 2016 Springer International Publishing Switzerland
Cite this chapter
Spratt, H.M., Ju, H. (2016). Statistical Approaches to Candidate Biomarker Panel Selection. In: Mirzaei, H., Carrasco, M. (eds) Modern Proteomics – Sample Preparation, Analysis and Practical Applications. Advances in Experimental Medicine and Biology, vol 919. Springer, Cham. https://doi.org/10.1007/978-3-319-41448-5_22
Print ISBN: 978-3-319-41446-1
Online ISBN: 978-3-319-41448-5