Abstract
The statistical analysis of robust biomarker candidates is a complex process involving several key steps in the overall biomarker development pipeline (see Fig. 22.1, Chap. 19). Initially, data visualization (Sect. 22.1, below) is important for identifying outliers and for getting a sense of the data's structure and of whether there appear to be differences among the groups under study. From there, the data must be pre-processed (Sect. 22.2): outliers are handled, missing values are dealt with, and normality is assessed. Once the data have been cleaned and are ready for downstream analysis, hypothesis tests (Sect. 22.3) are performed to identify differentially expressed proteins. Because the number of differentially expressed proteins is usually larger than warrants further investigation (50 or more proteins, versus the handful that will be considered for a biomarker panel), some form of feature reduction (Sect. 22.4) should be performed to narrow the list of candidate biomarkers to a more manageable number. Once the list has been reduced to the proteins most likely to be useful for downstream classification, unsupervised or supervised learning is performed (Sects. 22.5 and 22.6, respectively).
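The pipeline outlined in the abstract (pre-processing, hypothesis testing, feature reduction) can be sketched end to end. This is a minimal illustration on simulated data: the dataset, the 1st/99th-percentile winsorization, the FDR cutoff of 0.05, the panel size of 10, and the choice of two-sample t-tests with Benjamini-Hochberg correction are all illustrative assumptions, not the chapter's prescribed settings.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated log-intensities: 60 samples x 200 proteins, two groups of 30.
# The first 5 proteins are shifted in group 0 and so are truly differential.
n_per_group, n_prot = 30, 200
X = rng.normal(10.0, 1.0, size=(2 * n_per_group, n_prot))
X[:n_per_group, :5] += 1.5
groups = np.array([0] * n_per_group + [1] * n_per_group)

# Pre-processing (Sect. 22.2): winsorize extreme values per protein,
# then impute any missing values with the column median.
lo, hi = np.nanpercentile(X, [1, 99], axis=0)
X = np.clip(X, lo, hi)
col_med = np.nanmedian(X, axis=0)
X = np.where(np.isnan(X), col_med, X)

# Normality assessment (Shapiro-Wilk) on one illustrative protein.
w_stat, p_norm = stats.shapiro(X[groups == 0, 0])

# Hypothesis testing (Sect. 22.3): per-protein two-sample t-tests,
# followed by Benjamini-Hochberg FDR adjustment across the 200 tests.
t_stat, p = stats.ttest_ind(X[groups == 0], X[groups == 1], axis=0)
order = np.argsort(p)
raw = p[order] * n_prot / np.arange(1, n_prot + 1)
q_sorted = np.minimum.accumulate(raw[::-1])[::-1]   # enforce monotonicity
q_vals = np.empty(n_prot)
q_vals[order] = np.minimum(q_sorted, 1.0)

# Feature reduction (Sect. 22.4): keep proteins passing FDR < 0.05,
# then rank by absolute mean difference and cap the candidate panel at 10.
selected = np.flatnonzero(q_vals < 0.05)
effect = np.abs(X[groups == 0].mean(axis=0) - X[groups == 1].mean(axis=0))
panel = selected[np.argsort(-effect[selected])][:10]
```

The `panel` array would then feed the unsupervised or supervised learning step (Sects. 22.5 and 22.6), e.g. as the feature set for a classifier evaluated by cross-validation.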
© 2016 Springer International Publishing Switzerland
Cite this chapter
Spratt, H.M., Ju, H. (2016). Statistical Approaches to Candidate Biomarker Panel Selection. In: Mirzaei, H., Carrasco, M. (eds) Modern Proteomics – Sample Preparation, Analysis and Practical Applications. Advances in Experimental Medicine and Biology, vol 919. Springer, Cham. https://doi.org/10.1007/978-3-319-41448-5_22
Print ISBN: 978-3-319-41446-1
Online ISBN: 978-3-319-41448-5