
Statistical Approaches to Candidate Biomarker Panel Selection

Modern Proteomics – Sample Preparation, Analysis and Practical Applications

Part of the book series: Advances in Experimental Medicine and Biology ((AEMB,volume 919))

Abstract

The statistical analysis of robust biomarker candidates is a complex process that spans several key steps in the overall biomarker development pipeline (see Fig. 22.1, Chap. 19). Initially, data visualization (Sect. 22.1, below) is important for identifying outliers and for getting a sense of the data, including whether there appear to be differences among the groups being examined. From there, the data must be pre-processed (Sect. 22.2) so that outliers are handled, missing values are dealt with, and normality is assessed. Once the processed data have been cleaned and are ready for downstream analysis, hypothesis tests (Sect. 22.3) are performed to identify proteins that are differentially expressed. Since the number of differentially expressed proteins is usually larger than warrants further investigation (50+ proteins versus the handful that will be considered for a biomarker panel), some form of feature reduction (Sect. 22.4) should be performed to narrow the list of candidate biomarkers to a more manageable number. Once the list of proteins has been reduced to those most likely to be useful for downstream classification, unsupervised or supervised learning is performed (Sects. 22.5 and 22.6, respectively).
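As a minimal sketch of the hypothesis-testing step (Sect. 22.3), the example below runs a per-protein Welch t-test on simulated intensity data and applies the Benjamini–Hochberg procedure to control the false discovery rate across the many simultaneous tests. The simulated data, group sizes, and 5 % FDR threshold are illustrative assumptions, not values taken from the chapter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated intensities: 100 proteins x 20 samples (10 controls, 10 cases);
# the first 5 proteins are shifted upward in the case group.
data = rng.normal(10, 1, size=(100, 20))
data[:5, 10:] += 2

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest rank downward
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0, 1)
    return adjusted

# Per-protein two-sample Welch t-test, then FDR control at 5 %
pvals = np.array([stats.ttest_ind(row[:10], row[10:], equal_var=False).pvalue
                  for row in data])
qvals = bh_adjust(pvals)
significant = np.flatnonzero(qvals < 0.05)
print(f"{len(significant)} proteins pass the 5% FDR threshold")
```

In practice the list of proteins passing the FDR threshold would then feed into the feature-reduction step (Sect. 22.4) rather than going straight into a panel.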



Author information

Corresponding author

Correspondence to Heidi M. Spratt.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Spratt, H.M., Ju, H. (2016). Statistical Approaches to Candidate Biomarker Panel Selection. In: Mirzaei, H., Carrasco, M. (eds) Modern Proteomics – Sample Preparation, Analysis and Practical Applications. Advances in Experimental Medicine and Biology, vol 919. Springer, Cham. https://doi.org/10.1007/978-3-319-41448-5_22
