Efficient Study Designs and Semiparametric Inference Methods for Developing Genomic Biomarkers in Cancer Clinical Research

Chapter in: Frontiers of Biostatistical Methods and Applications in Clinical Oncology

Abstract

In the development of genomic biomarkers and molecular diagnostics, clinical studies using high-throughput assays such as DNA microarrays generally require enormous cost and effort. Several efficient study designs for reducing the cost of such expensive measurements have been developed, mainly in the field of epidemiology. Under these designs, the expensive measurements are collected only on subsamples selected by appropriate response-selective sampling schemes, which effectively reduces the total measurement cost. In this chapter, we discuss the application of these efficient designs to genomic analyses in cancer clinical studies and describe relevant statistical methods, such as gene selection (e.g., multiple testing based on the false discovery rate). Efficient semiparametric inference methods that use auxiliary clinical information are also discussed.
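As a rough illustration of what gene selection by false-discovery-rate-based multiple testing can look like, the sketch below implements the Benjamini-Hochberg step-up procedure, one standard FDR-controlling method. It is not taken from the chapter, whose own procedure is not shown in this preview. The function name `benjamini_hochberg`, the level `q`, and the six p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    # Order hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


# Hypothetical per-gene p-values (illustrative only, not from the chapter).
gene_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
selected = benjamini_hochberg(gene_p, q=0.05)
print(selected)  # prints [0, 1]
```

With six tests at q = 0.05, the rank-wise thresholds are about 0.0083, 0.0167, 0.025, and so on, so only the two smallest p-values fall under their thresholds and the first two genes are selected. Under a response-selective design, the p-values themselves would have to come from an analysis that accounts for the sampling scheme, which this sketch does not attempt.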

Acknowledgements

This research was supported by the GSK Japan Research Grant, JST-CREST (JPMJCR1412), and a Grant-in-Aid for Scientific Research (15K15954) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Author information

Corresponding author

Correspondence to Hisashi Noma.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Noma, H. (2017). Efficient Study Designs and Semiparametric Inference Methods for Developing Genomic Biomarkers in Cancer Clinical Research. In: Matsui, S., Crowley, J. (eds) Frontiers of Biostatistical Methods and Applications in Clinical Oncology. Springer, Singapore. https://doi.org/10.1007/978-981-10-0126-0_23
