Machine Learning Techniques in Cancer Prognostic Modeling and Performance Assessment

Chen, Yiyi; Millar, Jess A.

doi:10.1007/978-981-10-0126-0_13

Abstract

Prognostic models for disease occurrence, tumor progression, and survival are abundant for most types of cancer. Physicians and cancer patients use these models to make informed treatment decisions and to plan accordingly. However, not all cancer prognostic models are built and validated rigorously, and some are more useful and reliable than others. In this chapter, we briefly introduce some popular machine learning methods for constructing cancer prognostic models and discuss the pros and cons of each. We also introduce commonly used discrimination and calibration metrics for assessing predictive performance and validating prognostic models. Finally, we outline several challenges of using prognostic models for real-world clinical decision support and propose related suggestions.
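
As a rough illustration of the two kinds of metric named in the abstract, the sketch below computes a concordance index (a discrimination measure for censored survival data) and a simple mean calibration gap for a binary outcome. It is not taken from the chapter: the function names, the toy inputs, and the choice of these two particular measures are assumptions made for illustration only.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style C: share of usable pairs in which the patient with the
    shorter observed time also has the higher predicted risk (ties count 0.5)."""
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Usable only if the times differ and the shorter time is an observed
        # event; a censored shorter time says nothing about who failed first.
        if times[a] == times[b] or not events[a]:
            continue
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs to compare")
    return concordant / usable


def mean_calibration_gap(outcomes, predicted_probs):
    """Mean predicted probability minus observed event fraction.
    A value near 0 suggests predictions are right on average (a crude check)."""
    n = len(outcomes)
    return sum(predicted_probs) / n - sum(outcomes) / n


# Hypothetical toy data: follow-up times, event flags (1 = event, 0 = censored),
# and model risk scores for four patients.
c_index = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
gap = mean_calibration_gap([1, 0, 1, 0], [0.8, 0.2, 0.6, 0.4])
```

A concordance index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The calibration gap checks only average agreement, so a model can score near zero here while still being miscalibrated within risk subgroups.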

Author information

Authors and Affiliations

OHSU-PSU School of Public Health, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, 97239, USA
Yiyi Chen
Fariborz Maseeh Department of Mathematics and Statistics, Portland State University, Portland, OR, 97006, USA
Jess A. Millar

Corresponding author

Correspondence to Yiyi Chen.

Editor information

Editors and Affiliations

Graduate School of Medicine, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
Shigeyuki Matsui
Cancer Research and Biostatistics, Seattle, Washington, USA
John Crowley

Cite this chapter

Chen, Y., Millar, J.A. (2017). Machine Learning Techniques in Cancer Prognostic Modeling and Performance Assessment. In: Matsui, S., Crowley, J. (eds) Frontiers of Biostatistical Methods and Applications in Clinical Oncology. Springer, Singapore. https://doi.org/10.1007/978-981-10-0126-0_13

DOI: https://doi.org/10.1007/978-981-10-0126-0_13
Published: 04 October 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0124-6
Online ISBN: 978-981-10-0126-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
