Utilizing Data Mining for Predictive Modeling of Colorectal Cancer Using Electronic Medical Records

  • Conference paper
Brain Informatics and Health (BIH 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8609)

Abstract

Colorectal cancer (CRC) is a relatively common cause of death worldwide. Predictive models for the development of CRC could be highly valuable, facilitating earlier diagnosis and increased survival rates. Currently available predictive models are improving, but they neither fully utilize the wealth of data available about patients in routine care nor take advantage of developments in data mining. This paper reports a first attempt to generate a predictive model using the CHAID decision tree learner on anonymously extracted Electronic Medical Records, showing an area under the curve (AUC) of 0.839 for the adult population and 0.702 for the age group between 55 and 75.
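
As a reading aid for the AUC figures quoted above, the following minimal sketch computes the area under the ROC curve as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is the standard rank-based interpretation of AUC, not the authors' evaluation code; the function name `auc` and the toy labels and scores are hypothetical and bear no relation to the study's data.

```python
def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which
    the positive case has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (illustrative only, not data from the paper):
# 1 = patient later diagnosed, 0 = patient not diagnosed.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
toy_auc = auc(toy_labels, toy_scores)
print(f"AUC on toy data: {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to a model that ranks cases no better than chance, and 1.0 to a perfect ranking. The pairwise loop is quadratic in the number of cases, so for data sets of realistic size a sort-based implementation would be preferable.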

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hoogendoorn, M., Moons, L.M.G., Numans, M.E., Sips, RJ. (2014). Utilizing Data Mining for Predictive Modeling of Colorectal Cancer Using Electronic Medical Records. In: Ślȩzak, D., Tan, AH., Peters, J.F., Schwabe, L. (eds) Brain Informatics and Health. BIH 2014. Lecture Notes in Computer Science, vol 8609. Springer, Cham. https://doi.org/10.1007/978-3-319-09891-3_13

  • DOI: https://doi.org/10.1007/978-3-319-09891-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09890-6

  • Online ISBN: 978-3-319-09891-3

  • eBook Packages: Computer Science, Computer Science (R0)
