Data Mining, pp 260–272

Identifying Risk Groups Associated with Colorectal Cancer

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3755)

Abstract

In this paper, we explore data mining techniques for identifying and describing risk groups for colorectal cancer (CRC) from population-based administrative health data. Association rule discovery, association classification and scalable clustering analysis are applied to the profiles of colorectal cancer patients in contrast to the profiles of background patients. These methods enable us to identify the most common characteristics of colorectal cancer patients. The knowledge they discover is quite different from that obtained by traditional survey approaches. Although heuristic, the data mining methods may identify risk groups for further epidemiological study, such as older patients who live near health facilities yet seldom utilise them and who have respiratory and circulatory diseases.
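The idea of contrasting patient profiles against background profiles can be illustrated with a small sketch. The abstract does not specify the algorithms used, so the code below is only an assumption-laden toy: it enumerates small itemsets, keeps those that are frequent among the target group, and ranks them by the ratio of their support in the target group to their support in the background group. The function names (`support`, `contrast_patterns`), the thresholds, the item labels and the data are all hypothetical and are not taken from the chapter.

```python
from itertools import combinations


def support(itemset, profiles):
    """Fraction of profiles that contain every item in `itemset`."""
    if not profiles:
        return 0.0
    hits = sum(1 for profile in profiles if itemset <= profile)
    return hits / len(profiles)


def contrast_patterns(target, background, max_size=2, min_support=0.5):
    """Rank itemsets that are frequent in `target` by their support ratio.

    Returns (itemset, target_support, background_support, ratio) tuples,
    sorted by ratio in descending order. The ratio is infinite when the
    itemset never occurs in the background group.
    """
    items = sorted(set().union(*target))
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s_target = support(itemset, target)
            if s_target < min_support:
                continue  # not common enough among target profiles
            s_background = support(itemset, background)
            ratio = s_target / s_background if s_background > 0 else float("inf")
            results.append((itemset, s_target, s_background, ratio))
    results.sort(key=lambda row: row[3], reverse=True)
    return results


# Invented toy profiles; each profile is a set of categorical items.
target = [
    {"age>=60", "low_service_use", "circulatory"},
    {"age>=60", "low_service_use"},
    {"age>=60", "circulatory"},
    {"age<60", "low_service_use"},
]
background = [
    {"age>=60", "circulatory"},
    {"age<60"},
    {"age<60", "low_service_use"},
    {"age<60"},
]

results = contrast_patterns(target, background)
for itemset, s_t, s_b, ratio in results:
    print(sorted(itemset), s_t, s_b, ratio)
```

A support ratio is a crude score: it ignores sample size and confounding, which is one reason patterns found this way are best treated as candidates for further epidemiological study rather than as conclusions.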

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chen, J., He, H., Jin, H., McAullay, D., Williams, G., Kelman, C. (2006). Identifying Risk Groups Associated with Colorectal Cancer. In: Williams, G.J., Simoff, S.J. (eds) Data Mining. Lecture Notes in Computer Science, vol 3755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677437_20

  • DOI: https://doi.org/10.1007/11677437_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32547-5

  • Online ISBN: 978-3-540-32548-2

  • eBook Packages: Computer Science, Computer Science (R0)
