Identifying Risk Groups Associated with Colorectal Cancer

Chen, Jie; He, Hongxing; Jin, Huidong; McAullay, Damien; Williams, Graham; Kelman, Chris

doi:10.1007/11677437_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3755)

Abstract

In this paper, we explore data mining techniques for identifying and describing risk groups for colorectal cancer (CRC) from population-based administrative health data. Association rule discovery, association classification and scalable clustering analysis are applied to the profiles of colorectal cancer patients in contrast to the profiles of background patients. These data mining methods enable us to identify the most common characteristics of the colorectal cancer patients. The knowledge discovered by these methods is quite different from that obtained by traditional survey approaches. Although heuristic, the data mining methods may identify risk groups for further epidemiological study, such as older patients who live near health facilities yet seldom use them and who have respiratory and circulatory diseases.
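To make the idea of contrasting patient profiles with background profiles concrete, the following is a minimal sketch of class association rule scoring in Python. It is not the authors' implementation: the toy profiles, the item names, the helper names `rule_stats` and `mine_rules`, and the use of lift as the interest measure are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: (attribute items, is_crc_case). Not from the paper.
profiles = [
    ({"age>=65", "few_visits"}, True),
    ({"age>=65", "few_visits", "respiratory"}, True),
    ({"age>=65"}, False),
    ({"few_visits"}, False),
    ({"age<65"}, False),
    ({"age<65", "respiratory"}, False),
    ({"age>=65", "few_visits"}, False),
    ({"age<65"}, False),
]


def rule_stats(antecedent, profiles):
    """Return (support, confidence, lift) for the rule antecedent -> CRC.

    Returns None when no profile matches the antecedent or when the
    data contains no CRC cases, since lift is undefined in both cases.
    """
    n = len(profiles)
    n_cases = sum(1 for _, is_case in profiles if is_case)
    covered = [is_case for items, is_case in profiles if antecedent <= items]
    if not covered or n_cases == 0:
        return None
    n_both = sum(covered)                # matching profiles that are CRC cases
    support = n_both / n
    confidence = n_both / len(covered)
    lift = confidence / (n_cases / n)    # confidence relative to the base rate
    return support, confidence, lift


def mine_rules(profiles, max_len=2, min_lift=1.5):
    """Enumerate antecedents up to max_len items and keep those above min_lift."""
    items = sorted(set().union(*(p for p, _ in profiles)))
    rules = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            stats = rule_stats(frozenset(combo), profiles)
            if stats is not None and stats[2] >= min_lift:
                rules.append((combo, stats))
    # Highest-lift rules first
    return sorted(rules, key=lambda r: -r[1][2])


if __name__ == "__main__":
    for antecedent, (sup, conf, lift) in mine_rules(profiles):
        print(antecedent, f"support={sup:.3f} confidence={conf:.3f} lift={lift:.2f}")
```

A lift above 1 means the antecedent group contains CRC cases more often than the background population does, which is one simple way to flag a candidate risk group for follow-up study. Exhaustive enumeration as written only suits small item sets; larger data would need a pruned search.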

Author information

Authors and Affiliations

CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, ACT, 2601, Australia
Jie Chen, Hongxing He, Huidong Jin, Damien McAullay & Graham Williams
Australian Taxation Office, 51 Allara Street, Canberra, ACT, 2601, Australia
Graham Williams
National Centre for Epidemiology and Population Health, The Australian National University, Canberra, 0200, ACT, Australia
Chris Kelman

Editor information

Editors and Affiliations

The Australian Taxation Office,
Graham J. Williams
School of Computing and Mathematics, University of Western Sydney, Sydney, NSW, Australia
Simeon J. Simoff

Cite this chapter

Chen, J., He, H., Jin, H., McAullay, D., Williams, G., Kelman, C. (2006). Identifying Risk Groups Associated with Colorectal Cancer. In: Williams, G.J., Simoff, S.J. (eds) Data Mining. Lecture Notes in Computer Science, vol 3755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677437_20

DOI: https://doi.org/10.1007/11677437_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32547-5
Online ISBN: 978-3-540-32548-2
eBook Packages: Computer Science, Computer Science (R0)
