Abstract
In this paper, we explore data mining techniques for identifying and describing risk groups for colorectal cancer (CRC) from population-based administrative health data. Association rule discovery, association classification and scalable clustering analysis are applied to the profiles of colorectal cancer patients, in contrast to the profiles of background patients. These data mining methods enable us to identify the most common characteristics of the colorectal cancer patients. The knowledge discovered by these methods is quite different from that obtained by traditional survey approaches. Although heuristic, the data mining methods may identify risk groups for further epidemiological study, such as older patients who live near health facilities yet seldom use them and who have respiratory and circulatory diseases.
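To make the idea of contrasting patient profiles with background profiles concrete, the following is a minimal sketch and not the authors' method. It assumes each profile is a set of categorical attributes, and it ranks attribute combinations by how much more frequent they are among cases than in the background. The attribute names, the toy data, and the helpers `support` and `contrast_itemsets` are all hypothetical.

```python
from itertools import combinations


def support(profiles, itemset):
    """Fraction of profiles that contain every item in itemset."""
    if not profiles:
        return 0.0
    hits = sum(1 for p in profiles if itemset <= p)
    return hits / len(profiles)


def contrast_itemsets(cases, background, min_support=0.5, max_size=2):
    """List itemsets frequent among cases, with their background support.

    Returns tuples (itemset, case_support, background_support, ratio),
    sorted by ratio in descending order. The ratio is infinite when the
    itemset never occurs in the background group.
    """
    items = sorted(set().union(*cases))
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s_case = support(cases, itemset)
            if s_case < min_support:
                continue  # not common enough among the cases
            s_bg = support(background, itemset)
            ratio = s_case / s_bg if s_bg > 0 else float("inf")
            results.append((itemset, s_case, s_bg, ratio))
    results.sort(key=lambda r: r[3], reverse=True)
    return results


# Hypothetical toy profiles; none of these values come from the paper.
cases = [
    {"age>=65", "respiratory", "low_utilisation"},
    {"age>=65", "circulatory", "low_utilisation"},
    {"age>=65", "respiratory", "circulatory"},
    {"age<65", "respiratory"},
]
background = [
    {"age<65"},
    {"age>=65", "low_utilisation"},
    {"age<65", "respiratory"},
    {"age<65", "circulatory"},
    {"age>=65"},
    {"age<65"},
    {"age<65", "low_utilisation"},
    {"age<65"},
]

if __name__ == "__main__":
    for itemset, s_case, s_bg, ratio in contrast_itemsets(cases, background):
        print(sorted(itemset), s_case, s_bg, ratio)
```

A high ratio only flags a combination as a candidate for closer study. As the abstract notes, such findings are heuristic and would need epidemiological follow-up.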
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Chen, J., He, H., Jin, H., McAullay, D., Williams, G., Kelman, C. (2006). Identifying Risk Groups Associated with Colorectal Cancer. In: Williams, G.J., Simoff, S.J. (eds) Data Mining. Lecture Notes in Computer Science, vol 3755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677437_20
DOI: https://doi.org/10.1007/11677437_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32547-5
Online ISBN: 978-3-540-32548-2
eBook Packages: Computer Science, Computer Science (R0)