Data Mining Techniques in Health Informatics: A Case Study from Breast Cancer Research

  • Jing Lu
  • Alan Hales
  • David Rew
  • Malcolm Keech
  • Christian Fröhlingsdorf
  • Alex Mills-Mullett
  • Christian Wette
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9267)


This paper presents a case study in applying data mining techniques to the analysis of diagnosis and treatment events related to breast cancer. Data from over 16,000 patients have been pre-processed, and several data mining techniques have been implemented using Weka (Waikato Environment for Knowledge Analysis). In particular, Generalized Sequential Patterns mining has been used to discover frequent patterns in disease event sequence profiles for groups of living and deceased patients. Furthermore, five classification models have been evaluated for classifying patients based on selected attributes. This research showcases the data mining process and the techniques used to transform large amounts of patient data into useful information and potentially valuable patterns that help in understanding cancer outcomes.
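As a rough illustration of what frequent sequential pattern discovery involves, the sketch below runs a simplified, GSP-style level-wise search in Python. It is not the Weka implementation used in the paper: each event is a single item, there are no time or gap constraints, the candidate pruning step is skipped, and the event names, the `frequent_sequences` helper and the toy patient profiles are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the pattern's events occur in the sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(event in remaining for event in pattern)


def frequent_sequences(sequences: List[List[str]], min_support: int) -> Dict[Pattern, int]:
    """Level-wise search: frequent length-k patterns are joined into length-(k+1) candidates."""

    def support(pattern: Pattern) -> int:
        # Number of patient profiles that contain the pattern.
        return sum(is_subsequence(pattern, s) for s in sequences)

    # Level 1: frequent single events.
    events = sorted({event for s in sequences for event in s})
    level = {(e,): support((e,)) for e in events}
    level = {p: c for p, c in level.items() if c >= min_support}

    frequent: Dict[Pattern, int] = {}
    while level:
        frequent.update(level)
        # Join step: p and q combine when p without its first event
        # equals q without its last event.
        candidates = {p + (q[-1],) for p in level for q in level if p[1:] == q[:-1]}
        level = {}
        for candidate in sorted(candidates):
            count = support(candidate)
            if count >= min_support:
                level[candidate] = count
    return frequent


# Hypothetical event profiles, one list per patient (not data from the study).
patient_events = [
    ["diagnosis", "surgery", "radiotherapy"],
    ["diagnosis", "surgery", "chemotherapy", "radiotherapy"],
    ["diagnosis", "chemotherapy", "surgery"],
]

for pattern, count in sorted(frequent_sequences(patient_events, min_support=2).items()):
    print(" -> ".join(pattern), count)
```

Here `min_support` is an absolute count of patients. Running the same search separately on two groups of profiles, for example living and deceased patients, is one way to compare the patterns each group yields.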


Health informatics, Database technology, Clinical data environment, Electronic patient records, Breast cancer datasets, Data mining techniques, Knowledge discovery



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Jing Lu (1), corresponding author
  • Alan Hales (1, 2)
  • David Rew (2)
  • Malcolm Keech (3)
  • Christian Fröhlingsdorf (1)
  • Alex Mills-Mullett (1)
  • Christian Wette (1)

  1. Southampton Solent University, Southampton, UK
  2. University Hospital Southampton, Southampton, UK
  3. University of Bedfordshire, Luton, UK
