Analysis of Medical Documents with Text Mining and Association Rule Mining

  • Ruth Reátegui
  • Sylvie Ratté
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 918)


Text mining techniques extract meaningful information from large amounts of semi-structured and unstructured text. In this work, the MetaMap tool was used to extract medical entities, such as diseases and syndromes, from discharge summaries. Association rule mining algorithms such as Apriori and FP-Growth were then applied to the extracted entities to find associations between them. The dataset consists of 1237 discharge summaries obtained from the 2008 i2b2 Obesity Challenge. Rules with a principal diagnosis as antecedent showed that cardiac disease frequently co-occurred with other diseases such as hypertension and diabetes. Most of the rules describe associations between diabetes and other diseases such as hypertension, dyslipidemia, nephropathy, heart disease, lung diseases, and arthritis. These rules have a confidence above 0.5.
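To make the confidence threshold concrete, the following is a minimal sketch of how support and confidence could be computed for single-antecedent rules over sets of extracted entities. It is a brute-force pair enumeration, not the Apriori or FP-Growth implementation used in the paper. The function names, the `min_support` value, and the toy transactions are assumptions made for illustration only and are not data from the i2b2 corpus.

```python
from itertools import combinations


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions)


def mine_pair_rules(transactions, min_support=0.4, min_confidence=0.5):
    """Return (antecedent, consequent, support, confidence) tuples.

    Only rules of the form {A} -> {B} are considered (hypothetical helper).
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_count = count({a, b}, transactions)
        pair_support = pair_count / n
        if pair_support < min_support:
            continue  # the pair is not frequent enough to yield rules
        for antecedent, consequent in ((a, b), (b, a)):
            # confidence(A -> B) = count(A and B) / count(A)
            confidence = pair_count / count({antecedent}, transactions)
            if confidence > min_confidence:
                rules.append((antecedent, consequent, pair_support, confidence))
    return rules


# Toy transactions: each set stands for the entities found in one summary.
# These are invented for illustration, not taken from the dataset.
transactions = [
    frozenset({"diabetes", "hypertension"}),
    frozenset({"diabetes", "hypertension", "dyslipidemia"}),
    frozenset({"diabetes", "nephropathy"}),
    frozenset({"hypertension"}),
    frozenset({"diabetes", "hypertension", "heart disease"}),
]

for antecedent, consequent, sup, conf in mine_pair_rules(transactions):
    print(f"{antecedent} -> {consequent}: support={sup:.2f}, confidence={conf:.2f}")
```

In this toy example, diabetes and hypertension co-occur in three of five transactions, and each appears in four, so both directions of the rule have support 0.6 and confidence 0.75 and pass the 0.5 confidence threshold. Apriori and FP-Growth reach the same kind of rules more efficiently by pruning or compressing the search over larger itemsets.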


Clinical text, Text mining, Association rule mining


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. École de technologie supérieure, Montreal, Canada
  2. Universidad Técnica Particular de Loja, Loja, Ecuador
