Identifying fall-related injuries: Text mining the electronic medical record

  • Monica Chiarini Tremblay
  • Donald J. Berndt
  • Stephen L. Luther
  • Philip R. Foulis
  • Dustin D. French


Unintentional injury due to falls is a serious and expensive health problem among the elderly. This is especially true in the Veterans Health Administration (VHA) ambulatory care setting, where nearly 40% of the male patients are 65 or older and at risk for falls. Health service researchers and clinicians can use VHA administrative data to identify and explore the frequency and nature of fall-related injuries (FRI) to aid in the implementation of clinical and prevention programs. Here we define administrative data as structured (coded) values that are generated as a result of clinical services provided to veterans and stored in databases. However, the limitations of administrative data do not always allow for conclusive decision making, especially in areas where coding may be incomplete. This study uses data and text mining techniques to investigate whether unstructured text-based information included in the electronic medical record can validate and enhance those administrative records that should have been coded as fall-related injuries. The challenges highlighted by this study include extracting and preparing data from administrative sources and the full electronic medical records, de-identifying the data (to assure HIPAA compliance), conducting chart reviews to construct a “gold standard” dataset, and applying both supervised and unsupervised text mining techniques in comparison with traditional medical chart review.


Healthcare informatics; Electronic medical records; Text mining; Cluster analysis; Latent semantic indexing; Veterans administration



The authors acknowledge research support in the form of resources and the use of facilities provided by the James A. Haley Veterans’ Hospital in Tampa, Florida.



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Monica Chiarini Tremblay (1)
  • Donald J. Berndt (2)
  • Stephen L. Luther (3)
  • Philip R. Foulis (4)
  • Dustin D. French (5, 6)
  1. Decision Sciences and Information Systems, Florida International University College of Business Administration, Miami, USA
  2. Information Systems and Decision Sciences, University of South Florida College of Business, Tampa, USA
  3. HSR&D/RR&D Center of Excellence: Maximizing Rehabilitation Outcomes, James A. Haley Veterans Hospital, Tampa, USA
  4. James A. Haley Veterans Hospital, Tampa, USA
  5. Indianapolis VA Center of Excellence, Regenstrief Institute Inc, Indianapolis, USA
  6. Division of General Internal Medicine and Geriatrics, Indiana University School of Medicine, Indianapolis, USA
