Lexical Acquisition for Clinical Text Mining Using Distributional Similarity

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7182)

Abstract

We describe experiments into the use of distributional similarity for acquiring lexical information from clinical free text, in particular notes typed by primary care physicians (general practitioners). We also present a novel approach to lexical acquisition from ‘sensitive’ text, which does not require the text to be manually anonymised – a very expensive process – and therefore allows much larger datasets to be used than would normally be possible.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carroll, J., Koeling, R., Puri, S. (2012). Lexical Acquisition for Clinical Text Mining Using Distributional Similarity. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_20

  • DOI: https://doi.org/10.1007/978-3-642-28601-8_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28600-1

  • Online ISBN: 978-3-642-28601-8

  • eBook Packages: Computer Science, Computer Science (R0)
