Lexical Acquisition for Clinical Text Mining Using Distributional Similarity

Carroll, John; Koeling, Rob; Puri, Shivani

doi:10.1007/978-3-642-28601-8_20

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7182)

Abstract

We describe experiments into the use of distributional similarity for acquiring lexical information from clinical free text, in particular notes typed by primary care physicians (general practitioners). We also present a novel approach to lexical acquisition from ‘sensitive’ text, which does not require the text to be manually anonymised – a very expensive process – and therefore allows much larger datasets to be used than would normally be possible.

Author information

Authors and Affiliations

Department of Informatics, University of Sussex, Brighton, BN1 9QH, UK
John Carroll & Rob Koeling
GPRD, 151 Buckingham Palace Road, London, SW1W 9SZ, UK
Shivani Puri

Editor information

Editors and Affiliations

Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Carroll, J., Koeling, R., Puri, S. (2012). Lexical Acquisition for Clinical Text Mining Using Distributional Similarity. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_20

DOI: https://doi.org/10.1007/978-3-642-28601-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28600-1
Online ISBN: 978-3-642-28601-8
eBook Packages: Computer Science, Computer Science (R0)
