Abstract
We describe experiments on using distributional similarity to acquire lexical information from clinical free text, in particular notes typed by primary care physicians (general practitioners). We also present a novel approach to lexical acquisition from ‘sensitive’ text that does not require the text to be manually anonymised, a very expensive process, and therefore allows much larger datasets to be used than would normally be possible.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carroll, J., Koeling, R., Puri, S. (2012). Lexical Acquisition for Clinical Text Mining Using Distributional Similarity. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_20
DOI: https://doi.org/10.1007/978-3-642-28601-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28600-1
Online ISBN: 978-3-642-28601-8
eBook Packages: Computer Science, Computer Science (R0)