Learning Derived Words from Medical Corpora

Zweigenbaum, Pierre; Grabar, Natalia

doi:10.1007/978-3-540-39907-0_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2780)

Included in the following conference series:

Conference on Artificial Intelligence in Medicine in Europe

Abstract

Morphological knowledge (inflection, derivation, compounds) is useful for medical language processing. Some is available for medical English in the UMLS Specialist Lexicon, but not for the French language. Large corpora of medical texts can nowadays be obtained from the Web. We propose here a method, based on the cooccurrence of formally similar words, which takes advantage of such a corpus to learn morphological knowledge for French medical words. The relations obtained before filtering have an average precision of 75.6% after 5,000 word pairs. Detailed examination of the results obtained on a sample of 376 French SNOMED anatomy nouns shows that 91–94% of the proposed derived adjectives are correct, that 36% of the nouns receive a correct adjective, and that this method can add 41% more derived adjectives than SNOMED already specifies. We discuss these results and propose directions for improvement.
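
As a rough illustration of the idea named in the abstract (cooccurrence of formally similar words), the following minimal Python sketch counts pairs of words that share an initial character string and appear close to each other in a text. The abstract does not specify how formal similarity or cooccurrence are measured, so the shared-prefix test, the `min_prefix` and `window` parameters, the function names, and the two sample sentences are all assumptions made for this example; it should not be read as the authors' implementation.

```python
import re
from collections import Counter


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest initial string shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def candidate_pairs(documents, min_prefix=4, window=10):
    """Count word pairs that look alike and occur near each other.

    Assumptions (not taken from the paper): two words are "formally
    similar" when they share at least `min_prefix` initial characters,
    and they "cooccur" when at most `window` tokens apart in one text.
    """
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"\w+", text.lower())
        for i, w in enumerate(tokens):
            # Look only forward so each position pair is counted once.
            for v in tokens[i + 1 : i + 1 + window]:
                if w != v and common_prefix_len(w, v) >= min_prefix:
                    counts[tuple(sorted((w, v)))] += 1
    return counts


# Hypothetical sample sentences, invented for this illustration.
docs = [
    "La paroi de l'aorte est épaissie ; la valve aortique est normale.",
    "Lésion du rein gauche avec insuffisance rénale chronique.",
]
pairs = candidate_pairs(docs)
for pair, n in pairs.items():
    print(pair, n)
```

On these two sentences the sketch proposes the noun/adjective pair aorte/aortique but misses rein/rénale, because the stem changes after the first letter. A stricter or looser similarity test shifts that trade-off between precision and coverage, the two quantities the abstract reports.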

Author information

Authors and Affiliations

Mission de recherche en Sciences et Technologies de l’Information Médicale, STIM/DPA/DSI, Assistance Publique – Hôpitaux de Paris & ERM 202 INSERM,
Pierre Zweigenbaum & Natalia Grabar

Editor information

Editors and Affiliations

INSERM U836-UJF-CEA-CHU (Grenoble Institute of Neuroscience),
Michel Dojat
Department of Computer Science, University of Cyprus, P.O.Box 20537, CY-1678, Nicosia, Cyprus
Elpida T. Keravnou
Centro de Inteligência Artificial, Departamento de Informática, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
Pedro Barahona

About this paper

Cite this paper

Zweigenbaum, P., Grabar, N. (2003). Learning Derived Words from Medical Corpora. In: Dojat, M., Keravnou, E.T., Barahona, P. (eds) Artificial Intelligence in Medicine. AIME 2003. Lecture Notes in Computer Science, vol 2780. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39907-0_27

DOI: https://doi.org/10.1007/978-3-540-39907-0_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20129-8
Online ISBN: 978-3-540-39907-0
eBook Packages: Springer Book Archive
