Learning Derived Words from Medical Corpora

  • Conference paper
Artificial Intelligence in Medicine (AIME 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2780)

Abstract

Morphological knowledge (inflection, derivation, compounds) is useful for medical language processing. Some is available for medical English in the UMLS Specialist Lexicon, but not for the French language. Large corpora of medical texts can nowadays be obtained from the Web. We propose here a method, based on the cooccurrence of formally similar words, which takes advantage of such a corpus to learn morphological knowledge for French medical words. The relations obtained before filtering have an average precision of 75.6% after 5,000 word pairs. Detailed examination of the results obtained on a sample of 376 French SNOMED anatomy nouns shows that 91–94% of the proposed derived adjectives are correct, that 36% of the nouns receive a correct adjective, and that this method can add 41% more derived adjectives than SNOMED already specifies. We discuss these results and propose directions for improvement.
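
To make the idea of "cooccurrence of formally similar words" concrete, the sketch below counts pairs of words that share an initial string and appear in the same document. This is only an illustration under stated assumptions: the abstract does not specify how formal similarity is measured or what the cooccurrence window is, so the shared-prefix criterion, the `min_prefix` threshold of 4 characters, the document-level window, the function names, and the toy French sentences are all hypothetical choices, not the authors' implementation.

```python
import re
from collections import Counter
from itertools import combinations


def common_prefix_len(a: str, b: str) -> int:
    """Return the length of the longest shared initial string of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cooccurring_similar_pairs(documents, min_prefix=4):
    """Count, per word pair, the documents in which both words occur.

    Assumption: two words are "formally similar" when they share a prefix
    of at least min_prefix characters. The threshold is illustrative only.
    """
    counts = Counter()
    for doc in documents:
        # Keep runs of letters only, so "l'aorte" yields "l" and "aorte".
        words = sorted(set(re.findall(r"[^\W\d_]+", doc.lower())))
        for a, b in combinations(words, 2):
            if common_prefix_len(a, b) >= min_prefix:
                counts[(a, b)] += 1
    return counts


# Toy corpus (invented for this example, not taken from the paper).
docs = [
    "la paroi de l'aorte et la valve aortique",
    "une lésion aortique proche de l'aorte",
    "le muscle cardiaque et le coeur",
]
pairs = cooccurring_similar_pairs(docs)
for (a, b), n in pairs.most_common():
    print(a, b, n)
```

On this toy corpus the only pair retained is `aorte` / `aortique`, found together in two documents. The pair `coeur` / `cardiaque` is missed because the two forms share only one initial letter, which shows one limit of a purely form-based criterion such as the one assumed here.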

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zweigenbaum, P., Grabar, N. (2003). Learning Derived Words from Medical Corpora. In: Dojat, M., Keravnou, E.T., Barahona, P. (eds) Artificial Intelligence in Medicine. AIME 2003. Lecture Notes in Computer Science, vol 2780. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39907-0_27

  • DOI: https://doi.org/10.1007/978-3-540-39907-0_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20129-8

  • Online ISBN: 978-3-540-39907-0

  • eBook Packages: Springer Book Archive
