A Co-occurrence Based MedDRA Terminology Generation: Some Preliminary Results

  • Margherita Zorzi
  • Carlo Combi
  • Gabriele Pozzani
  • Elena Arzenton
  • Ugo Moretti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10259)


The generation of medical terminologies is an important activity. A flexible and structured terminology helps professionals in the everyday manual classification of clinical texts, and it is crucial for building the knowledge bases used by encoding tools, that is, software that supports medical tasks. For these reasons, it would be useful to “enforce” medical dictionaries such as MedDRA by enriching them with sets of locutions semantically related to official terms. Unfortunately, the manual generation of medical terminologies is time-consuming. Although human validation remains an irreplaceable step, a significant set of “high-quality” candidate terminologies can be generated automatically from clinical documents using statistical methods from linguistics. In this paper we adapt and use a co-occurrence-based technique to generate new MedDRA locutions, starting from large sets of narrative documents about adverse drug reactions. We describe the methodology we designed and the results of some first experiments.
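The abstract does not say which co-occurrence statistic is used. As a minimal sketch, assuming pointwise mutual information (PMI) computed over document-level co-occurrence, the idea of scoring word pairs from narrative reports could look like the following. The function name `pmi_scores`, the whitespace tokenization, and the toy reports are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Return document-level PMI for every pair of tokens that co-occur.

    Assumption: two tokens "co-occur" when they appear in the same document.
    Probabilities are estimated as fractions of documents.
    """
    n_docs = len(documents)
    term_df = Counter()  # number of documents containing each token
    pair_df = Counter()  # number of documents containing both tokens
    for text in documents:
        # Naive tokenization; sorted so each pair has a canonical (a < b) order
        tokens = sorted(set(text.lower().split()))
        term_df.update(tokens)
        pair_df.update(combinations(tokens, 2))

    scores = {}
    for (a, b), n_ab in pair_df.items():
        # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
        scores[(a, b)] = math.log2(n_ab * n_docs / (term_df[a] * term_df[b]))
    return scores


# Toy narratives, invented for illustration only
reports = [
    "patient developed itchy rash",
    "itchy rash after first dose",
    "severe headache after first dose",
    "patient reported severe headache",
]
scores = pmi_scores(reports)
```

In this toy data, pairs that always appear together (such as "itchy" and "rash") receive a higher score than pairs that share a document only occasionally. In a real setting, high-scoring locutions would still be candidates only, to be passed on to human validation as the abstract stresses.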


Mutual Information, Spontaneous Report, Unified Medical Language System, Narrative Text, Medical Dictionary
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Margherita Zorzi (1), corresponding author
  • Carlo Combi (1)
  • Gabriele Pozzani (2)
  • Elena Arzenton (2)
  • Ugo Moretti (2)
  1. Department of Computer Science, University of Verona, Verona, Italy
  2. Department of Diagnostics and Public Health, University of Verona, Verona, Italy
