Comparison of Clustering Approaches through Their Application to Pharmacovigilance Terms

  • Marie Dupuch
  • Christopher Engström
  • Sergei Silvestrov
  • Thierry Hamon
  • Natalia Grabar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7885)


In various applications (e.g., information retrieval, filtering, or analysis), it is useful to detect similar terms and to make it possible to use them jointly. Term clustering is one of the methods that can be exploited for this purpose. In this study, we test three methods dedicated to the clustering of terms (hierarchical ascendant classification, Radius, and maximum), combine them with semantic distance algorithms, and compare them through the results they provide when applied to terms from the pharmacovigilance area. The comparison indicates that the non-disjoint clustering methods (Radius and maximum) outperform disjoint clustering by 10 to 20 points in all the experiments.
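The contrast between disjoint and non-disjoint clustering can be illustrated with a minimal sketch. The abstract does not define the Radius or maximum methods, so the code below rests on assumptions: a Radius-style cluster is taken to be a centre term plus every term within a distance threshold, and the disjoint baseline is a single-linkage agglomerative procedure stopped at the same threshold. The terms, distances, threshold, and function names are all hypothetical and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical toy terms and pairwise distances (illustrative only).
TERMS = ["hepatitis", "liver failure", "jaundice", "rash", "urticaria"]
DIST = {
    ("hepatitis", "liver failure"): 1,
    ("hepatitis", "jaundice"): 2,
    ("liver failure", "jaundice"): 3,
    ("rash", "urticaria"): 1,
}
DEFAULT_DISTANCE = 5  # assumed distance for any pair not listed above


def dist(a, b):
    """Symmetric lookup of the toy semantic distance between two terms."""
    if a == b:
        return 0
    return DIST.get((a, b), DIST.get((b, a), DEFAULT_DISTANCE))


def radius_clusters(terms, threshold):
    """Non-disjoint sketch: every term is a centre and its cluster
    holds all terms whose distance to the centre is <= threshold."""
    return {c: {t for t in terms if dist(c, t) <= threshold} for c in terms}


def hac_clusters(terms, threshold):
    """Disjoint sketch: single-linkage agglomerative merging, stopped
    when the two closest clusters are farther apart than threshold."""
    clusters = [{t} for t in terms]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        (i, j), d = min(
            (
                ((i, j), min(dist(a, b) for a in clusters[i] for b in clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if d > threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    for centre, members in radius_clusters(TERMS, 2).items():
        print(centre, "->", sorted(members))
    print([sorted(c) for c in hac_clusters(TERMS, 2)])
```

In this toy setting, "hepatitis" belongs to three Radius-style clusters (its own and those centred on "liver failure" and "jaundice"), whereas the agglomerative procedure assigns each term to exactly one cluster. This is only meant to show what overlap means; it says nothing about the evaluation scores reported in the paper.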


Semantic Distance; Disjoint Cluster; American Medical Informatics Association; Terminological Resource; MedDRA Term
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Marie Dupuch (1)
  • Christopher Engström (2)
  • Sergei Silvestrov (2)
  • Thierry Hamon (3)
  • Natalia Grabar (1)

  1. CNRS UMR 8163 STL, Université Lille 3, Villeneuve d’Ascq, France
  2. Division of Applied Mathematics, Mälardalen University, Västerås, Sweden
  3. LIM&BIO (EA3969), Université Paris 13, Sorbonne Paris Cité, France
