MIRACLE at ImageCLEFmed 2008: Semantic vs. Statistical Strategies for Topic Expansion

  • Sara Lana-Serrano
  • Julio Villena-Román
  • José Carlos González-Cristóbal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5706)


This paper describes the participation of the MIRACLE research consortium in the ImageCLEFmed task of ImageCLEF 2008. The main goal of our participation this year was to evaluate different text-based topic expansion approaches: methods based on linguistic information, such as thesauri or knowledge bases, and statistical techniques based mainly on term frequency. First, a common baseline algorithm is used to process the document collection: text extraction, medical-vocabulary recognition, tokenization, conversion to lowercase, filtering, stemming, indexing and retrieval. Then different expansion techniques are applied. For semantic expansion, we used the MeSH concept hierarchy, with UMLS entities as the basic root elements. The statistical method expanded the topics using the Apriori algorithm. Relevance-feedback techniques were also applied.
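As a rough illustration of the statistical strategy, the following Python sketch treats each document as a transaction (its set of terms after a simplified version of the baseline preprocessing), mines frequent term pairs with the Apriori algorithm, derives single-term association rules above a confidence threshold, and expands a topic with the consequents of the rules its terms trigger. It is a minimal sketch under stated assumptions, not the MIRACLE implementation: the stopword list, the thresholds `min_support` and `min_confidence`, the restriction to term pairs, the toy captions and all function names are hypothetical, and stemming and medical-vocabulary recognition are omitted.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; not the filtering resources used in the paper.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "with", "showing"}


def preprocess(text):
    """Tokenize, lowercase and drop stopwords; stemming is omitted here."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def apriori(transactions, min_support, max_size=2):
    """Map each frequent itemset (a frozenset) to its support count.

    Level-wise search: a size-k candidate is kept only if all of its
    (k-1)-subsets were frequent at the previous level (Apriori pruning).
    """
    counts = Counter(term for t in transactions for term in t)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent and k <= max_size:
        items = sorted({i for s in frequent for i in s})
        candidates = [
            frozenset(c)
            for c in combinations(items, k)
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        ]
        frequent = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)
        k += 1
    return result


def expansion_rules(itemsets, min_confidence):
    """Build rules x -> y from frequent pairs (assumption: pairs only)."""
    rules = {}
    for itemset, support in itemsets.items():
        if len(itemset) != 2:
            continue
        a, b = sorted(itemset)
        for x, y in ((a, b), (b, a)):
            # confidence(x -> y) = support({x, y}) / support({x})
            confidence = support / itemsets[frozenset([x])]
            if confidence >= min_confidence:
                rules.setdefault(x, set()).add(y)
    return rules


def expand_topic(topic, rules):
    """Return the topic terms plus the consequents of every rule they trigger."""
    terms = preprocess(topic)
    expansion = set()
    for term in terms:
        expansion |= rules.get(term, set())
    return terms | expansion


# Toy collection of hypothetical image captions.
docs = [
    "Chest radiograph showing pneumonia",
    "Pneumonia in the chest CT",
    "Chest radiograph of a nodule",
    "Brain MRI with tumor",
]
transactions = [preprocess(d) for d in docs]
itemsets = apriori(transactions, min_support=2)
rules = expansion_rules(itemsets, min_confidence=0.9)
print(sorted(expand_topic("pneumonia radiograph", rules)))  # ['chest', 'pneumonia', 'radiograph']
print(sorted(expand_topic("chest", rules)))  # ['chest']
```

Because rule confidence is asymmetric, a topic containing *pneumonia* is expanded with *chest*, while a topic containing only *chest* is left unchanged: the rule chest → pneumonia has confidence 2/3, below the 0.9 threshold.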


Keywords: Image retrieval, medical domain-specific vocabulary, thesaurus, linguistic engineering, information retrieval, indexing, topic expansion, relevance feedback, ImageCLEF Medical Retrieval Task, ImageCLEF, CLEF 2008


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Sara Lana-Serrano (1, 3)
  • Julio Villena-Román (2, 3)
  • José Carlos González-Cristóbal (1, 3)

  1. Universidad Politécnica de Madrid, Spain
  2. Universidad Carlos III de Madrid, Spain
  3. DAEDALUS - Data, Decisions and Language, S.A., Spain
