Creating Topic Hierarchies for Large Medical Libraries

  • David Sánchez
  • Antonio Moreno
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5943)


Web-based medical digital libraries contain a huge amount of valuable, up-to-date health care information. However, their size, their keyword-based access methods and their lack of semantic structure make it difficult to find the desired information. In this paper we present an automatic, unsupervised and domain-independent approach for structuring the resources available in an electronic repository. The system automatically detects and extracts the main topics related to a given domain and arranges them in a taxonomical structure. Our Web-based system integrates smoothly with the digital library’s search engine, offering a tool for accessing the library’s resources by browsing domain topics hierarchically in a comprehensive and natural way. The system has been tested on the well-known PubMed medical library, where it obtained better topic hierarchies than those generated by widely used taxonomic search engines that employ clustering techniques.


Keywords: taxonomy learning, digital libraries, Web mining, Web search engines, topic hierarchies, knowledge acquisition, ontologies, Semantic Web





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • David Sánchez (1)
  • Antonio Moreno (1)
  1. ITAKA (Intelligent Technologies for Advanced Knowledge Acquisition), Department of Computer Science and Mathematics, University Rovira i Virgili, Tarragona, Spain
