Automatic Topic Map Creation Using Term Crawling and Clustering Hierarchy Projection

  • Witold Abramowicz
  • Tomasz Kaczmarek
  • Marek Kowalkiewicz
Conference paper

Abstract

There is increasing interest in automating the creation of semantic structures, especially topic maps, by taking advantage of existing structured information resources. This paper gives an overview of the most popular method, which is based on RDF triples, and suggests a way to automate topic map creation from unstructured information sources. The method can be applied in the information systems development domain when analyzing vast unstructured data repositories in preparation for system design, or when migrating large amounts of unstructured data from legacy systems. The paper presents two innovative methods, Term Crawling (TC) and Clustering Hierarchy Projection (CHP), which are used to build a topic map from free-text documents downloaded from the Internet. A sample tool that uses the described techniques has been implemented. Preliminary results achieved on the test collection are presented in the concluding sections of the article.

Keywords

Natural Language Processing; Resource Description Framework; Document Collection; Semantic Network; Hierarchical Cluster Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media New York 2004

Authors and Affiliations

  • Witold Abramowicz (1)
  • Tomasz Kaczmarek (1)
  • Marek Kowalkiewicz (1)
  1. The Poznań University of Economics, Poznań, Poland
