
Automatic Topic Map Creation Using Term Crawling and Clustering Hierarchy Projection

  • Conference paper
Constructing the Infrastructure for the Knowledge Economy

Abstract

There is increasing interest in automating the creation of semantic structures, especially topic maps, by taking advantage of existing structured information resources. This paper gives an overview of the most popular method, which is based on RDF triples, and proposes a way to automate topic map creation from unstructured information sources. The method can be applied in the information systems development domain, either when analyzing vast unstructured data repositories in preparation for system design or when migrating large amounts of unstructured data from legacy systems. The paper presents two new methods, Term Crawling (TC) and Clustering Hierarchy Projection (CHP), which are used to build a topic map from free-text documents downloaded from the Internet. A sample tool implementing these techniques has been developed, and the preliminary results achieved on a test collection are presented in the concluding sections of the paper.
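The abstract does not describe how CHP works internally. As a rough illustration of the general idea its name suggests (projecting a hierarchy produced by document clustering onto a topic map), the Python sketch below clusters a few toy documents, each reduced to a set of terms, with average-linkage agglomerative clustering, and then turns every merge into a topic linked to its children by broader/narrower associations. Jaccard similarity, average linkage, the `broader-narrower` association type, the shared-terms naming rule, the toy data, and all function names are assumptions made for illustration; this is not the authors' algorithm.

```python
from itertools import combinations

# Toy input (hypothetical): each document reduced to a set of index terms.
DOCS = {
    "d1": frozenset({"topic", "map", "xml"}),
    "d2": frozenset({"topic", "map", "rdf"}),
    "d3": frozenset({"cluster", "hierarchy", "document"}),
    "d4": frozenset({"cluster", "hierarchy", "term"}),
}


def jaccard(a: frozenset, b: frozenset) -> float:
    """Term-set similarity (assumed measure): |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_linkage(docs, members_a, members_b) -> float:
    """Mean pairwise similarity between the documents of two clusters."""
    sims = [jaccard(docs[x], docs[y]) for x in members_a for y in members_b]
    return sum(sims) / len(sims)


def agglomerate(docs):
    """Agglomerative clustering; returns the merge history in merge order.

    Each history entry is (new_cluster_id, left_child_id, right_child_id).
    """
    clusters = {doc_id: [doc_id] for doc_id in docs}  # cluster id -> member docs
    history = []
    while len(clusters) > 1:
        # Pick the most similar pair; ties go to the first pair in sorted order.
        left, right = max(
            combinations(sorted(clusters), 2),
            key=lambda pair: average_linkage(docs, clusters[pair[0]], clusters[pair[1]]),
        )
        new_id = f"c{len(history) + 1}"
        clusters[new_id] = clusters.pop(left) + clusters.pop(right)
        history.append((new_id, left, right))
    return history


def project(history, docs):
    """Project the merge history onto a toy topic-map structure.

    Every document and every cluster becomes a topic; each merge yields two
    (hypothetical) broader-narrower associations from parent to child.
    """
    topics = {
        doc_id: {"name": doc_id, "terms": terms, "occurrences": [doc_id]}
        for doc_id, terms in docs.items()
    }
    associations = []
    for new_id, left, right in history:
        shared = topics[left]["terms"] & topics[right]["terms"]
        topics[new_id] = {
            # Assumed naming rule: terms common to all members, else the cluster id.
            "name": ", ".join(sorted(shared)) or new_id,
            "terms": shared,
            "occurrences": topics[left]["occurrences"] + topics[right]["occurrences"],
        }
        associations.append(("broader-narrower", new_id, left))
        associations.append(("broader-narrower", new_id, right))
    return topics, associations


history = agglomerate(DOCS)
topics, associations = project(history, DOCS)
for new_id, left, right in history:
    print(f"{new_id} ({topics[new_id]['name']}) <- {left}, {right}")
```

With this toy data, d1/d2 and d3/d4 are merged first (topics named "map, topic" and "cluster, hierarchy"), and the root cluster `c3` falls back to its id as a name because its members share no term. A real system would need a better labeling rule and a way to decide which levels of the hierarchy are worth keeping as topics.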




Copyright information

© 2004 Springer Science+Business Media New York

About this paper

Cite this paper

Abramowicz, W., Kaczmarek, T., Kowalkiewicz, M. (2004). Automatic Topic Map Creation Using Term Crawling and Clustering Hierarchy Projection. In: Linger, H., et al. Constructing the Infrastructure for the Knowledge Economy. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-4852-9_42


  • DOI: https://doi.org/10.1007/978-1-4757-4852-9_42

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-3459-8

  • Online ISBN: 978-1-4757-4852-9

  • eBook Packages: Springer Book Archive
