Abstract
There is growing interest in automating the creation of semantic structures, especially topic maps, by taking advantage of existing structured information resources. This paper gives an overview of the most popular method, which is based on RDF triples, and suggests a way to automate topic map creation from unstructured information sources. The method can be applied in information systems development when analyzing vast unstructured data repositories in preparation for system design, or when migrating large amounts of unstructured data from legacy systems. The paper presents two innovative methods, Term Crawling (TC) and Clustering Hierarchy Projection (CHP), which are used to build a topic map from free-text documents downloaded from the Internet. A sample tool that uses the described techniques has been implemented. Preliminary results achieved on the test collection are presented in the concluding sections of the article.
Copyright information
© 2004 Springer Science+Business Media New York
About this paper
Cite this paper
Abramowicz, W., Kaczmarek, T., Kowalkiewicz, M. (2004). Automatic Topic Map Creation Using Term Crawling and Clustering Hierarchy Projection. In: Linger, H., et al. Constructing the Infrastructure for the Knowledge Economy. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-4852-9_42
DOI: https://doi.org/10.1007/978-1-4757-4852-9_42
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-3459-8
Online ISBN: 978-1-4757-4852-9
eBook Packages: Springer Book Archive