Abstract
Common text clustering techniques do little to explain to their users why a particular result has been achieved. They do not relate semantically nearby terms, and they cannot explain how the resulting clusters are related to each other. In this paper, we discuss how to integrate a large thesaurus and the computation of lattices of resulting clusters into common text clustering in order to overcome these two problems. As its major result, our approach achieves an explanation at an appropriate level of conceptual granularity, together with an explaining lattice of resulting clusters of appropriate size and complexity.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hotho, A., Staab, S., Stumme, G. (2003). Explaining Text Clustering Results Using Semantic Structures. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds) Knowledge Discovery in Databases: PKDD 2003. Lecture Notes in Computer Science, vol 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_21
DOI: https://doi.org/10.1007/978-3-540-39804-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2
eBook Packages: Springer Book Archive