Abstract
Clustering is used to determine partitions and prototypes from pattern sets. Sets of numerical patterns can be clustered by alternating optimization (AO) of clustering objective functions or by alternating cluster estimation (ACE). Sets of non-numerical patterns can often be represented numerically by (pairwise) relations. For text data, relational data can be automatically computed using the Levenshtein (or edit) distance. These relational data sets can be clustered by relational ACE (RACE). For text data, the RACE cluster centers can be used as keywords. In particular, the cluster centers extracted by RACE from internet newsgroup articles serve as keywords for those articles. These keywords can be used for automatic document classification.
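To make the relational representation concrete, the following is a minimal sketch of how pairwise Levenshtein distances could be computed for a small word list. The function names `levenshtein`, `relation_matrix`, and `medoid_index` are hypothetical and do not come from the paper. The medoid helper is only a simple stand-in for the idea of picking a representative word from a group; it is not the RACE algorithm, whose update rules are not given in the abstract.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the dynamic-programming row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def relation_matrix(words: Sequence[str]) -> List[List[int]]:
    """Pairwise distance (relation) matrix for a list of words."""
    return [[levenshtein(u, v) for v in words] for u in words]


def medoid_index(relation: Sequence[Sequence[int]],
                 members: Sequence[int]) -> int:
    """Index of the member with the smallest total distance to the others.
    Assumption: a plain medoid, used here only as an illustrative stand-in
    for a cluster center that could serve as a keyword."""
    return min(members, key=lambda i: sum(relation[i][j] for j in members))


words = ["cat", "cart", "carts"]
R = relation_matrix(words)
center = words[medoid_index(R, range(len(words)))]
```

The matrix `R` is symmetric with a zero diagonal, which is the kind of pairwise relational data a relational clustering method takes as input in place of numerical feature vectors.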
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Runkler, T.A., Bezdek, J.C. (2003). Relational Clustering for the Analysis of Internet Newsgroups. In: Schwaiger, M., Opitz, O. (eds) Exploratory Data Analysis in Empirical Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55721-7_30
DOI: https://doi.org/10.1007/978-3-642-55721-7_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44183-0
Online ISBN: 978-3-642-55721-7
eBook Packages: Springer Book Archive