Relational Clustering for the Analysis of Internet Newsgroups

Runkler, T. A.; Bezdek, J. C.

doi:10.1007/978-3-642-55721-7_30

Conference paper

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Clustering is used to determine partitions and prototypes from pattern sets. Sets of numerical patterns can be clustered by alternating optimization (AO) of clustering objective functions or by alternating cluster estimation (ACE). Sets of non-numerical patterns can often be represented numerically by (pairwise) relations. For text data, relational data can be automatically computed using the Levenshtein (or edit) distance. These relational data sets can be clustered by relational ACE (RACE). For text data, the RACE cluster centers can be used as keywords. In particular, the cluster centers extracted by RACE from internet newsgroup articles serve as keywords for those articles. These keywords can be used for automatic document classification.
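
As a rough illustration of the first step described in the abstract, the following sketch computes Levenshtein distances between words and collects them into a pairwise relation matrix. It is not the authors' implementation: the function names (`levenshtein`, `relation_matrix`, `medoid`) and the sample words are assumptions made for this example. The last function picks the word with the smallest total distance to all others (a medoid), which is only a simple stand-in for the idea of a cluster center drawn from the data. The RACE update rules themselves are not given in the abstract and are not reproduced here.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # drop ch_a
                current[j - 1] + 1,      # add ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def relation_matrix(words: Sequence[str]) -> List[List[int]]:
    """Pairwise relational data: entry [i][j] is the edit distance
    between words[i] and words[j]."""
    return [[levenshtein(u, v) for v in words] for u in words]


def medoid(words: Sequence[str], relation: Sequence[Sequence[int]]) -> str:
    """Hypothetical helper (not RACE): return the word whose summed
    distance to all other words is smallest."""
    totals = [sum(row) for row in relation]
    return words[totals.index(min(totals))]


if __name__ == "__main__":
    sample = ["cat", "cut", "cot", "dog"]  # invented sample data
    R = relation_matrix(sample)
    for word, row in zip(sample, R):
        print(f"{word}: {row}")
    print("medoid:", medoid(sample, R))
```

Because the relation matrix depends only on pairwise distances, the same structure could be passed to any relational clustering method. The single-medoid step is the simplest case, with one cluster and no memberships.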

Author information

Authors and Affiliations

Corporate Technology, Information and Communications, Siemens AG, D-81730, München, Germany
T. A. Runkler
Computer Science Department, University of West Florida, 11000 University Parkway, Pensacola, FL, 32514, USA
J. C. Bezdek

Editor information

Editors and Affiliations

Munich School of Management Institute of Corporate Development and Organization, University of Munich, Kaulbachstraße 45/1, 80539, Munich, Germany
Manfred Schwaiger
Department of Mathematical Methods in Economics, University of Augsburg, Universitätsstraße 16, 86159, Augsburg, Germany
Otto Opitz

About this paper

Cite this paper

Runkler, T.A., Bezdek, J.C. (2003). Relational Clustering for the Analysis of Internet Newsgroups. In: Schwaiger, M., Opitz, O. (eds) Exploratory Data Analysis in Empirical Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55721-7_30

DOI: https://doi.org/10.1007/978-3-642-55721-7_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44183-0
Online ISBN: 978-3-642-55721-7
eBook Packages: Springer Book Archive
