Relational Clustering for the Analysis of Internet Newsgroups

  • Conference paper

Abstract

Clustering is used to determine partitions and prototypes from pattern sets. Sets of numerical patterns can be clustered by alternating optimization (AO) of clustering objective functions or by alternating cluster estimation (ACE). Sets of non-numerical patterns can often be represented numerically by pairwise relations; for text data, such relational data can be computed automatically using the Levenshtein (edit) distance. These relational data sets can be clustered by relational ACE (RACE), and for text data the RACE cluster centers can be used as keywords. In particular, the cluster centers that RACE extracts from internet newsgroup articles serve as keywords for those articles, and these keywords can be used for automatic document classification.
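
The pipeline in the abstract has three steps: compute pairwise Levenshtein distances between words, cluster the resulting relational matrix, and read the cluster centers as keywords. The Python sketch below illustrates these steps under stated assumptions. It does not reproduce RACE itself: the membership and prototype updates follow a fuzzy c-medoids style scheme (FCM-like memberships, prototypes restricted to data objects), used here as a stand-in. The function names, the fuzzifier m = 2, and the choice of initial medoids are hypothetical.

```python
# Minimal sketch of relational keyword extraction from text (illustrative only).
# ASSUMPTION: fuzzy c-medoids style updates stand in for RACE; the function
# names, the fuzzifier m and the initial medoids are hypothetical choices.
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep the DP row short
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def relation_matrix(words: Sequence[str]) -> List[List[int]]:
    """Pairwise relational data: R[j][k] is the edit distance between words j and k."""
    return [[levenshtein(u, v) for v in words] for u in words]


def fuzzy_c_medoids(
    R: Sequence[Sequence[float]],
    medoids: Sequence[int],
    m: float = 2.0,
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """Alternate membership and medoid updates until the medoids stop changing.

    Each prototype is the index of an object in the data set, so for text the
    cluster centers are actual words. Returns (medoid indices, memberships U[i][j]).
    """
    n = len(R)
    medoids = list(medoids)
    eps = 1e-9  # avoids division by zero when an object is itself a medoid
    U: List[List[float]] = []
    for _ in range(max_iter):
        # Membership step (FCM-style): nearer medoids receive larger memberships.
        U = [[0.0] * n for _ in medoids]
        for j in range(n):
            w = [(1.0 / (R[v][j] + eps)) ** (1.0 / (m - 1.0)) for v in medoids]
            total = sum(w)
            for i, wi in enumerate(w):
                U[i][j] = wi / total
        # Medoid step: pick the object with the lowest membership-weighted distance sum.
        new_medoids = []
        for i in range(len(medoids)):
            costs = [sum(U[i][j] ** m * R[k][j] for j in range(n)) for k in range(n)]
            new_medoids.append(costs.index(min(costs)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, U


if __name__ == "__main__":
    # Hypothetical word list, e.g. tokens taken from newsgroup articles.
    words = ["cluster", "clusters", "clustering", "network", "networks", "networking"]
    R = relation_matrix(words)
    medoids, U = fuzzy_c_medoids(R, medoids=[0, 3])
    print([words[i] for i in medoids])  # the cluster centers, read as keywords
```

Restricting each prototype to one of the input objects matters here: relational data offer no vector space in which to average, so the center is chosen from the words themselves and can be read directly as a keyword. A new document might then be assigned, for example, to the cluster whose keyword its words are nearest to in edit distance; this is one possible scheme, not necessarily the authors' classification method.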

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Runkler, T.A., Bezdek, J.C. (2003). Relational Clustering for the Analysis of Internet Newsgroups. In: Schwaiger, M., Opitz, O. (eds) Exploratory Data Analysis in Empirical Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55721-7_30

  • DOI: https://doi.org/10.1007/978-3-642-55721-7_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44183-0

  • Online ISBN: 978-3-642-55721-7

  • eBook Packages: Springer Book Archive