Analysis of Clustering Algorithms for Web-Based Search

Eissen, Sven Meyer zu; Stein, Benno

doi:10.1007/3-540-36277-0_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2569)

Included in the following conference series:

International Conference on Practical Aspects of Knowledge Management

753 Accesses
14 Citations

Abstract

Automatic document categorization plays a key role in the development of future interfaces for Web-based search. Clustering algorithms are considered a technology capable of mastering this “ad-hoc” categorization task. This paper presents the results of a comprehensive analysis of clustering algorithms in connection with document categorization. The contributions relate to exemplar-based, hierarchical, and density-based clustering algorithms. In particular, we contrast ideal and real clustering settings and present runtime results based on efficient implementations of the investigated algorithms.
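
To make the idea of exemplar-based document clustering concrete, here is a minimal sketch in Python. It is not the authors' implementation, and the abstract does not describe their preprocessing or initialization. The sketch assumes plain term-frequency vectors with naive whitespace tokenization, cosine similarity, a k-means-style assignment/update loop in which each cluster centroid serves as the exemplar, and a deterministic farthest-first initialization. The function names `tf_vector`, `cosine`, `centroid`, and `kmeans_documents` are hypothetical. Hierarchical and density-based clustering are not shown.

```python
from collections import Counter
import math


def tf_vector(text):
    """Term-frequency vector (assumption: naive whitespace tokenization, no stemming)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as term -> weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans_documents(docs, k, max_iter=20):
    """Hypothetical k-means-style clustering; assumes 1 <= k <= len(docs).

    Returns one cluster label per document.
    """
    vectors = [tf_vector(d) for d in docs]
    # Farthest-first initialization (an assumption): start with the first document,
    # then repeatedly add the document least similar to its closest chosen exemplar.
    chosen = [0]
    while len(chosen) < k:
        rest = [i for i in range(len(vectors)) if i not in chosen]
        chosen.append(min(rest, key=lambda i: max(cosine(vectors[i], vectors[c]) for c in chosen)))
    centers = [dict(vectors[i]) for i in chosen]

    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar exemplar.
        new_labels = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        if new_labels == labels:
            break  # stable assignments mean the centroids would not change either
        labels = new_labels
        # Update step: each exemplar becomes the mean vector of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous exemplar if a cluster becomes empty
                centers[c] = centroid(members)
    return labels


docs = [
    "clustering algorithms group documents",
    "document clustering for web search",
    "football match results league",
    "league football season results",
]
labels = kmeans_documents(docs, k=2)
print(labels)  # [0, 0, 1, 1]
```

The deterministic initialization keeps the toy example reproducible. With random initialization, k-means can converge to a poor partition. On this input, two seed documents from the same topic would leave the other topic's documents with zero similarity to every exemplar.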

Author information

Authors and Affiliations

Department of Computer Science, Paderborn University, D-33095, Paderborn, Germany
Sven Meyer zu Eissen & Benno Stein

Editor information

Editors and Affiliations

Department of Knowledge Engineering, University of Vienna, Brünner Str. 72, 1210, Vienna, Austria
Dimitris Karagiannis
Business Operation Systems, Esslenstr. 3, 8280, Kreuzlingen, Switzerland
Ulrich Reimer

About this paper

Cite this paper

Eissen, S.M.z., Stein, B. (2002). Analysis of Clustering Algorithms for Web-Based Search. In: Karagiannis, D., Reimer, U. (eds) Practical Aspects of Knowledge Management. PAKM 2002. Lecture Notes in Computer Science, vol 2569. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36277-0_16

DOI: https://doi.org/10.1007/3-540-36277-0_16
Published: 16 December 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00314-4
Online ISBN: 978-3-540-36277-7
eBook Packages: Springer Book Archive