Abstract
The biases of individual algorithms for non-parametric document clustering can lead to non-optimal solutions. Ensemble clustering methods may overcome this limitation, but have not been applied to document collections. This paper presents a comparison of strategies for non-parametric document ensemble clustering.
Keywords
- Cluster Algorithm
- Document Collection
- Ensemble Method
- Hierarchical Agglomerative Cluster
- Document Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gonzàlez, E., Turmo, J. (2008). Comparing Non-parametric Ensemble Methods for Document Clustering. In: Kapetanios, E., Sugumaran, V., Spiliopoulou, M. (eds) Natural Language and Information Systems. NLDB 2008. Lecture Notes in Computer Science, vol 5039. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69858-6_25
DOI: https://doi.org/10.1007/978-3-540-69858-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69857-9
Online ISBN: 978-3-540-69858-6
eBook Packages: Computer Science, Computer Science (R0)