Topic Number Estimation by Consensus Soft Clustering with NMF
We propose a novel method for estimating the number of topics in a document set using consensus clustering based on Non-negative Matrix Factorization (NMF). Estimating this number automatically is useful because various topic extraction approaches can determine it only through heuristics. Consensus clustering combines the results of multiple clustering runs into a single consensus, which yields a robust clustering; the number of clusters in that robust clustering is regarded as the optimized number. In this paper, we propose a novel consensus soft clustering algorithm based on NMF and estimate an optimized number of topics by searching for a robust classification of documents into the extracted topics.
Keywords: Consensus Clustering; Estimation of the number of topics; Soft Clustering; Topic extraction
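The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the algorithm of the paper, whose details the abstract does not give; it is a generic consensus procedure under stated assumptions. NMF is fitted with Lee-Seung multiplicative updates, each document's column of H is normalized into soft topic memberships, the memberships from several random restarts are averaged into a consensus matrix, and each candidate number of topics is scored with a dispersion coefficient (an assumed robustness criterion). The function names, the term-by-document layout of V, and all parameter defaults are hypothetical.

```python
import numpy as np


def nmf(V, k, rng, n_iter=200, eps=1e-9):
    """Approximate V (terms x docs) as W @ H using multiplicative updates."""
    n_terms, n_docs = V.shape
    W = rng.random((n_terms, k)) + eps
    H = rng.random((k, n_docs)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def soft_memberships(H, eps=1e-9):
    """Scale each document column of H so its topic weights sum to (about) 1."""
    return H / (H.sum(axis=0, keepdims=True) + eps)


def consensus_matrix(V, k, n_runs, rng):
    """Average, over random restarts, the soft co-membership of document pairs."""
    n_docs = V.shape[1]
    C = np.zeros((n_docs, n_docs))
    for _ in range(n_runs):
        _, H = nmf(V, k, rng)
        P = soft_memberships(H)
        # Entry (i, j) is sum_t P[t, i] * P[t, j], a value in [0, 1].
        C += P.T @ P
    return C / n_runs


def estimate_num_topics(V, candidates, n_runs=10, seed=0):
    """Return the candidate k with the highest dispersion score, plus all scores."""
    rng = np.random.default_rng(seed)
    scores = {}
    for k in candidates:
        C = consensus_matrix(V, k, n_runs, rng)
        # Assumed criterion: the score approaches 1 when entries of C are near 0 or 1.
        scores[k] = float(np.mean(4.0 * (C - 0.5) ** 2))
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

A call such as estimate_num_topics(V, range(2, 8)) returns the selected number together with the score of every candidate. Candidates should start at 2, because a single topic makes every entry of the consensus matrix close to 1 and therefore scores as trivially stable.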