Topic Number Estimation by Consensus Soft Clustering with NMF

Yokoi, Takeru

doi:10.1007/978-3-642-17569-5_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6485)

Included in the following conference series:

International Conference on Future Generation Information Technology

Abstract

We propose a novel method for estimating the number of topics in a document set using consensus clustering based on Non-negative Matrix Factorization (NMF). Estimating the number of topics automatically is useful because many topic extraction approaches determine that number only through heuristics. Consensus clustering combines the results of multiple clustering runs into a consensus, which yields a robust clustering, and the number of clusters in that robust clustering is regarded as the optimal number. In this paper, we propose a consensus soft clustering algorithm based on NMF and estimate the optimal number of topics by searching for the number that gives a robust classification of the documents into the obtained topics.
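The abstract does not give the algorithm itself, so the following Python sketch only illustrates the general idea it describes: run NMF several times for each candidate number of topics, turn each run into soft document-topic memberships, average the resulting co-membership matrices into a consensus matrix, and prefer the candidate whose consensus is most clear-cut. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the Lee and Seung multiplicative updates, the use of random restarts as the source of multiple clusterings, and the dispersion score used to measure robustness.

```python
import numpy as np

def nmf(V, k, rng, n_iter=200, eps=1e-9):
    """Approximate V (terms x docs, non-negative) by W @ H using
    Lee-Seung multiplicative updates for the Frobenius objective."""
    n_terms, n_docs = V.shape
    W = rng.random((n_terms, k)) + eps
    H = rng.random((k, n_docs)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def soft_memberships(W, H, eps=1e-9):
    """Return a (k x docs) matrix whose columns are soft topic memberships.
    H is first rescaled by the column sums of W so that the arbitrary
    scaling between W and H does not affect the memberships."""
    H_scaled = H * W.sum(axis=0)[:, None]
    return H_scaled / (H_scaled.sum(axis=0, keepdims=True) + eps)

def consensus_matrix(V, k, n_runs=20, seed=0):
    """Average soft co-membership over several randomly initialised runs.
    Entry (i, j) is the mean of sum_t p_ti * p_tj, a value in [0, 1]."""
    rng = np.random.default_rng(seed)
    n_docs = V.shape[1]
    C = np.zeros((n_docs, n_docs))
    for _ in range(n_runs):
        W, H = nmf(V, k, rng)
        P = soft_memberships(W, H)
        C += P.T @ P
    return C / n_runs

def dispersion(C):
    """Assumed robustness score: close to 1 when consensus entries sit
    near 0 or 1 (stable assignments), close to 0 when they sit near 0.5."""
    return float(np.mean(4.0 * (C - 0.5) ** 2))

def estimate_topic_number(V, candidates, n_runs=20, seed=0):
    """Score every candidate k and return the best one with all scores.
    Candidates should start at 2: with k = 1 every membership is 1,
    so the score is trivially maximal."""
    scores = {k: dispersion(consensus_matrix(V, k, n_runs, seed))
              for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Toy term-document matrix: 12 terms x 15 documents in 3 blocks plus noise.
rng = np.random.default_rng(42)
blocks = np.kron(np.eye(3), np.ones((4, 5)))
V = blocks + 0.05 * rng.random(blocks.shape)

best_k, scores = estimate_topic_number(V, candidates=[2, 3, 4, 5])
print("estimated number of topics:", best_k)
for k, s in scores.items():
    print(f"k={k}: dispersion={s:.3f}")
```

The dispersion score is only one possible robustness measure, and with soft memberships it may favour small candidate values, so the estimate from this sketch should be read as indicative rather than as a reproduction of the paper's results.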

Author information

Authors and Affiliations

Tokyo Metropolitan College of Industrial Technology, Japan
Takeru Yokoi

Editor information

Editors and Affiliations

Hannam University, 133 Ojeong-dong, daeduk-gu, 306-791, Daejeon, South Korea
Tai-hoon Kim
Hannam University, Daejeon, South Korea
Young-hoon Lee
University of Tasmania, Hobart, Tasmania, Australia
Byeong-Ho Kang
University of Warsaw & Infobright Inc., Poland
Dominik Ślęzak

About this paper

Cite this paper

Yokoi, T. (2010). Topic Number Estimation by Consensus Soft Clustering with NMF. In: Kim, Th., Lee, Yh., Kang, BH., Ślęzak, D. (eds) Future Generation Information Technology. FGIT 2010. Lecture Notes in Computer Science, vol 6485. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17569-5_9

DOI: https://doi.org/10.1007/978-3-642-17569-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17568-8
Online ISBN: 978-3-642-17569-5
eBook Packages: Computer Science, Computer Science (R0)