Abstract
Document clustering is an important method for document analysis and is used in many information retrieval applications. This paper proposes a new document clustering method that combines clustering based on non-negative matrix factorization (NMF) with a refinement step that re-assigns documents within clusters according to cluster coherence. The proposed method can improve clustering quality for two reasons: documents are re-assigned using a cluster coherence measure based on the similarity between documents, and the semantic feature matrix and semantic variable matrix used for clustering can better represent the inherent structure of the document set. Experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.
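The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the coherence measure, so the sketch assumes mean cosine similarity between a document and the members of each cluster as a stand-in. The function names `nmf` and `refine`, the toy term-document matrix, and all parameter values are hypothetical. The factorization uses the standard multiplicative updates for the Frobenius objective; `W` plays the role of the semantic feature matrix and `H` the semantic variable matrix.

```python
import numpy as np


def nmf(A, k, iters=200, eps=1e-9, seed=0):
    """Approximate non-negative A (terms x docs) as W @ H using
    multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps  # semantic feature matrix (terms x k)
    H = rng.random((k, n)) + eps  # semantic variable matrix (k x docs)
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def refine(A, labels, k, rounds=5):
    """Assumed refinement rule (not taken from the paper): move each
    document to the cluster whose current members it is most similar
    to on average, measured by cosine similarity."""
    norms = np.linalg.norm(A, axis=0, keepdims=True)
    X = A / np.maximum(norms, 1e-12)
    S = X.T @ X  # document-by-document cosine similarity
    labels = np.asarray(labels).copy()
    for _ in range(rounds):
        scores = np.full((k, A.shape[1]), -np.inf)
        for c in range(k):
            members = labels == c
            if members.any():
                # Mean similarity of every document to cluster c's members
                # (a document's similarity to itself is included).
                scores[c] = S[members].mean(axis=0)
        new_labels = scores.argmax(axis=0)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy term-document matrix: rows are terms, columns are documents.
A = np.array([
    [3, 2, 4, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 4, 2, 3],
    [0, 0, 0, 1, 3, 2],
], dtype=float)

W, H = nmf(A, k=2)
initial = H.argmax(axis=0)       # cluster = largest semantic variable
final = refine(A, initial, k=2)  # similarity-based re-assignment
print(initial, final)
```

Assigning each document to the row of `H` with the largest value is a common way to read clusters off an NMF; the refinement pass then only changes documents whose average similarity points to a different cluster.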
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Park, S., An, D.U., Char, B., Kim, CW. (2009). Document Clustering with Cluster Refinement and Non-negative Matrix Factorization. In: Leung, C.S., Lee, M., Chan, J.H. (eds) Neural Information Processing. ICONIP 2009. Lecture Notes in Computer Science, vol 5864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10684-2_31
DOI: https://doi.org/10.1007/978-3-642-10684-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10682-8
Online ISBN: 978-3-642-10684-2
eBook Packages: Computer Science, Computer Science (R0)