Document Clustering with Cluster Refinement and Non-negative Matrix Factorization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5864)

Abstract

Document clustering is an important method for document analysis and is used in many information retrieval applications. This paper proposes a new document clustering method that combines clustering based on NMF (non-negative matrix factorization) with refinement of the documents in each cluster using cluster coherence. The proposed method can improve clustering quality for two reasons: documents are re-assigned to clusters according to cluster coherence, which is based on the similarity between documents, and the semantic feature matrix and semantic variable matrix used in clustering better represent the inherent structure of the document set. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.
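The abstract names two stages, NMF-based clustering followed by cluster refinement, without giving their details. The following is a minimal sketch of that general pipeline, not the authors' algorithm. It assumes a non-negative term-document matrix `A` factored as `A ≈ W @ H`, with `W` playing the role of the semantic feature matrix and `H` the semantic variable matrix. The factorization uses the standard multiplicative update rules for the Frobenius-norm objective. The refinement step is a stand-in: it re-assigns each document to the cluster whose centroid is most cosine-similar, which is an assumption and not necessarily the paper's coherence measure. All function names and the toy matrix are hypothetical.

```python
import numpy as np

def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix A (terms x docs) as A ~ W @ H
    using multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps  # assumed role: semantic feature matrix
    H = rng.random((k, n)) + eps  # assumed role: semantic variable matrix
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def refine(A, labels, k, max_rounds=10):
    """Stand-in refinement (an assumption, not the paper's measure):
    move each document to the cluster with the most cosine-similar centroid,
    repeating until assignments stop changing."""
    docs = A.T / (np.linalg.norm(A.T, axis=1, keepdims=True) + 1e-12)
    labels = labels.copy()
    for _ in range(max_rounds):
        centroids = np.zeros((k, docs.shape[1]))
        for c in range(k):
            members = docs[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
        centroids /= np.linalg.norm(centroids, axis=1, keepdims=True) + 1e-12
        new_labels = np.argmax(docs @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def cluster_documents(A, k, seed=0):
    """Initial clusters from the largest entry in each column of H, then refine."""
    W, H = nmf(A, k, seed=seed)
    initial = np.argmax(H, axis=0)
    return refine(A, initial, k)

# Toy term-document matrix: rows are terms, columns are documents.
# Documents 0-2 use only the first two terms, documents 3-5 only the last two.
A = np.array([
    [3, 2, 4, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 4, 2, 3],
    [0, 0, 0, 1, 3, 2],
], dtype=float)
labels = cluster_documents(A, k=2)
print(labels)
```

On this toy input the two document groups share no terms, so they should land in different clusters; which group receives label 0 depends on the random initialization. A real experiment would replace the centroid-based step with the coherence measure defined in the paper.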

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Park, S., An, D.U., Char, B., Kim, CW. (2009). Document Clustering with Cluster Refinement and Non-negative Matrix Factorization. In: Leung, C.S., Lee, M., Chan, J.H. (eds) Neural Information Processing. ICONIP 2009. Lecture Notes in Computer Science, vol 5864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10684-2_31

  • DOI: https://doi.org/10.1007/978-3-642-10684-2_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10682-8

  • Online ISBN: 978-3-642-10684-2

  • eBook Packages: Computer Science, Computer Science (R0)
