Enhancing Document Clustering Using Reweighting Terms Based on Semantic Features

  • Chapter
Future Information Communication Technology and Applications

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 235)

Abstract

This paper proposes a document clustering method that reweights terms based on semantic features. The method uses sample documents of clusters provided by the user to reduce the semantic gap between the user's requirements and the clustering results produced by the machine. It improves clustering because the reweighted terms better represent the inherent structure of the document set relevant to the user's requirements. Experimental results show that the proposed method performs better than related document clustering methods.

Acknowledgments

This work was supported by the Priority Research Centers Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2009-0093828). This research was also supported by the Ministry of Knowledge Economy (MKE), Korea, under the Information Technology Research Center (ITRC) support program supervised by the National IT Industry Promotion Agency (NIPA) (NIPA-2012-H0301-12-2005).

Corresponding author

Correspondence to Sun Park.

Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Park, S., Park, J.G., Jeong, M.A., Jeong, J.G., Lee, Y., Lee, S.R. (2013). Enhancing Document Clustering Using Reweighting Terms Based on Semantic Features. In: Jung, HK., Kim, J., Sahama, T., Yang, CH. (eds) Future Information Communication Technology and Applications. Lecture Notes in Electrical Engineering, vol 235. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6516-0_28

  • DOI: https://doi.org/10.1007/978-94-007-6516-0_28

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-6515-3

  • Online ISBN: 978-94-007-6516-0

  • eBook Packages: Engineering, Engineering (R0)
