Abstract
This paper proposes a new document clustering method that reweights terms based on semantic features. The method uses sample documents that the user provides for each cluster, which reduces the semantic gap between the user's requirements and the clustering results produced by the machine. Clustering improves because the reweighted terms better represent the inherent structure of the document set that is relevant to the user's requirements. Experimental results show that the proposed method achieves better performance than related document clustering methods.
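The abstract does not give the reweighting formula, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes the "semantic features" come from non-negative matrix factorization (NMF) of a term-by-document matrix, computed here with Lee and Seung's multiplicative updates. The feature-relevance score, the `alpha` boost, and the function names (`nmf`, `reweight_terms`, `cluster_labels`) are hypothetical illustrations. Terms that load heavily on the features dominating the user's sample documents are boosted. The reweighted matrix is then factorized again, and each document is assigned to its strongest feature.

```python
import random

Matrix = list[list[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain-Python matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(row) for row in zip(*X)]


def nmf(A: Matrix, r: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> tuple[Matrix, Matrix]:
    """Lee-Seung multiplicative updates for A ~= W H (Frobenius loss).

    A is terms x documents; W (terms x r) holds the semantic features,
    H (r x documents) holds each document's weight on every feature.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)]
             for k in range(r)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(r)]
             for i in range(m)]
    return W, H


def frobenius_error(A: Matrix, W: Matrix, H: Matrix) -> float:
    """Squared Frobenius norm of A - W H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb))


def reweight_terms(A: Matrix, W: Matrix, H: Matrix, sample_docs: list[int],
                   alpha: float = 1.0) -> tuple[list[float], Matrix]:
    """HYPOTHETICAL scheme: boost terms that load on the semantic features
    that the user's sample documents are built from."""
    r = len(H)
    # Relevance of each feature = its mean weight over the sample documents.
    rel = [sum(H[k][j] for j in sample_docs) / len(sample_docs) for k in range(r)]
    total = sum(rel)
    rel = [x / total for x in rel] if total > 0 else rel
    # Score each term by how strongly it loads on the relevant features.
    scores = [sum(W[i][k] * rel[k] for k in range(r)) for i in range(len(W))]
    top = max(scores)
    scores = [s / top for s in scores] if top > 0 else scores
    # Weight in [1, 1 + alpha]; multiply each term's row of A by its weight.
    weights = [1.0 + alpha * s for s in scores]
    reweighted = [[a * w for a in row] for row, w in zip(A, weights)]
    return weights, reweighted


def cluster_labels(H: Matrix) -> list[int]:
    """Assign each document to the semantic feature with the largest weight."""
    return [max(range(len(H)), key=lambda k: H[k][j]) for j in range(len(H[0]))]


if __name__ == "__main__":
    # Toy 4-term x 4-document matrix; the user marks documents 0 and 1 as samples.
    A = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
    W, H = nmf(A, r=2)
    weights, A2 = reweight_terms(A, W, H, sample_docs=[0, 1])
    W2, H2 = nmf(A2, r=2)
    print(weights, cluster_labels(H2))
```

Assigning documents by the largest entry of their column of `H` follows the common NMF clustering convention. Pure Python keeps the sketch dependency-free. A real implementation would use a numerical library and a sparse TF-IDF matrix.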
Acknowledgments
This work was supported by the Priority Research Centers Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2009-0093828). This research was also supported by the Ministry of Knowledge Economy (MKE), Korea, under the Information Technology Research Center (ITRC) support program supervised by the National IT Industry Promotion Agency (NIPA) (NIPA-2012-H0301-12-2005).
Copyright information
© 2013 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Park, S., Park, J.G., Jeong, M.A., Jeong, J.G., Lee, Y., Lee, S.R. (2013). Enhancing Document Clustering Using Reweighting Terms Based on Semantic Features. In: Jung, HK., Kim, J., Sahama, T., Yang, CH. (eds) Future Information Communication Technology and Applications. Lecture Notes in Electrical Engineering, vol 235. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6516-0_28
DOI: https://doi.org/10.1007/978-94-007-6516-0_28
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-6515-3
Online ISBN: 978-94-007-6516-0
eBook Packages: Engineering, Engineering (R0)