Abstract
Text mining refers to the process of extracting interesting information and knowledge from unstructured text. The area is growing rapidly, mainly because of the strong need to analyse the large amount of textual data residing on internal file systems and the Web. Text document clustering provides an effective navigation mechanism for organizing this data by grouping documents into a small number of meaningful classes. In this paper we propose a fuzzy text document clustering approach using WordNet lexical categories and the Fuzzy c-Means algorithm. Experiments compare the efficiency of the proposed approach with recently reported approaches. The experimental results show that fuzzy clustering performs well: the Fuzzy c-Means algorithm outperforms classical clustering algorithms such as k-means and bisecting k-means in both clustering quality and running time.
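To make the clustering step concrete, the following is a minimal sketch of the standard Fuzzy c-Means update rules, in which each document receives a degree of membership in every cluster rather than a single hard label. It is not the authors' implementation: the function name `fuzzy_c_means`, its parameters, and the toy two-dimensional vectors are assumptions made for illustration, and the construction of feature vectors from WordNet lexical categories is not shown because the abstract does not describe it.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Generic fuzzy c-means on dense vectors (illustrative sketch).

    points: list of equal-length lists of floats, one vector per document.
    c: number of clusters; m: fuzzifier, must be greater than 1.
    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: weighted mean of the points, weights u_ij ** m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(weights[i] * points[i][d] for i in range(n)) / wsum
                for d in range(dim)
            ])

        # Membership update from Euclidean distances to the new centers.
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if any(d == 0.0 for d in dists):
                # Point sits exactly on a center: share membership among those.
                zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(zeros)
                new_u.append([z / total for z in zeros])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_u.append([v / total for v in inv])

        # Stop once no membership changes by more than tol.
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(c))
        u = new_u
        if change < tol:
            break

    return centers, u


# Hypothetical toy data: each vector stands in for a document's weights
# over two made-up feature dimensions.
docs = [[10.0, 0.0], [9.0, 1.0], [10.0, 1.0],
        [0.0, 10.0], [1.0, 9.0], [1.0, 10.0]]
centers, memberships = fuzzy_c_means(docs, c=2)
for doc, row in zip(docs, memberships):
    print(doc, [round(v, 3) for v in row])
```

Unlike k-means, the output keeps a soft assignment for every document, so a document that sits between two topics can carry meaningful membership in both clusters.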
© 2010 Springer Science+Business Media B.V.
About this paper
Cite this paper
Gharib, T.F., Fouad, M.M., Aref, M.M. (2010). Fuzzy Document Clustering Approach using WordNet Lexical Categories. In: Elleithy, K. (eds) Advanced Techniques in Computing Sciences and Software Engineering. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3660-5_31
DOI: https://doi.org/10.1007/978-90-481-3660-5_31
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-3659-9
Online ISBN: 978-90-481-3660-5
eBook Packages: Computer Science, Computer Science (R0)