Fuzzy Document Clustering Approach using WordNet Lexical Categories

Gharib, Tarek F.; Fouad, Mohammed M.; Aref, Mostafa M.

doi:10.1007/978-90-481-3660-5_31

Tarek F. Gharib²,
Mohammed M. Fouad³ &
Mostafa M. Aref²

Conference paper
First Online: 15 December 2009

Abstract

Text mining refers generally to the process of extracting interesting information and knowledge from unstructured text. The area is growing rapidly, mainly because of the strong need to analyse the large amount of textual data that resides on internal file systems and the Web. Text document clustering provides an effective navigation mechanism for organizing this data by grouping documents into a small number of meaningful classes. In this paper we propose a fuzzy text document clustering approach that uses WordNet lexical categories and the fuzzy c-means algorithm. Experiments compare the efficiency of the proposed approach with recently reported approaches. The results show that fuzzy clustering performs well: the fuzzy c-means algorithm outperforms classical clustering algorithms such as k-means and bisecting k-means in both clustering quality and running time.
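For readers unfamiliar with fuzzy c-means, the following is a minimal sketch of the textbook formulation of the algorithm, not the authors' implementation, which is not reproduced on this page. It alternates between updating cluster centres and updating soft memberships until the memberships stop changing. The function name `fuzzy_c_means`, its parameters, and the toy vectors are assumptions made for illustration. In particular, the paper's actual document representation based on WordNet lexical categories is not shown here.

```python
import random
from typing import List, Tuple

Vector = List[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(
    points: List[Vector],
    n_clusters: int,
    m: float = 2.0,        # fuzzifier, must be > 1 (2.0 is a common default)
    max_iter: int = 100,
    tol: float = 1e-6,
    seed: int = 0,
) -> Tuple[List[Vector], List[List[float]]]:
    """Return (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1."""
    rng = random.Random(seed)
    dim = len(points[0])

    # Start from random memberships, normalised so each row sums to 1.
    u: List[List[float]] = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers: List[Vector] = []
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update: inverse squared distance raised to 1/(m-1),
        # normalised over clusters.
        shift = 0.0
        for i, p in enumerate(points):
            dists = [_sq_dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # Point coincides with a centre: assign it fully there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_row = [h / sum(hits) for h in hits]
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if shift < tol:  # memberships have stabilised
            break
    return centers, u


if __name__ == "__main__":
    # Hypothetical 2-dimensional document vectors forming two loose groups.
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    centers, memberships = fuzzy_c_means(docs, n_clusters=2)
    for doc, row in zip(docs, memberships):
        print(doc, [round(v, 3) for v in row])
```

Unlike k-means, which assigns each document to exactly one cluster, this procedure returns a membership degree for every document and cluster pair, so a document that touches several topics can belong partly to each.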


Author information

Authors and Affiliations

Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
Tarek F. Gharib & Mostafa M. Aref
Akhbar El-Yom Academy, Cairo, Egypt
Mohammed M. Fouad

Editor information

Editors and Affiliations

School of Engineering, University of Bridgeport, University Avenue 221, Bridgeport, 06604, U.S.A.
Khaled Elleithy

About this paper

Cite this paper

Gharib, T.F., Fouad, M.M., Aref, M.M. (2010). Fuzzy Document Clustering Approach using WordNet Lexical Categories. In: Elleithy, K. (eds) Advanced Techniques in Computing Sciences and Software Engineering. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3660-5_31

DOI: https://doi.org/10.1007/978-90-481-3660-5_31
Published: 15 December 2009
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-3659-9
Online ISBN: 978-90-481-3660-5
eBook Packages: Computer Science, Computer Science (R0)
