
Fuzzy Document Clustering Approach using WordNet Lexical Categories

  • Conference paper

Abstract

Text mining refers generally to the process of extracting interesting information and knowledge from unstructured text. This area is growing rapidly, mainly because of the strong need to analyse the huge amount of textual data that resides on internal file systems and the Web. Text document clustering provides an effective navigation mechanism for organizing this data by grouping documents into a small number of meaningful classes. In this paper we propose a fuzzy text document clustering approach using WordNet lexical categories and the Fuzzy c-Means algorithm. Experiments compare the efficiency of the proposed approach with recently reported approaches. The results show that fuzzy clustering performs well: the Fuzzy c-Means algorithm outperforms classical clustering algorithms such as k-means and bisecting k-means in both clustering quality and running time.
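The pipeline named in the abstract (map words to WordNet lexical categories, then cluster the resulting document vectors with Fuzzy c-Means) can be illustrated with the minimal sketch below. It rests on several assumptions that are not taken from the paper. The `LEXNAME` table is a hypothetical, hand-written stand-in for a real WordNet lookup, using WordNet-style lexicographer category names such as `noun.animal`. Documents are represented as L2-normalised category counts, although the paper's actual term weighting is not described here. The clustering step uses the standard Fuzzy c-Means updates with fuzzifier `m = 2`. The functions `doc_vector` and `fuzzy_c_means` are illustrative names only.

```python
import math
import random

# Hypothetical stand-in for WordNet: each word maps to one lexical category
# (WordNet-style lexicographer file name). A real system would query WordNet.
LEXNAME = {
    "dog": "noun.animal", "cat": "noun.animal", "horse": "noun.animal",
    "run": "verb.motion", "jump": "verb.motion",
    "bank": "noun.group", "money": "noun.possession", "loan": "noun.possession",
    "pay": "verb.possession",
}
CATEGORIES = sorted(set(LEXNAME.values()))


def doc_vector(text):
    """Count the lexical categories of known words, then L2-normalise."""
    counts = [0.0] * len(CATEGORIES)
    for word in text.lower().split():
        cat = LEXNAME.get(word)
        if cat is not None:
            counts[CATEGORIES.index(cat)] += 1.0
    norm = math.sqrt(sum(x * x for x in counts)) or 1.0
    return [x / norm for x in counts]


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard Fuzzy c-Means; returns (centers, membership matrix u)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][k] for i in range(n)) / total for k in range(d)]
            )
        # Membership update: u_ij = 1 / sum_l (d_ij / d_il)^(2 / (m - 1)).
        new_u = []
        for i in range(n):
            # Clamp distances to avoid division by zero when a point sits on a center.
            dist = [max(math.dist(points[i], centers[j]), 1e-12) for j in range(c)]
            new_u.append([
                1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0)) for l in range(c))
                for j in range(c)
            ])
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    docs = ["dog cat run", "horse jump dog", "bank loan money", "pay money loan"]
    vectors = [doc_vector(t) for t in docs]
    centers, u = fuzzy_c_means(vectors, c=2)
    for text, row in zip(docs, u):
        print(text, [round(x, 3) for x in row])
```

Each document receives a degree of membership in every cluster rather than a single hard label; taking the largest membership per row recovers a crisp partition when one is needed. Mapping words to a small, fixed set of lexical categories also keeps the vectors short, which is one plausible reason such a representation can make clustering cheaper than working over the full vocabulary.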



Copyright information

© 2010 Springer Science+Business Media B.V.

About this paper

Cite this paper

Gharib, T.F., Fouad, M.M., Aref, M.M. (2010). Fuzzy Document Clustering Approach using WordNet Lexical Categories. In: Elleithy, K. (eds) Advanced Techniques in Computing Sciences and Software Engineering. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3660-5_31

  • DOI: https://doi.org/10.1007/978-90-481-3660-5_31

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-3659-9

  • Online ISBN: 978-90-481-3660-5

  • eBook Packages: Computer Science, Computer Science (R0)
