An Integration of Fuzzy Association Rules and WordNet for Document Clustering

Chen, Chun-Ling; Tseng, Frank S. C.; Liang, Tyne

doi:10.1007/978-3-642-01307-2_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

With the rapid growth of text documents, document clustering has become one of the main techniques for organizing large numbers of documents into a small number of meaningful clusters. However, document clustering still faces several challenges, such as high dimensionality, scalability, accuracy, meaningful cluster labels, and the extraction of semantics from texts. To improve the quality of document clustering results, we propose an effective Fuzzy Frequent Itemset-based Document Clustering (F²IDC) approach that combines fuzzy association rule mining with the background knowledge embedded in WordNet. A term hierarchy generated from WordNet is applied to discover fuzzy frequent itemsets, which serve as candidate cluster labels for grouping documents. We have conducted experiments to evaluate our approach on the Reuters-21578 dataset. The experimental results show that our proposed method achieves higher accuracy than FIHC, HFTC, and UPGMA.
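
The idea of mining fuzzy frequent itemsets over hypernym-enriched term counts can be illustrated with a minimal sketch. It is not the F²IDC algorithm itself, whose details are not given in the abstract. Everything specific here is an assumption made for illustration: the toy `HYPERNYMS` map standing in for a WordNet lookup, the ramp-shaped `membership` function with its `low` and `high` thresholds, the use of the minimum as the fuzzy AND, and the `min_support` value.

```python
from itertools import combinations

# Hypothetical hypernym map standing in for a WordNet lookup (assumption).
HYPERNYMS = {"dog": "animal", "cat": "animal", "oil": "fuel", "gas": "fuel"}


def enrich(term_counts):
    """Add each term's count to its hypernym so related terms share a key."""
    enriched = dict(term_counts)
    for term, count in term_counts.items():
        parent = HYPERNYMS.get(term)
        if parent is not None:
            enriched[parent] = enriched.get(parent, 0) + count
    return enriched


def membership(count, low=1.0, high=4.0):
    """Assumed ramp membership: degree to which a term is 'frequent' in a document."""
    if count <= low:
        return 0.0
    if count >= high:
        return 1.0
    return (count - low) / (high - low)


def fuzzy_support(itemset, docs):
    """Average over documents of the minimum membership among the itemset's terms."""
    total = sum(min(membership(doc.get(t, 0)) for t in itemset) for doc in docs)
    return total / len(docs)


def frequent_itemsets(docs, min_support=0.3, max_size=2):
    """Enumerate itemsets up to max_size whose fuzzy support reaches min_support."""
    vocab = sorted({term for doc in docs for term in doc})
    result = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(vocab, size):
            support = fuzzy_support(itemset, docs)
            if support >= min_support:
                result[itemset] = support
    return result


# Toy corpus: each document is a dict of raw term counts (made-up data).
docs = [enrich(d) for d in ({"dog": 2, "cat": 2}, {"oil": 4}, {"dog": 1, "gas": 3})]
labels = frequent_itemsets(docs)
```

In this toy corpus, `dog` and `cat` each fall below the support threshold on their own, but their shared hypernym `animal` reaches it, which is the kind of effect a term hierarchy is meant to provide when proposing candidate cluster labels. The brute-force enumeration is only suitable for tiny vocabularies; a real miner would prune candidates Apriori-style.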

Author information

Authors and Affiliations

Dept. of Computer Science, National Chiao Tung University, Taiwan, ROC
Chun-Ling Chen & Tyne Liang
Dept. of Information Management, National Kaohsiung 1st Univ. of Sci. & Tech., Taiwan, ROC
Frank S. C. Tseng

Editor information

Editors and Affiliations

Sirindhorn International Institute of Technology, Thammasat University, 131 Moo 5 Tiwanont Road, 12000, Bangkadi, Muang, Pathumthani, Thailand
Thanaruk Theeramunkong
Dept. of Computer Engineering, Faculty of Engineering, Chulalongkorn University, 10330, Bangkok, Thailand
Boonserm Kijsirikul
Faculty of Science & Engineering, York University, 355 Lumbers Building, 4700 Keele Street, M3J 1P3, Toronto, Ontario, Canada
Nick Cercone
School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, 923-1292, Ishikawa, Japan
Tu-Bao Ho

About this paper

Cite this paper

Chen, CL., Tseng, F.S.C., Liang, T. (2009). An Integration of Fuzzy Association Rules and WordNet for Document Clustering. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_16

DOI: https://doi.org/10.1007/978-3-642-01307-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer Science (R0)
