Abstract
A known problem with established document categorization methods is that their feature representation of documents reflects semantic relations inaccurately. To address this problem, we propose combining a genetic algorithm with the fuzzy c-means clustering algorithm to choose an appropriate set of fuzzy clusters for document classification problems. The aim of the proposed method is to find a minimum set of fuzzy clusters that can correctly classify all training documents. The number of fuzzy pseudo-partitions and the shapes of the fuzzy membership functions that we use as classification criteria are determined by the genetic algorithm. The classifier then assigns documents to classes using the fuzzy c-means clustering algorithm. A solution obtained by the genetic algorithm is a set of fuzzy clusters, and its fitness function is determined by the fuzzy membership function.
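The abstract does not give the update equations or the fitness function, so the following is only a minimal sketch of the textbook fuzzy c-means procedure, not the authors' implementation. The function names, the fuzzifier `m = 2.0`, the toy document vectors, and the use of the partition coefficient as a membership-based score that a genetic algorithm could maximize when comparing cluster counts are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[k][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for k in range(c)
            ]
            for k in range(c):
                new = 1.0 / sum(
                    (dist[k] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c)
                )
                shift = max(shift, abs(new - u[i][k]))
                u[i][k] = new
        if shift < tol:
            break
    return centers, u


def partition_coefficient(u):
    """Assumed fitness: mean squared membership, between 1/c and 1."""
    return sum(v * v for row in u for v in row) / len(u)


if __name__ == "__main__":
    # Hypothetical 2-D term-weight vectors for six documents.
    docs = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
            [0.0, 1.0], [0.1, 0.9], [0.0, 0.9]]
    centers, u = fuzzy_c_means(docs, c=2)
    print(centers)
    print(partition_coefficient(u))
```

In a setup like the one the abstract describes, a genetic algorithm would presumably encode candidate values such as the cluster count `c` in each chromosome and score every candidate with a membership-based fitness of this kind; the paper's actual encoding and fitness may differ.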
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Youn, JI., Eun, HJ., Kim, YS. (2005). Fuzzy Clustering for Documents Based on Optimization of Classifier Using the Genetic Algorithm. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2005. ICCSA 2005. Lecture Notes in Computer Science, vol 3481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424826_2
DOI: https://doi.org/10.1007/11424826_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25861-2
Online ISBN: 978-3-540-32044-9
eBook Packages: Computer Science, Computer Science (R0)