Abstract
In recent years, the kNN algorithm has attracted the attention of many researchers and has been shown to be one of the best text categorization algorithms. Text categorization uses a training set of documents that already carry class labels to decide which class a new, unlabeled document belongs to. The kNN algorithm still has several issues that need further study. These include the improvement of the decision rule, the selection of the k value, the selection of dimensions (i.e., feature set selection), multiclass text categorization, and the algorithm's execution efficiency (time and space). In this paper, we mainly focus on improving the decision rule and dimension selection. We design an adaptive fuzzy kNN text classifier, where "adaptive" refers to adaptive dimension selection. The experimental results show that our algorithm is effective and feasible.
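The abstract does not give the authors' exact decision rule, so the following is only a reference sketch of a generic fuzzy kNN decision rule. It is not the paper's classifier, and it does not model the adaptive dimension selection. Instead of a crisp majority vote, each of the k nearest neighbors contributes to its class with weight d^(-2/(m-1)), and the weights are normalized into class memberships that sum to 1. Whitespace tokenization, raw term counts, distance defined as 1 minus cosine similarity, and the parameter names `k`, `m` (fuzzifier) and `eps` are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuzzy_knn(train, tokens, k=3, m=2.0, eps=1e-9):
    """Fuzzy class memberships of a tokenized document (illustrative sketch).

    train: list of (token_list, label) pairs with crisp labels.
    k: number of neighbors; m: fuzzifier (> 1); eps avoids division by zero.
    Assumption: distance is 1 - cosine similarity of raw term counts.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    query = Counter(tokens)
    # Distance from the query to every training document, nearest first.
    neighbors = sorted(
        ((max(0.0, 1.0 - cosine(query, Counter(doc))), label) for doc, label in train),
        key=lambda pair: pair[0],
    )[:k]
    # Each neighbor votes for its own class with weight d^(-2/(m-1)).
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += 1.0 / (dist + eps) ** (2.0 / (m - 1.0))
    total = sum(votes.values())
    # Normalize so the memberships over the neighbors' classes sum to 1.
    return {label: v / total for label, v in votes.items()}


if __name__ == "__main__":
    train = [
        ("cheap pills buy now".split(), "spam"),
        ("buy cheap watches now".split(), "spam"),
        ("meeting agenda for monday".split(), "ham"),
        ("project meeting notes monday".split(), "ham"),
    ]
    memberships = fuzzy_knn(train, "buy cheap pills".split(), k=3)
    print(memberships, max(memberships, key=memberships.get))
```

In this toy example the two "spam" neighbors lie much closer to the query than the third, "ham" neighbor, so "spam" receives almost all of the membership mass. Because the output is a graded membership rather than a single label, a rule built on it can also threshold or reject low-confidence documents. This is one motivation for fuzzy decision rules in kNN text categorization.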
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shang, W., Huang, H., Zhu, H., Lin, Y., Qu, Y., Dong, H. (2006). An Adaptive Fuzzy kNN Text Classifier. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758532_30
DOI: https://doi.org/10.1007/11758532_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34383-7
Online ISBN: 978-3-540-34384-4
eBook Packages: Computer Science, Computer Science (R0)