Abstract
Text categorization has become one of the key techniques for handling and organizing web data. Although the native features of SVM (Support Vector Machines) are, in theory, better suited to text categorization than those of Naive Bayes, in practice the classification precision of SVM is lower than that of the Bayesian method. This paper investigates the cause by analyzing the shortcomings of SVM, and presents an anti-noise SVM method. The improved method has two characteristics: 1) it chooses the optimal n-dimensional classifying hyperspace; 2) it separates out noise samples by preprocessing, and trains the classifier on noise-free samples. Compared with the naive Bayes method, the anti-noise SVM increases classification precision by about 3 to 9 percent.
This work is supported by the National Grand Fundamental Research 973 Program of China under Grant No. 2003CB314802.
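The second characteristic in the abstract (filter out noise samples first, then train on what remains) can be illustrated with a minimal sketch. The abstract does not say how noise samples are detected or how the SVM is trained, so everything here is an assumption for illustration: the k-nearest-neighbour label-agreement filter stands in for the paper's preprocessing, the Pegasos-style subgradient trainer stands in for its SVM solver, the hyperplane is forced through the origin for simplicity, and the names `filter_noise`, `train_linear_svm`, and `predict` as well as the toy data are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def filter_noise(X, y, k=3):
    """Keep only samples whose label matches the majority label of their
    k nearest neighbours. ASSUMPTION: a stand-in heuristic, not the
    preprocessing step used in the paper."""
    keep = []
    for i, xi in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        neighbours = sorted(others, key=lambda j: sq_dist(xi, X[j]))[:k]
        votes = sum(y[j] for j in neighbours)  # labels are +1 / -1
        if votes * y[i] > 0:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized
    hinge loss. ASSUMPTION: no bias term, so the separating hyperplane
    passes through the origin."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)               # decaying step size
            margin = y[i] * dot(w, X[i])        # computed before the update
            w = [(1.0 - 1.0 / t) * wj for wj in w]  # shrink: 1 - eta * lam
            if margin < 1:                      # sample violates the margin
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1

# Toy data (hypothetical): two clusters plus one mislabeled sample.
X = [(2.0, 2.0), (2.5, 1.5), (1.5, 2.5), (3.0, 2.0),
     (-2.0, -2.0), (-2.5, -1.5), (-1.5, -2.5), (-3.0, -2.0),
     (2.2, 2.1)]                                # last point is the noise sample
y = [1, 1, 1, 1, -1, -1, -1, -1, -1]

X_clean, y_clean = filter_noise(X, y, k=3)      # step 1: separate noise samples
w = train_linear_svm(X_clean, y_clean)          # step 2: train on the rest
```

On this toy set the filter discards only the mislabeled point near the positive cluster, so the classifier is fitted to the eight consistent samples. A real text-categorization setup would use high-dimensional term vectors and would also need the dimension-selection step, which this sketch does not attempt.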
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, L., Huang, J., Gong, ZH. (2005). An Anti-noise Text Categorization Method Based on Support Vector Machines. In: Szczepaniak, P.S., Kacprzyk, J., Niewiadomski, A. (eds) Advances in Web Intelligence. AWIC 2005. Lecture Notes in Computer Science, vol 3528. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11495772_43
DOI: https://doi.org/10.1007/11495772_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26219-0
Online ISBN: 978-3-540-31900-9
eBook Packages: Computer Science (R0)