Anti-spam Filters Based on Support Vector Machines

Xie, Chengwang; Ding, Lixin; Du, Xin

doi:10.1007/978-3-642-04843-2_37

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5821)

Abstract

Spam has recently become an increasingly serious problem. In this paper, a support vector machine (SVM) is used as the spam filter. A study is then made of how the classification error rate changes when different subsets of the corpora are used, and of how filter accuracy changes when SVMs with linear, polynomial, or RBF kernels are used. The effect of the size of the attribute set is also investigated. Based on the experimental results and analysis, it is concluded that SVMs are a very good alternative for building anti-spam classifiers, offering a good combination of accuracy, consistency, and speed.
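For readers unfamiliar with the three kernel families named in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not taken from the paper: the function names and the default values of `degree`, `coef0`, and `gamma` are illustrative assumptions, and the paper's own parameter settings are not given in this preview.

```python
import math


def linear_kernel(x, z):
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel: (x . z + coef0) ** degree.

    degree and coef0 are hypothetical defaults chosen for illustration.
    """
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - z||^2).

    gamma is a hypothetical default chosen for illustration.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

In a spam-filtering setting, `x` and `z` would typically be numeric attribute vectors extracted from two messages (for example, term counts). The kernel choice determines how similarity between messages is measured inside the SVM.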

Author information

Authors and Affiliations

State Key Lab of Software Engineering, Wuhan University, Wuhan, 430072, China
Chengwang Xie, Lixin Ding & Xin Du
Department of Information and Engineering, Shijiazhuang University of Economics, Shijiazhuang, 050031, China
Xin Du

Editor information

Editors and Affiliations

Faculty of Computer Science, China University of Geosciences, 430074, Wuhan, Hubei, P.R. China
Zhihua Cai
School of Computer Science, China University of Geosciences, 430074, Wuhan, Hubei, China
Zhenhua Li
Computation Center, Wuhan University, 430072, Wuhan, Hubei, China
Zhuo Kang
School of Computer Science and Engineering, The University of Aizu, 965-8580, Aizu-Wakamatsu City, Fukushima, Japan
Yong Liu

About this paper

Cite this paper

Xie, C., Ding, L., Du, X. (2009). Anti-spam Filters Based on Support Vector Machines. In: Cai, Z., Li, Z., Kang, Z., Liu, Y. (eds) Advances in Computation and Intelligence. ISICA 2009. Lecture Notes in Computer Science, vol 5821. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04843-2_37

DOI: https://doi.org/10.1007/978-3-642-04843-2_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04842-5
Online ISBN: 978-3-642-04843-2
eBook Packages: Computer Science, Computer Science (R0)