Abstract
A major bottleneck in electronic communications is the enormous dissemination of spam emails. Developing suitable filters that can adequately capture these emails and achieve a high performance rate has become a main concern. Support vector machines (SVMs) have made a large contribution to the development of spam email filtering. With SVMs, the crucial problems in email classification are the feature mapping of input emails and the choice of kernel. In this paper, we present a thorough investigation of several distance-based kernels, propose the use of string kernels, and demonstrate their efficiency in blocking spam emails. We detail feature mapping variants in text classification (TC) that yield improved performance for standard SVMs in the filtering task. Furthermore, to cope with real-time scenarios, we propose an online active framework for spam filtering.
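String kernels compare emails directly as character sequences instead of as bag-of-words vectors. This is useful for spam, where obfuscated spellings break word-level features. The sketch below is a minimal illustration and rests on one assumption: it uses the p-spectrum kernel, a simple member of the string-kernel family that counts shared contiguous substrings of length p. The kernel actually used in the paper may be a different variant, such as a gap-weighted subsequence kernel. All function names and the default `p = 3` are hypothetical choices made for illustration.

```python
from collections import Counter
import math


def spectrum_features(text: str, p: int = 3) -> Counter:
    """Map a string to counts of all contiguous length-p substrings (p-grams)."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s: str, t: str, p: int = 3) -> float:
    """Unnormalized p-spectrum kernel: inner product of the p-gram count vectors."""
    fs, ft = spectrum_features(s, p), spectrum_features(t, p)
    # Iterate over the smaller map; only p-grams present in both strings contribute.
    if len(fs) > len(ft):
        fs, ft = ft, fs
    return float(sum(c * ft[g] for g, c in fs.items()))


def normalized_spectrum_kernel(s: str, t: str, p: int = 3) -> float:
    """Cosine-normalized kernel K(s,t) / sqrt(K(s,s) K(t,t)), so that K(s,s) = 1."""
    kss = spectrum_kernel(s, s, p)
    ktt = spectrum_kernel(t, t, p)
    if kss == 0.0 or ktt == 0.0:
        return 0.0  # a string shorter than p has no p-grams at all
    return spectrum_kernel(s, t, p) / math.sqrt(kss * ktt)


def gram_matrix(docs, p: int = 3):
    """Precomputed Gram matrix of pairwise kernel values for a list of emails."""
    return [[normalized_spectrum_kernel(a, b, p) for b in docs] for a in docs]


if __name__ == "__main__":
    emails = ["free money now", "free money offer", "meeting at noon"]
    for row in gram_matrix(emails):
        print(["%.3f" % v for v in row])
```

Normalization keeps long emails from dominating the kernel values merely because they contain more p-grams. A Gram matrix like this one can be handed to any SVM solver that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.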
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amayri, O., Bouguila, N. (2009). Improved Online Support Vector Machines Spam Filtering Using String Kernels. In: Bayro-Corrochano, E., Eklundh, JO. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2009. Lecture Notes in Computer Science, vol 5856. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10268-4_73
DOI: https://doi.org/10.1007/978-3-642-10268-4_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10267-7
Online ISBN: 978-3-642-10268-4
eBook Packages: Computer Science; Computer Science (R0)