Improved Online Support Vector Machines Spam Filtering Using String Kernels

  • Ola Amayri
  • Nizar Bouguila
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5856)

Abstract

A major bottleneck in electronic communications is the enormous dissemination of spam emails. Developing suitable filters that can adequately capture those emails and achieve a high performance rate has become a main concern. Support vector machines (SVMs) have made a large contribution to the development of spam email filtering. For SVM-based email classification, the crucial problems are the feature mapping of input emails and the choice of kernel. In this paper, we present a thorough investigation of several distance-based kernels, propose the use of string kernels, and demonstrate their efficiency in blocking spam emails. We detail feature mapping variants in text classification (TC) that yield improved performance for standard SVMs in the filtering task. Furthermore, to cope with real-time scenarios, we propose an online active framework for spam filtering.

Keywords

Support Vector Machines, Feature Mapping, Spam, Online Active, String Kernels

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Ola Amayri (1)
  • Nizar Bouguila (1)
  1. Concordia University, Montreal, Canada
