Abstract
Image spam plagues email users around the world. Most previous work on image spam detection relies on supervised learning. However, obtaining enough trustworthy labels for training is costly, especially in an adversarial setting where spammers constantly modify their patterns to evade the classifier. To address this issue, we employ the principle of active learning, in which the learner guides the user to label as few images as possible while maximizing classification accuracy. Active learning is better suited to online image spam filtering because it dramatically reduces labeling cost with negligible overhead while maintaining high recognition performance. We present and compare two active learning algorithms, based on an SVM and a Gaussian process classifier respectively. To the best of our knowledge, we are the first to apply active learning to the task of image spam filtering. Experimental results demonstrate that our active learning based approaches quickly achieve a detection rate above 99% and a false positive rate below 0.5% with only a small number of labeled images.
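The active learning principle described in the abstract can be illustrated with a minimal pool-based sketch. This is not the authors' method. It assumes uncertainty sampling as the query strategy (the abstract does not say which strategy is used), substitutes a small logistic-regression learner for the SVM and Gaussian process classifiers, and runs on synthetic two-dimensional points rather than image features. All names (`train`, `active_learn`, `oracle`, `budget`) are hypothetical.

```python
import math
import random


def predict_proba(w, b, x):
    """Return the model's estimated probability that x is spam."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, epochs=200, lr=0.1):
    """Fit logistic regression by stochastic gradient descent.

    Stand-in learner (assumption): the paper uses an SVM and a
    Gaussian process classifier, not logistic regression.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def active_learn(pool, oracle, seed_idx, budget):
    """Pool-based active learning with uncertainty sampling.

    oracle(i) plays the role of the user: it returns the true label
    (1 = spam, 0 = ham) of pool item i when asked.
    """
    labeled = {i: oracle(i) for i in seed_idx}
    for _ in range(budget):
        idx = list(labeled)
        w, b = train([pool[i] for i in idx], [labeled[i] for i in idx])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Query the item whose predicted probability is closest to 0.5,
        # i.e. the one the current model is least certain about.
        query = min(unlabeled,
                    key=lambda i: abs(predict_proba(w, b, pool[i]) - 0.5))
        labeled[query] = oracle(query)
    # Retrain once more so the last queried label is used.
    idx = list(labeled)
    w, b = train([pool[i] for i in idx], [labeled[i] for i in idx])
    return w, b, labeled


# Synthetic stand-in for image features: two well-separated clusters.
rng = random.Random(0)
n = 100
pool = [[rng.gauss(2.0, 0.8), rng.gauss(2.0, 0.8)] for _ in range(n)]
pool += [[rng.gauss(-2.0, 0.8), rng.gauss(-2.0, 0.8)] for _ in range(n)]
truth = [1] * n + [0] * n

# Start from one labeled example per class, then ask for 10 more labels.
w, b, labeled = active_learn(pool, lambda i: truth[i],
                             seed_idx=[0, n], budget=10)
predictions = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in pool]
accuracy = sum(p == t for p, t in zip(predictions, truth)) / len(pool)
print(f"labeled {len(labeled)} of {len(pool)} items, "
      f"pool accuracy {accuracy:.3f}")
```

The point of the sketch is the loop structure: the learner, not the user, chooses which item is labeled next, so the labeling budget goes to the examples that are most informative for the current model.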
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, Y., Choudhary, A. (2009). Active Learning Image Spam Hunter. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2009. Lecture Notes in Computer Science, vol 5876. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10520-3_27
DOI: https://doi.org/10.1007/978-3-642-10520-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10519-7
Online ISBN: 978-3-642-10520-3
eBook Packages: Computer Science (R0)