Active Learning Image Spam Hunter

  • Conference paper
Advances in Visual Computing (ISVC 2009)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5876)

Abstract

Image spam annoys email users around the world. Most previous work on image spam detection focuses on supervised learning approaches. However, it is costly to obtain enough trustworthy labels for learning, especially in an adversarial setting where spammers constantly modify their patterns to evade the classifier. To address this issue, we employ the principle of active learning, in which the learner guides the user to label as few images as possible while maximizing classification accuracy. Active learning is well suited to online image spam filtering because it dramatically reduces labeling costs with negligible overhead while maintaining high recognition performance. We present and compare two active learning algorithms, based on an SVM and a Gaussian process classifier, respectively. To the best of our knowledge, we are the first to apply active learning to the task of image spam filtering. Experimental results demonstrate that our active-learning-based approaches quickly achieve a detection rate above 99% and a false positive rate below 0.5% with only a small number of labeled images.
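The abstract does not spell out how the learner chooses which images to send to the user, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes uncertainty sampling: the learner asks for the label of the pool item closest to the current decision boundary. A plain linear SVM trained by subgradient descent stands in for the paper's SVM, and 2-D toy points stand in for image features. The names `train_linear_svm` and `active_learn`, the hyperparameters, and the data are all hypothetical.

```python
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Fit a linear SVM (hinge loss + L2 penalty) by subgradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:
                # Inside the margin: the hinge term is active.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Outside the margin: only the L2 penalty shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def active_learn(pool, oracle, seed_idx, n_queries):
    """Uncertainty sampling: query the unlabeled item nearest the boundary."""
    labels = {i: oracle(i) for i in seed_idx}
    for _ in range(n_queries):
        idx = list(labels)
        w, b = train_linear_svm([pool[i] for i in idx], [labels[i] for i in idx])
        unlabeled = [i for i in range(len(pool)) if i not in labels]
        if not unlabeled:
            break
        # The smallest |decision value| marks the least certain item.
        query = min(unlabeled, key=lambda i: abs(dot(w, pool[i]) + b))
        labels[query] = oracle(query)  # the "user" supplies one label
    idx = list(labels)
    w, b = train_linear_svm([pool[i] for i in idx], [labels[i] for i in idx])
    return w, b, labels

# Toy pool (assumption): two well-separated clusters, -1 = ham, +1 = spam.
rng = random.Random(0)
pool = [[rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)] for _ in range(100)] + \
       [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(100)]
truth = [-1] * 100 + [1] * 100

w, b, labels = active_learn(pool, lambda i: truth[i], seed_idx=[0, 100], n_queries=10)
predictions = [1 if dot(w, x) + b >= 0 else -1 for x in pool]
accuracy = sum(p == t for p, t in zip(predictions, truth)) / len(pool)
print(f"labeled {len(labels)} of {len(pool)} items, accuracy {accuracy:.2f}")
```

In this sketch the user labels 12 of the 200 pool items (2 seeds plus 10 queries). A Gaussian process classifier could be swapped in by replacing the margin-based score with a measure of predictive uncertainty.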

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gao, Y., Choudhary, A. (2009). Active Learning Image Spam Hunter. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2009. Lecture Notes in Computer Science, vol 5876. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10520-3_27

  • DOI: https://doi.org/10.1007/978-3-642-10520-3_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10519-7

  • Online ISBN: 978-3-642-10520-3

  • eBook Packages: Computer Science, Computer Science (R0)
