Misleading Learners: Co-opting Your Spam Filter

  • Blaine Nelson
  • Marco Barreno
  • Fuching Jack Chi
  • Anthony D. Joseph
  • Benjamin I. P. Rubinstein
  • Udam Saini
  • Charles Sutton
  • J. D. Tygar
  • Kai Xia
Chapter

Using statistical machine learning for making security decisions introduces new vulnerabilities in large-scale systems. We show how an adversary can exploit statistical machine learning, as used in the SpamBayes spam filter, to render it useless, even if the adversary’s access is limited to only 1% of the spam training messages. We demonstrate three new attacks that successfully make the filter unusable, prevent victims from receiving specific email messages, and cause spam emails to arrive in the victim’s inbox.
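
The training-data poisoning idea in the abstract can be illustrated with a minimal sketch. It is not the SpamBayes scoring algorithm and not the attacks evaluated in the chapter: the `ToyFilter` class, its smoothed per-token score, the tiny vocabularies, and the `poisoning_demo` helper are all assumptions made for illustration. The sketch only shows the general mechanism, namely that attacker-crafted messages learned as spam and stuffed with ordinary words can raise the spam score of legitimate mail.

```python
from collections import Counter


class ToyFilter:
    """Toy token-based spam scorer (illustrative only, NOT SpamBayes)."""

    def __init__(self):
        self.spam_docs = 0
        self.ham_docs = 0
        self.spam_tokens = Counter()  # number of spam messages containing each token
        self.ham_tokens = Counter()   # number of ham messages containing each token

    def train(self, tokens, is_spam):
        unique = set(tokens)  # count each token once per message
        if is_spam:
            self.spam_docs += 1
            self.spam_tokens.update(unique)
        else:
            self.ham_docs += 1
            self.ham_tokens.update(unique)

    def token_score(self, token):
        # Laplace-smoothed fraction of spam / ham messages containing the token.
        p_spam = (self.spam_tokens[token] + 1) / (self.spam_docs + 2)
        p_ham = (self.ham_tokens[token] + 1) / (self.ham_docs + 2)
        return p_spam / (p_spam + p_ham)

    def score(self, tokens):
        # Message score: mean token score in [0, 1]; higher means more spam-like.
        unique = set(tokens)
        if not unique:
            return 0.5  # no evidence either way
        return sum(self.token_score(t) for t in unique) / len(unique)


def poisoning_demo(n_attack=5):
    """Return (score before, score after) poisoning for one legitimate message."""
    ham_words = ["meeting", "budget", "lunch", "report"]   # hypothetical vocabulary
    spam_words = ["winner", "prize", "pills"]              # hypothetical vocabulary
    victim_mail = ["meeting", "budget"]

    def trained_filter():
        f = ToyFilter()
        for _ in range(10):
            f.train(ham_words, is_spam=False)
            f.train(spam_words, is_spam=True)
        return f

    clean = trained_filter()
    poisoned = trained_filter()

    # Attack messages: learned as spam, but filled with ordinary words.
    dictionary = ham_words + spam_words  # stand-in for a large word list
    for _ in range(n_attack):
        poisoned.train(dictionary, is_spam=True)

    return clean.score(victim_mail), poisoned.score(victim_mail)


if __name__ == "__main__":
    before, after = poisoning_demo()
    print(f"legitimate message score before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the legitimate message's score rises once the attack messages are in the training set. How much poisoning a real filter tolerates depends on its scoring rule and thresholds, which the sketch does not model.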

Keywords

Intrusion Detection, Dictionary Attack, Attack Message, Statistical Machine Learning, Spam Message
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag US 2009

Authors and Affiliations

  • Blaine Nelson (1)
  • Marco Barreno (1)
  • Fuching Jack Chi (1)
  • Anthony D. Joseph (1)
  • Benjamin I. P. Rubinstein (1)
  • Udam Saini (1)
  • Charles Sutton (1)
  • J. D. Tygar (1)
  • Kai Xia (1)
  1. Comp. Sci. Div., University of California, Berkeley, USA
