Spam Email Filtering Using Network-Level Properties

Cortez, Paulo; Correia, André; Sousa, Pedro; Rocha, Miguel; Rio, Miguel

doi:10.1007/978-3-642-14400-4_37

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6171)

Included in the following conference series:

Industrial Conference on Data Mining

Abstract

Spam is a serious problem that affects email users (e.g. phishing attacks, viruses and time spent reading unwanted messages). We propose a novel spam email filtering approach based on network-level attributes (e.g. the geographic coordinates of the sender's IP address) that are more persistent over time than message content. This approach was tested using two classifiers, Naive Bayes (NB) and Support Vector Machines (SVM), and compared against bag-of-words models and eight blacklists. Several experiments were held with recently collected legitimate (ham) and non-legitimate (spam) messages, in order to simulate distinct user profiles from two countries (USA and Portugal). Overall, the network-level based SVM model achieved the best discriminatory performance. Moreover, preliminary results suggest that such a method is more robust to phishing attacks.
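
As a rough illustration of classifying messages from network-level attributes rather than content, the following minimal sketch fits a Gaussian Naive Bayes model on two numeric features. It is not the authors' implementation: the choice of a Gaussian variant, the feature pair (sender IP latitude and longitude), the function names and the toy coordinates are all assumptions made for this example, and the data are invented rather than taken from the paper.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def log_posterior(model, row):
    """Return the unnormalised log posterior of each class for one message."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        scores[label] = score
    return scores


def predict(model, row):
    """Pick the class with the highest log posterior."""
    scores = log_posterior(model, row)
    return max(scores, key=scores.get)


# Invented toy data (hypothetical): (latitude, longitude) of the sender's IP.
train_rows = [
    (41.4, -8.3), (41.5, -8.4), (38.7, -9.1),    # labelled ham
    (55.7, 37.6), (39.9, 116.4), (52.2, 21.0),   # labelled spam
]
train_labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

model = fit_gaussian_nb(train_rows, train_labels)
```

Calling `predict(model, (41.2, -8.6))` classifies a new message from its sender coordinates alone. A real filter would combine several network-level attributes and would need far more training data than this toy set.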

Author information

Authors and Affiliations

Dep. of Information Systems/Algoritmi, University of Minho, 4800-058, Guimarães, Portugal
Paulo Cortez & André Correia
Dep. of Informatics, University of Minho, 4710-059, Braga, Portugal
Pedro Sousa & Miguel Rocha
Department of Electronic and Electrical Engineering, University College London, Torrington Place, WC1E 7JE, London, UK
Miguel Rio

Editor information

Editors and Affiliations

Institut für Bildverarbeitung und angewandte Informatik, Körnerstr. 10, 04107, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Cortez, P., Correia, A., Sousa, P., Rocha, M., Rio, M. (2010). Spam Email Filtering Using Network-Level Properties. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2010. Lecture Notes in Computer Science, vol 6171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14400-4_37

DOI: https://doi.org/10.1007/978-3-642-14400-4_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14399-1
Online ISBN: 978-3-642-14400-4
eBook Packages: Computer Science, Computer Science (R0)
