Spam Filtering Using Regularized Neural Networks with Rectified Linear Units

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10037)

Abstract

The rapid growth of unsolicited and unwanted messages has inspired the development of many anti-spam methods. Machine-learning methods such as Naïve Bayes (NB), support vector machines (SVMs), and neural networks (NNs) have been particularly effective in classifying messages as spam or non-spam. They automatically construct word lists and the associated weights, usually in a bag-of-words fashion. However, traditional multilayer perceptron (MLP) NNs usually suffer from overfitting and from slow optimization that converges to a poor local minimum. To overcome these problems, we use a regularized NN with rectified linear units (RANN-ReL) for spam filtering. We compare its performance on three benchmark spam datasets (Enron, SpamAssassin, and SMS Spam Collection) with four machine-learning algorithms commonly used in text classification, namely NB, SVM, MLP, and k-nearest neighbors (k-NN). We show that the RANN-ReL outperforms the other methods in terms of classification accuracy, false negative rate, and false positive rate. Notably, it classifies both the majority (legitimate) and minority (spam) classes well.
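
To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows a bag-of-words encoding, a single hidden layer of rectified linear units, and the three reported metrics. The abstract does not say which regularizer is used, so dropout is assumed here purely for illustration. All function names, the toy vocabulary, and the weights are hypothetical, and spam is taken as the positive class (label 1).

```python
import math
import random
from collections import Counter


def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs in the message."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def relu(v):
    """Rectified linear unit applied element-wise: max(0, x)."""
    return [max(0.0, x) for x in v]


def dropout(v, rate, rng):
    """Inverted dropout (assumed regularizer): drop units with probability
    `rate` and rescale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in v]


def forward(x, W1, b1, w2, b2, rate=0.0, rng=None):
    """One ReLU hidden layer followed by a sigmoid output, P(spam | x).
    Dropout is applied only when a rate and a random generator are given,
    i.e. during training; at prediction time it is switched off."""
    h = relu([sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(W1, b1)])
    if rate > 0.0 and rng is not None:
        h = dropout(h, rate, rng)
    z = sum(w * hi for w, hi in zip(w2, h)) + b2
    return 1.0 / (1.0 + math.exp(-z))


def error_rates(y_true, y_pred):
    """Accuracy, false positive rate (legitimate flagged as spam) and
    false negative rate (spam passed as legitimate); spam is label 1."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n_legit = sum(1 for t in y_true if t == 0)
    n_spam = sum(1 for t in y_true if t == 1)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / max(1, len(pairs)),
        "fp_rate": fp / max(1, n_legit),
        "fn_rate": fn / max(1, n_spam),
    }


# Toy usage with hand-picked (not learned) weights.
vocab = ["free", "prize", "meeting"]
W1, b1 = [[1.0, 1.0, -1.0]], [0.0]   # one hidden unit
w2, b2 = [1.0], 0.0
x_spam = bag_of_words("FREE free prize", vocab)
x_ham = bag_of_words("meeting", vocab)
p_spam = forward(x_spam, W1, b1, w2, b2)
p_ham = forward(x_ham, W1, b1, w2, b2)
```

In this toy setting the hidden unit is active only for the message containing "free" and "prize"; for the other message the rectifier outputs zero, so the sigmoid returns 0.5. Reporting the false positive and false negative rates separately, rather than accuracy alone, is what makes it possible to judge performance on the majority and minority classes individually.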

Notes

  1. http://csmining.org/index.php/enron-spam-datasets.html

  2. http://csmining.org/index.php/spam-assassin-datasets.html

  3. https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection

Acknowledgments

This work was supported by the grant No. SGS_2016_023 of the Student Grant Competition.

Author information

Correspondence to Petr Hájek.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Barushka, A., Hájek, P. (2016). Spam Filtering Using Regularized Neural Networks with Rectified Linear Units. In: Adorni, G., Cagnoni, S., Gori, M., Maratea, M. (eds) AI*IA 2016 Advances in Artificial Intelligence. AI*IA 2016. Lecture Notes in Computer Science, vol 10037. Springer, Cham. https://doi.org/10.1007/978-3-319-49130-1_6

  • DOI: https://doi.org/10.1007/978-3-319-49130-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49129-5

  • Online ISBN: 978-3-319-49130-1

  • eBook Packages: Computer Science (R0)
