Spam Filtering Using Regularized Neural Networks with Rectified Linear Units

Barushka, Aliaksandr; Hájek, Petr

doi:10.1007/978-3-319-49130-1_6

Conference paper
First Online: 05 November 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10037)

Abstract

The rapid growth of unsolicited and unwanted messages has inspired the development of many anti-spam methods. Machine-learning methods such as Naïve Bayes (NB), support vector machines (SVMs), and neural networks (NNs) have been particularly effective in categorizing spam and non-spam messages. They automatically construct word lists and their weights, usually in a bag-of-words fashion. However, traditional multilayer perceptron (MLP) NNs usually suffer from overfitting and from slow optimization that converges to a poor local minimum. To overcome these problems, we use a regularized NN with rectified linear units (RANN-ReL) for spam filtering. We compare its performance on three benchmark spam datasets (Enron, SpamAssassin, and SMS spam collection) with four machine-learning algorithms commonly used in text classification, namely NB, SVM, MLP, and k-NN. We show that the RANN-ReL outperforms the other methods in terms of classification accuracy, false negative rate, and false positive rate. Notably, it classifies both the majority (legitimate) and minority (spam) classes well.
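
The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the architecture, the regularizer, or the preprocessing, so the use of dropout as the regularization step, the whitespace tokenizer, and every function name below (`bag_of_words`, `hidden_layer`, `error_rates`) are assumptions made for illustration only.

```python
import random
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text.

    Assumption: lowercase whitespace tokenization; the paper's actual
    preprocessing is not described in the abstract.
    """
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def relu(values):
    """Rectified linear unit, max(0, x), applied element-wise."""
    return [max(0.0, v) for v in values]


def dropout(values, rate, rng):
    """Inverted dropout: zero each unit with probability `rate` and
    rescale the surviving units by 1 / (1 - rate).

    Assumption: dropout stands in for the unspecified regularizer.
    """
    if rate <= 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def hidden_layer(inputs, weights, biases, rate=0.0, rng=None):
    """One dense layer with ReLU activation and optional dropout.

    `weights` holds one row per hidden unit; dropout is meant for
    training only, so `rate` defaults to 0.
    """
    pre_activation = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]
    return dropout(relu(pre_activation), rate, rng or random.Random(0))


def error_rates(y_true, y_pred):
    """Return (accuracy, false positive rate, false negative rate).

    Labels: 1 = spam, 0 = legitimate. A false positive is a legitimate
    message flagged as spam; a false negative is spam that got through.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fpr, fnr
```

For example, `bag_of_words("Win a FREE prize win now", ["free", "win", "meeting"])` yields the count vector fed to `hidden_layer`, and `error_rates` reports the false positive and false negative rates separately, which is why a filter can be judged on the legitimate and spam classes independently rather than by accuracy alone.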

Acknowledgments

This work was supported by the grant No. SGS_2016_023 of the Student Grant Competition.

Author information

Authors and Affiliations

Institute of System Engineering and Informatics, Faculty of Economics and Administration, University of Pardubice, Pardubice, Czech Republic
Aliaksandr Barushka & Petr Hájek

Corresponding author

Correspondence to Petr Hájek.

Editor information

Editors and Affiliations

University of Genoa, Genova, Italy
Giovanni Adorni
University of Parma, Parma, Italy
Stefano Cagnoni
University of Siena, Siena, Italy
Marco Gori
University of Genova, Genova, Italy
Marco Maratea

About this paper

Cite this paper

Barushka, A., Hájek, P. (2016). Spam Filtering Using Regularized Neural Networks with Rectified Linear Units. In: Adorni, G., Cagnoni, S., Gori, M., Maratea, M. (eds) AI*IA 2016 Advances in Artificial Intelligence. AI*IA 2016. Lecture Notes in Computer Science, vol 10037. Springer, Cham. https://doi.org/10.1007/978-3-319-49130-1_6

DOI: https://doi.org/10.1007/978-3-319-49130-1_6
Published: 05 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49129-5
Online ISBN: 978-3-319-49130-1
eBook Packages: Computer Science, Computer Science (R0)