Abstract
The rapid growth of unsolicited and unwanted messages has inspired the development of many anti-spam methods. Machine-learning methods such as Naïve Bayes (NB), support vector machines (SVMs) or neural networks (NNs) have been particularly effective in categorizing spam /non-spam messages. They automatically construct word lists and their weights usually in a bag-of-words fashion. However, traditional multilayer perceptron (MLP) NNs usually suffer from slow optimization convergence to a poor local minimum and overfitting issues. To overcome this problem, we use a regularized NN with rectified linear units (RANN-ReL) for spam filtering. We compare its performance on three benchmark spam datasets (Enron, SpamAssassin, and SMS spam collection) with four machine algorithms commonly used in text classification, namely NB, SVM, MLP, and k-NN. We show that the RANN-ReL outperforms other methods in terms of classification accuracy, false negative and false positive rates. Notably, it classifies well both major (legitimate) and minor (spam) classes.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Cormack, G.V.: Email spam filtering: a systematic review. Found. Trends Inf. Retrieval 1(4), 335–455 (2006)
Delany, S.J., Buckley, M., Greene, D.: SMS spam filtering: methods and data. Expert Syst. Appl. 39(10), 9899–9908 (2012)
Hoanca, B.: How good are our weapons in the spam wars? IEEE Technol. Soc. Mag. 25(1), 22–30 (2006)
Laorden, C., Ugarte-Pedrero, X., Santos, I., Sanz, B., Nieves, J., Bringas, P.G.: Study on the effectiveness of anomaly detection for spam filtering. Inf. Sci. 277, 421–444 (2014)
Shen, H., Li, Z.: Leveraging social networks for effective spam filtering. IEEE Trans. Comput. 63(11), 2743–2759 (2014)
Androutsopoulos, I., Koutsias, J., Chandrinos, K.V., Spyropoulos, C.D.: An experimental comparison of naive bayesian and keyword-based anti-spam filtering with personal E-mail messages. In: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 160–167. ACM (2000)
Metsis, V., Androutsopoulos, I., Paliouras, G.: Spam filtering with naive bayes - which naive bayes? In: Third Conference on Email and AntiSpam (CEAS), pp. 27–28 (2006)
Carreras, X., Marquez, L.: Boosting trees for anti-spam email filtering. In: Proceedings of RANLP 2001, Bulgaria, pp. 58–64 (2001)
Drucker, H., Wu, D., Vapnik, V.: Support vector machines for spam categorization. IEEE Trans. Neural Netw. 10(5), 1048–1054 (1999)
Jiang, S., Pang, G., Wu, M., Kuang, L.: An Improved K-nearest-neighbor algorithm for text categorization. Expert Syst. Appl. 39(1), 1503–1509 (2012)
Clark, J., Koprinska, I., Poon, J.: A neural network based approach to automated e-mail classification. In: Proceedings of the IEEE/WIC International Conference on Web Intelligence (WI 2003), pp. 702–705. IEEE Computer Society (2003)
Zhou, B., Yao, Y., Luo, J.: Cost-sensitive three-way email spam filtering. J. Intell. Inf. Syst. 42(1), 19–45 (2014)
Guzella, T., Caminhas, W.: A review of machine learning approaches to spam filtering. Expert Syst. Appl. 36(7), 10206–10222 (2009)
Caruana, G., Li, M.: A survey of emerging approaches to spam filtering. ACM Comput. Surv. 44(2), 1–27 (2012)
Nam, J., Kim, J., Mencía, E.L., Gurevych, I., Fürnkranz, J.: Large-scale multi-label text classification - revisiting neural networks. In: Calders, T., Esposito, F., Hüllermeier, E., Melo, R. (eds.) Machine Learning and Knowledge Discovery in Databases, pp. 437–452. Springer, Berlin Heidelberg (2014)
Hinton, G., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580 (2012)
Khan, A., Baharudin, B., Lee, L.: A review of machine learning algorithms for text-documents classification. J. Adv. Inf. Technol. 1(1), 4–20 (2010)
Carpinter, J., Hunt, R.: Tightening the net: a review of current and next generation spam filtering tools. Comput. Secur. 25(8), 566–578 (2006)
Talbot, D.: Where Spam is born. MIT Technol. Rev. 111(3), 28 (2008)
Fawcett, T.: In vivo spam filtering: a challenge problem for KDD. ACM SIGKDD Explor. Newsl. 5(2), 140–148 (2003)
Zhang, Y., Wang, S., Phillips, P., Ji, G.: Binary PSO with mutation operator for feature selection using decision tree applied to spam detection. Knowl.-Based Syst. 64, 22–31 (2014)
Sahami, M., Dumais, S., Heckerman, D., Horvitz, E.: A bayesian approach to filtering junk E-Mail. In: Papers from the 1998 Workshop Learning for Text Categorization, vol. 62, pp. 98–105 (1998)
Zhang, L., Zhu, J., Yao, T.: An evaluation of statistical spam filtering techniques. ACM Trans. Asian Lang. Inf. Process. 3(4), 243–269 (2004)
Koprinska, I., Poon, J., Clark, J., Chan, J.: Learning to classify E-mail. Inf. Sci. 177(10), 2167–2187 (2007)
Lai, C.: An empirical study of three machine learning methods for spam filtering. Knowl.-Based Syst. 20(3), 249–254 (2007)
Vyas, T., Prajapati, P., Gadhwal, S.: A survey and evaluation of supervised machine learning techniques for spam E-mail filtering. In: IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), pp. 1–7. IEEE (2015)
Almeida, T.A., Hidalgo, J.M.G., Yamakami, A.: Contributions to the study of SMS spam filtering: new collection and results. In: Proceedings of the 11th ACM Symposium on Document Engineering, pp. 259–262. ACM (2011)
Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the 30th International Conference on Machine Learning, vol. 30, pp. 1–6 (2013)
Jaitly, N., Hinton, G.: Learning a better representation of speech soundwaves using restricted boltzmann machines. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5884–5887. IEEE (2011)
Hajek, P., Bohacova, J.: Predicting abnormal bank stock returns using textual analysis of annual reports - a neural network approach. In: Jayne, C., Iliadis, L. (eds.) Engineering Applications of Neural Networks (EANN), pp. 67–78. Springer, New York (2016)
Acknowledgments
This work was supported by the grant No. SGS_2016_023 of the Student Grant Competition.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Barushka, A., Hájek, P. (2016). Spam Filtering Using Regularized Neural Networks with Rectified Linear Units. In: Adorni, G., Cagnoni, S., Gori, M., Maratea, M. (eds) AI*IA 2016 Advances in Artificial Intelligence. AI*IA 2016. Lecture Notes in Computer Science(), vol 10037. Springer, Cham. https://doi.org/10.1007/978-3-319-49130-1_6
Download citation
DOI: https://doi.org/10.1007/978-3-319-49130-1_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49129-5
Online ISBN: 978-3-319-49130-1
eBook Packages: Computer ScienceComputer Science (R0)