Combining Classifiers for Spam Detection

Barigou, Fatiha; Barigou, Naouel; Atmani, Baghdad

doi:10.1007/978-3-642-30507-8_8


Part of the book series: Communications in Computer and Information Science (CCIS, volume 293)

Included in the following conference series:

International Conference on Networked Digital Technologies


Abstract

E-mail has become a fast and economical way to exchange information. However, unsolicited or junk e-mail, also known as spam, has quickly become a major problem on the Internet, and protecting users from it has become one of the most important research areas. Spam filtering is used to prevent access to undesirable e-mails. In this paper we propose a spam detection system called “3CA&1NB”, which uses machine learning to detect spam. “3CA&1NB” combines three cellular automata and one naïve Bayes algorithm. We discuss how combining learning-based methods can improve detection performance. Our preliminary results show that it can detect spam effectively.
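The idea of combining several classifiers can be illustrated with a minimal sketch. The abstract does not state how “3CA&1NB” merges the outputs of its four members, so the majority vote below (with ties resolved as legitimate mail) is an assumption, not the authors' method. The three `KeywordRule` objects are hypothetical stand-ins for the cellular-automaton classifiers, whose details are not given here, and the training messages are invented toy data.

```python
from collections import Counter
from math import log


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the paper's preprocessing is not described here)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        def score(c):
            # Log prior plus smoothed log likelihood of each known token.
            total = sum(self.counts[c].values()) + len(self.vocab)
            return log(self.prior[c]) + sum(
                log((self.counts[c][w] + 1) / total)
                for w in tokenize(text)
                if w in self.vocab
            )
        return max(self.classes, key=score)


class KeywordRule:
    """Hypothetical stand-in for one rule-based ensemble member (not a cellular automaton)."""

    def __init__(self, keywords):
        self.keywords = set(keywords)

    def predict(self, text):
        return "spam" if self.keywords & set(tokenize(text)) else "ham"


def majority_vote(classifiers, text, tie_label="ham"):
    """Return the most common prediction; a tie falls back to tie_label (assumed policy)."""
    votes = Counter(clf.predict(text) for clf in classifiers)
    (top, n), *rest = votes.most_common()
    if rest and rest[0][1] == n:
        return tie_label
    return top


# Invented toy training data, for illustration only.
texts = [
    "win free money now",
    "free prize click now",
    "meeting agenda for monday",
    "project report attached",
]
labels = ["spam", "spam", "ham", "ham"]

ensemble = [
    NaiveBayes().fit(texts, labels),
    KeywordRule({"free", "win"}),
    KeywordRule({"prize", "click"}),
    KeywordRule({"money", "offer"}),
]

if __name__ == "__main__":
    for message in ["free money offer", "meeting agenda attached"]:
        print(message, "->", majority_vote(ensemble, message))
```

With an even number of voters a tie is possible, so the sketch needs an explicit tie policy. Defaulting to legitimate mail reflects the common preference for missing a spam message over discarding a wanted one; a weighted vote would be another reasonable choice.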



Author information

Authors and Affiliations

Computer Science Laboratory of Oran Computer Science Department, Faculty of Science, University of Oran, BP 1524, El M’Naouer, Es Senia, 31000, Oran, Algeria
Fatiha Barigou, Naouel Barigou & Baghdad Atmani


Editor information

Editors and Affiliations

Department of Software Engineering, Faculty of Engineering, Lakehead University, 955 Oliver Rd., P7B 5E1, Thunder Bay, Ontario, Canada
Rachid Benlamri


About this paper

Cite this paper

Barigou, F., Barigou, N., Atmani, B. (2012). Combining Classifiers for Spam Detection. In: Benlamri, R. (eds) Networked Digital Technologies. NDT 2012. Communications in Computer and Information Science, vol 293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30507-8_8


DOI: https://doi.org/10.1007/978-3-642-30507-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30506-1
Online ISBN: 978-3-642-30507-8
eBook Packages: Computer Science, Computer Science (R0)
