Combining Classifiers for Spam Detection

  • Conference paper
Networked Digital Technologies (NDT 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 293)

Abstract

Nowadays e-mail has become a fast and economical way to exchange information. However, unsolicited or junk e-mail, also known as spam, has quickly become a major problem on the Internet, and protecting users from it has become one of the most important research areas. Spam filtering is used to block undesirable e-mails. In this paper we propose a spam detection system called “3CA&1NB”, which uses machine learning to detect spam. “3CA&1NB” combines three cellular automata and one naïve Bayes algorithm. We discuss how combining learning-based methods can improve detection performance. Our preliminary results show that it can detect spam effectively.
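The abstract does not say how the outputs of the four base classifiers are merged. As a rough illustration only, the sketch below assumes simple majority voting with a configurable tie-break, which is one common way to combine classifiers and not necessarily the scheme used in “3CA&1NB”. The names `combine_by_vote` and `keyword_rule` are hypothetical, and the keyword rules are toy stand-ins for the base learners, not cellular automata or naïve Bayes models.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier maps the text of an e-mail to the label "spam" or "ham".
Classifier = Callable[[str], str]


def combine_by_vote(classifiers: Sequence[Classifier], message: str,
                    tie_label: str = "ham") -> str:
    """Return the majority label; use tie_label when the top two counts are equal.

    Assumption: majority voting is used here for illustration. With four
    voters a 2-2 split is possible, so a tie-break label is required.
    """
    votes = Counter(clf(message) for clf in classifiers)
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]


def keyword_rule(word: str) -> Classifier:
    """Hypothetical stand-in for a trained base classifier."""
    return lambda text: "spam" if word in text.lower() else "ham"


# Four toy voters, mirroring the "three plus one" arrangement only in count.
base_classifiers = [
    keyword_rule("free"),
    keyword_rule("winner"),
    keyword_rule("click"),
    keyword_rule("offer"),
]

if __name__ == "__main__":
    for text in ("FREE offer: click now, winner!", "Meeting at noon"):
        print(text, "->", combine_by_vote(base_classifiers, text))
```

Defaulting the tie-break to "ham" reflects the usual preference in spam filtering for letting a doubtful message through over blocking a legitimate one; that choice is also an assumption of this sketch.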

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Barigou, F., Barigou, N., Atmani, B. (2012). Combining Classifiers for Spam Detection. In: Benlamri, R. (eds) Networked Digital Technologies. NDT 2012. Communications in Computer and Information Science, vol 293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30507-8_8

  • DOI: https://doi.org/10.1007/978-3-642-30507-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30506-1

  • Online ISBN: 978-3-642-30507-8

  • eBook Packages: Computer Science, Computer Science (R0)
