An Immunological Filter for Spam

  • George B. Bezerra
  • Tiago V. Barra
  • Hamilton M. Ferreira
  • Helder Knidel
  • Leandro Nunes de Castro
  • Fernando J. Von Zuben
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4163)

Abstract

Spam messages continually fill the mailboxes of practically every Web user. Dealing with this growing problem requires high-performance filters that block these unsolicited messages. We propose an antibody network, specifically SRABNET (Supervised Real-Valued Antibody Network), as an alternative spam filter. The antibody network model is generated automatically from the training dataset and evaluated on unseen messages. We validate the approach on a public corpus, PU1, a large collection of encrypted personal e-mail messages comprising both legitimate messages and spam. Finally, we compare its performance with that of the well-known naïve Bayes filter using several performance indexes presented in the paper.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • George B. Bezerra (1)
  • Tiago V. Barra (1)
  • Hamilton M. Ferreira (1)
  • Helder Knidel (1)
  • Leandro Nunes de Castro (2)
  • Fernando J. Von Zuben (1)

  1. Laboratory of Bioinformatics and Bio-Inspired Computing (LBIC), Department of Computer Engineering and Industrial Automation, University of Campinas (Unicamp), Campinas, Brazil
  2. Catholic University of Santos (UniSantos), Santos, Brazil