An Integrated Approach to Filtering Phishing E-mails

  • M. Dolores del Castillo
  • Ángel Iglesias
  • J. Ignacio Serrano
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4739)


This paper presents a system that classifies e-mails into two categories, legitimate and fraudulent. The classifier applies three filters in series: a Bayesian filter that classifies the textual content of e-mails, a rule-based filter that classifies their non-grammatical content and, finally, a filter based on an emulator of fictitious accesses, which classifies the responses from websites referenced by links contained in the e-mails. The approach is hybrid, because it uses different classification methods, and integrated, because it takes into account all kinds of data and information contained in e-mails.
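The abstract describes the three filters only at a high level. The minimal Python sketch below shows one way such a serial cascade could be wired together. Everything in it is an assumption, not the authors' implementation: a multinomial naive Bayes model stands in for the Bayesian filter; the two link rules (a link to a bare IP address, visible link text showing a URL that differs from the real target) are hypothetical examples of rules over non-grammatical content; the third filter delegates to an injected `fake_login_accepted` probe instead of a real access emulator; and the combination policy (an e-mail is fraudulent as soon as any filter flags it) is a guess at what "serial application" means. The names `Email`, `NaiveBayesTextFilter`, `rule_filter`, `make_access_filter`, `make_classifier`, `probe_stub` and the `.example` host names are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urlparse

LEGIT, FRAUD = "legitimate", "fraudulent"

@dataclass
class Email:
    text: str  # textual (grammatical) content of the message
    html: str  # raw HTML body, where the links live

Filter = Callable[[Email], str]  # every filter maps an e-mail to a label

class NaiveBayesTextFilter:
    """Filter 1 (assumed stand-in): multinomial naive Bayes over word tokens."""

    def __init__(self) -> None:
        self.word_counts = {LEGIT: Counter(), FRAUD: Counter()}
        self.doc_counts: Counter = Counter()

    @staticmethod
    def tokenize(text: str) -> List[str]:
        return re.findall(r"[a-z']+", text.lower())

    def fit(self, examples: Iterable[Tuple[str, str]]) -> None:
        for text, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(self.tokenize(text))

    def __call__(self, email: Email) -> str:
        vocab = set(self.word_counts[LEGIT]) | set(self.word_counts[FRAUD])
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (LEGIT, FRAUD):
            n_words = sum(self.word_counts[label].values())
            # Smoothed log prior plus Laplace-smoothed log likelihood per token.
            score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
            for tok in self.tokenize(email.text):
                score += math.log((self.word_counts[label][tok] + 1) / (n_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

ANCHOR = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)
IPV4_HOST = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")

def rule_filter(email: Email) -> str:
    """Filter 2 (hypothetical rules) over non-grammatical content: the links."""
    for href, shown in ANCHOR.findall(email.html):
        host = urlparse(href).hostname or ""
        if IPV4_HOST.match(host):  # link targets a bare IP address
            return FRAUD
        shown = shown.strip()
        if shown.startswith(("http://", "https://")) and urlparse(shown).hostname != host:
            return FRAUD  # visible URL differs from the real link target
    return LEGIT

def make_access_filter(fake_login_accepted: Callable[[str], bool]) -> Filter:
    """Filter 3 (assumed heuristic): probe each linked site with made-up
    credentials through an injected function; acceptance means fraud."""
    def access_filter(email: Email) -> str:
        for href, _ in ANCHOR.findall(email.html):
            if fake_login_accepted(href):
                return FRAUD
        return LEGIT
    return access_filter

def make_classifier(filters: List[Filter]) -> Filter:
    """Apply the filters in series; the first FRAUD verdict is final."""
    def classify(email: Email) -> str:
        for f in filters:
            if f(email) == FRAUD:
                return FRAUD
        return LEGIT
    return classify

# Usage with toy training data and a stub probe (hypothetical host names).
nb = NaiveBayesTextFilter()
nb.fit([
    ("please verify your account password immediately", FRAUD),
    ("your account has been suspended click to confirm", FRAUD),
    ("meeting notes attached for tomorrow", LEGIT),
    ("lunch on friday with the team", LEGIT),
])

def probe_stub(url: str) -> bool:
    # Stand-in for a real emulator: only this invented host "accepts" fake logins.
    return urlparse(url).hostname == "secure-login.example"

classify = make_classifier([nb, rule_filter, make_access_filter(probe_stub)])
print(classify(Email("see the meeting notes", '<a href="https://intranet.example/notes">notes</a>')))
# prints: legitimate
```

Injecting the probe keeps the sketch testable offline. A real emulator would have to follow each link, submit fictitious data and interpret the site's response, details the abstract does not give.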


Keywords

e-mail attacks, textual and non-textual content, machine learning methods





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • M. Dolores del Castillo (1)
  • Ángel Iglesias (1)
  • J. Ignacio Serrano (1)
  1. Instituto de Automática Industrial, CSIC, Ctra. Campo Real, km. 0,200, 28500 Arganda del Rey, Madrid, Spain
