Email Classification Techniques—A Review

  • Namrata Shroff
  • Amisha Sinhgala
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 52)


Email has become a significant medium of correspondence. Official, personal, social, promotional and other messages arrive in our mailboxes every day. Research has found that the average office worker receives 121 emails per day. Because inboxes are flooded with messages, some emails go unattended. If messages were classified into high-priority folders, the problem of unattended or unanswered mail could be addressed. In this paper, we identify the key features used in email classification: temporal, behavioral, single-email multinomial-valued, content, and local and global features. We also study the datasets, techniques and tools used in various email classification tasks, such as spam, phishing, multi-folder and machine-generated email classification. Different email classifiers provide different classification mechanisms, and the challenges in email classification are discussed. The study finds that the J48 classification algorithm works best for spam and ham email classification. Among the email service providers compared, Microsoft Outlook filters mail based on many criteria.
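The abstract reports J48 as the best performer for spam and ham classification. J48 is Weka's Java implementation of Quinlan's C4.5 decision-tree learner. The following is a simplified, hypothetical Python sketch of the idea, not the authors' experimental setup. It chooses each split attribute by gain ratio over categorical features and omits C4.5's pruning, continuous-attribute handling and the rule that only attributes with at least average information gain are considered. The feature names (`has_link`, `known_sender`, `offer_words`) and the six training rows are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on `attr`, divided by the split information."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = -sum((len(g) / total) * math.log2(len(g) / total) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, attrs):
    """Grow an unpruned tree; internal nodes are dicts, leaves are class labels."""
    if len(set(labels)) == 1 or not attrs:
        return majority(labels)
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority(labels)
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],
        )
    return {"attr": best, "branches": branches, "default": majority(labels)}


def classify(tree, row):
    """Walk from the root to a leaf; unseen attribute values fall back to the node majority."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


# Hypothetical toy training data: three categorical features per email.
ROWS = [
    {"has_link": "yes", "known_sender": "no", "offer_words": "yes"},
    {"has_link": "yes", "known_sender": "no", "offer_words": "no"},
    {"has_link": "no", "known_sender": "yes", "offer_words": "no"},
    {"has_link": "yes", "known_sender": "yes", "offer_words": "no"},
    {"has_link": "no", "known_sender": "no", "offer_words": "yes"},
    {"has_link": "no", "known_sender": "yes", "offer_words": "yes"},
]
LABELS = ["spam", "spam", "ham", "ham", "spam", "ham"]
TREE = build_tree(ROWS, LABELS, list(ROWS[0]))

if __name__ == "__main__":
    print(TREE)
    print(classify(TREE, {"has_link": "no", "known_sender": "yes", "offer_words": "yes"}))  # → ham
```

On this toy data `known_sender` separates the two classes perfectly (gain ratio 1.0), so it becomes the root split. Gain ratio is used instead of raw information gain because it penalizes attributes that fragment the data into many small branches.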


Keywords: Email classification · Feature in email classification · Spam and phishing


Copyright information

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021

Authors and Affiliations

  1. Gujarat Technological University, Chandkheda, India
  2. S.V.I.T Vasad, Vasad, India
