Filtering Spam Email with Flexible Preprocessors

  • Wanli Ma
  • Dat Tran
  • Dharmendra Sharma
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 4)

Spam email is the common name for “unsolicited bulk email.” It is one of the cyber-nuisances we have to put up with every day. Spam email does not just waste resources; it also poses a serious security threat. There are two types of spam email: unsolicited commercial email and email used as a delivery agent for malware (malicious software). The former uses email for commercial advertising, including illegal commercial activities, and dealing with it costs staff time and IT resources. The latter has a more sinister intention. Any type of malware, be it a virus, a worm, or spyware, has to find a way to infect host computers, and unsolicited bulk email is an easy and effective way to deliver it. In the last couple of years, several high-profile and successful virus/worm attacks were delivered via unsolicited bulk email, for example LoveBug and Slammer, among others.


Keywords (added by machine, not by the authors): Optical Character Recognition; Unknown Word; Spam Detection; Spam Email; Detection Engine





Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Wanli Ma (1)
  • Dat Tran (1)
  • Dharmendra Sharma (1)
  1. School of Information Sciences & Engineering, University of Canberra, Australia
