An Add-On to Rule-Based Sifters for Multi-recipient Spam Emails
The spam filtering technique described here targets spam messages sent to multiple recipients whose email addresses are similar. We exploit these similarity patterns to build a rule-based classification system that achieves 92% accuracy. The technique uses the ‘TO’ and ‘CC’ fields to classify an email as spam or legitimate. We introduce new rules intended to enhance the performance of current filtering techniques. We also introduce a novel metric for calculating the degree of similarity among a set of strings.
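The abstract does not define the similarity metric or the rules themselves, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the mean pairwise `difflib.SequenceMatcher` ratio over the local parts of the TO and CC addresses as a stand-in similarity measure. The function names, the `threshold` of 0.7, and the `min_recipients` of 3 are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from email import message_from_string
from email.utils import getaddresses
from itertools import combinations


def recipient_local_parts(raw_email: str) -> list[str]:
    """Collect the local parts (text before '@') of all TO and CC addresses."""
    msg = message_from_string(raw_email)
    pairs = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return [addr.split("@", 1)[0].lower() for _, addr in pairs if "@" in addr]


def mean_pairwise_similarity(strings: list[str]) -> float:
    """Stand-in metric (an assumption, not the paper's): average
    SequenceMatcher ratio over all pairs; 0.0 for fewer than two strings."""
    pairs = list(combinations(strings, 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def looks_like_multi_recipient_spam(
    raw_email: str, threshold: float = 0.7, min_recipients: int = 3
) -> bool:
    """Hypothetical rule: flag a message when it has several recipients
    whose address local parts are, on average, highly similar."""
    parts = recipient_local_parts(raw_email)
    return len(parts) >= min_recipients and mean_pairwise_similarity(parts) >= threshold
```

A rule of this kind would run alongside an existing rule set rather than replace it, and the threshold would need tuning on labeled mail before any accuracy claim could be made for it.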