When you browse your email, you can usually tell right away whether a message is spam. Still, you probably do not enjoy spending your time identifying spam and have come to rely on a filter to do that task for you, either deleting the spam automatically or filing it in a different mailbox. An email filter is based on a set of rules applied to each incoming message, tagging it as spam or “ham” (not spam). Such a filter is an example of a supervised classification algorithm. It is formulated by studying a training sample of email messages that have been manually classified as spam or ham. Information in the header and text of each message is converted into a set of numerical variables such as the size of the email, the domain of the sender, or the presence of the word “free.” These variables are used to define rules that determine whether an incoming message is spam or ham.
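The process described above can be sketched in a few lines of Python. This is a minimal illustration only, not the method developed in the chapter: the messages, the feature names (`size`, `has_free`, `edu_sender`), and the helper functions are all hypothetical, and the "rule" is a single-variable threshold (a decision stump) chosen to minimize errors on a tiny hand-labeled training sample.

```python
import re

# Hypothetical training sample: messages manually classified as spam or ham.
TRAINING = [
    ({"sender": "promo@deals.com", "body": "Get your FREE prize now"}, "spam"),
    ({"sender": "win@lotto.biz", "body": "Free money, click here"}, "spam"),
    ({"sender": "prof@stat.edu", "body": "Meeting moved to 3pm, see agenda attached"}, "ham"),
    ({"sender": "friend@mail.com", "body": "Lunch tomorrow?"}, "ham"),
]


def extract_features(message):
    """Convert header and text information into numerical variables."""
    body = message["body"]
    return {
        "size": len(body),  # size of the email in characters
        "has_free": int(bool(re.search(r"\bfree\b", body, re.IGNORECASE))),
        "edu_sender": int(message["sender"].endswith(".edu")),  # sender domain as 0/1
    }


def fit_stump(rows, labels):
    """Pick the (variable, threshold, direction) with the fewest training errors."""
    best = None
    for name in rows[0]:
        for threshold in sorted({r[name] for r in rows}):
            for spam_if_above in (True, False):
                preds = [(r[name] >= threshold) == spam_if_above for r in rows]
                correct = sum(p == (y == "spam") for p, y in zip(preds, labels))
                if best is None or correct > best[0]:
                    best = (correct, name, threshold, spam_if_above)
    return best[1:]


def classify(rule, message):
    """Apply the learned rule to an incoming message."""
    name, threshold, spam_if_above = rule
    value = extract_features(message)[name]
    return "spam" if (value >= threshold) == spam_if_above else "ham"


rows = [extract_features(m) for m, _ in TRAINING]
labels = [y for _, y in TRAINING]
rule = fit_stump(rows, labels)

print(rule)
print(classify(rule, {"sender": "x@offers.com", "body": "Claim your free gift"}))
print(classify(rule, {"sender": "chair@dept.edu", "body": "See you at the seminar"}))
```

A practical filter would use many more variables and a far larger training sample, and would combine variables rather than rely on one threshold; the sketch only shows the supervised pattern of labeled examples, numerical variables, and a learned rule.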
Keywords: Support Vector Machine, Random Forest, Linear Discriminant Analysis, Flea Beetle, Projection Pursuit