Skip to main content

Neural Recognition and Genetic Features Selection for Robust Detection of E-Mail Spam

  • Conference paper
Advances in Artificial Intelligence (SETN 2006)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3955))

Included in the following conference series:

Abstract

In this paper a method for feature selection and classification of email spam messages is presented. The selection of features is performed in two steps: The selection is performed by measuring their entropy and a fine-tuning selection is implemented using a genetic algorithm. In the classification process, a Radial Basis Function Network is used to ensure robust classification rate even in case of complex cluster structure. The proposed method shows that, when using a two-level feature selection, a better accuracy is achieved than using one-stage selection. Also, the use of a lemmatizer or a stop-word list gives minimal classification improvement. The proposed method achieves 96-97% average accuracy when using only 20 features out of 15000.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Sakkis, G., Androutsopoulos, I., Paliouras, G., Karkaletsis, V., Spyropoulos, C.D., Stamatopoulos, P.: A memory-based approach to anti-spam filtering for mailing lists. Information Retrieval 6, 49–73 (2003)

    Article  Google Scholar 

  2. Androutsopoulos, I., Koutsias, J., Chandrinos, K.V., Paliouras, G., Spyropoulos, C.D.: An Evaluation of Naive Bayesian Anti-Spam Filtering. In: Proc. of the workshop on Machine Learning in the New Information Age (2000)

    Google Scholar 

  3. Lee, D.L., Chuang, H., Seamons, K.: Document Ranking and the Vector-Space Model. IEEE Software 14, 67–75 (1997)

    Article  Google Scholar 

  4. Steinbach, M., Karypis, G., Kumar, V.: A Comparison of Document Clustering Techniques. In: KDD Workshop on Text Mining (2000)

    Google Scholar 

  5. Michelakis, E., Androutsopoulos, I., Paliouras, G., Sakkis, G., Stamatopoulos, P.: Filtron: A Learning-Based Anti-Spam Filter. In: Proc. of the 1st Conference on Email and Anti-Spam (2004)

    Google Scholar 

  6. Gavrilis, D., Tsoulos, I., Dermatas, E.: Stochastic Classification of Scientific Abstracts. In: Proceedings of the 6th Speech and Computer Conference, Patra (2005)

    Google Scholar 

  7. Pierre, J.M.: On the Automated Classification of Web Sites. Linkoping Electronic Articles in Computer and Information Science 6 (2001)

    Google Scholar 

  8. Deerwester, S., Dumais, S., Furnas, G., Landauer, T., Harshman, R.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 46, 391–407 (1990)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gavrilis, D., Tsoulos, I.G., Dermatas, E. (2006). Neural Recognition and Genetic Features Selection for Robust Detection of E-Mail Spam. In: Antoniou, G., Potamias, G., Spyropoulos, C., Plexousakis, D. (eds) Advances in Artificial Intelligence. SETN 2006. Lecture Notes in Computer Science(), vol 3955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752912_54

Download citation

  • DOI: https://doi.org/10.1007/11752912_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34117-8

  • Online ISBN: 978-3-540-34118-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics