A Heuristic-Based Feature Selection Method for Clustering Spam Emails

  • Conference paper
Neural Information Processing. Theory and Algorithms (ICONIP 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6443)

Abstract

In recent years, many efforts have been made to cluster spam emails in order to cope with spam-based attacks. During clustering, many statistical features (e.g., the size of emails) are used to calculate similarities between spam emails. In many cases, however, some of these features are redundant or contribute little to the clustering process. Feature selection is a typical way to identify a subset of key features from an initial set. In this paper, we propose a heuristic-based feature selection method for clustering spam emails. Existing methods form combinations of the given features and evaluate them using data mining and machine learning techniques; our method instead evaluates each feature solely according to its value distribution in spam clusters. With our method, we identified 4 significant features, which yielded a clustering accuracy of 86.33% with low time complexity.
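
The abstract does not state the scoring rule, so the following is only a minimal sketch of the general idea of scoring each feature on its own from its value distribution inside clusters. It assumes, purely for illustration, that a feature is useful when its values are tightly grouped within each cluster relative to its spread over all emails (ratio of within-cluster variance to overall variance). The function names, the feature names, and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import pvariance


def score_features(clusters, feature_names):
    """Score each feature independently; lower means tighter within clusters.

    clusters: dict mapping a cluster label to a list of feature tuples.
    Assumed heuristic (not the paper's): within-cluster variance / overall variance.
    """
    all_rows = [row for rows in clusters.values() for row in rows]
    total = len(all_rows)
    scores = {}
    for idx, name in enumerate(feature_names):
        overall = pvariance([row[idx] for row in all_rows])
        if overall == 0:
            # A constant feature cannot separate clusters; give it the worst score.
            scores[name] = 1.0
            continue
        # Size-weighted average of the per-cluster variances.
        within = sum(
            len(rows) * pvariance([row[idx] for row in rows])
            for rows in clusters.values()
        ) / total
        scores[name] = within / overall
    return scores


def select_features(scores, k):
    """Return the k feature names with the lowest scores."""
    return sorted(scores, key=scores.get)[:k]


# Toy data: (size in bytes, number of URLs, sending hour) per email.
feature_names = ["size", "num_urls", "hour"]
clusters = {
    "A": [(1000.0, 1.0, 3.0), (1010.0, 1.0, 15.0), (990.0, 1.0, 22.0)],
    "B": [(5000.0, 4.0, 2.0), (5010.0, 4.0, 14.0), (4990.0, 4.0, 23.0)],
}
scores = score_features(clusters, feature_names)
selected = select_features(scores, 2)
print(selected)  # ['num_urls', 'size']
```

Because each feature is scored once and no feature combinations are enumerated, the cost of this sketch grows linearly with the number of features, which is in the spirit of the low time complexity claimed in the abstract.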

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Song, J., Eto, M., Kim, H.C., Inoue, D., Nakao, K. (2010). A Heuristic-Based Feature Selection Method for Clustering Spam Emails. In: Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (eds) Neural Information Processing. Theory and Algorithms. ICONIP 2010. Lecture Notes in Computer Science, vol 6443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17537-4_36

  • DOI: https://doi.org/10.1007/978-3-642-17537-4_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17536-7

  • Online ISBN: 978-3-642-17537-4

  • eBook Packages: Computer Science, Computer Science (R0)
