Feature Selection Techniques for Email Spam Classification: A Survey

  • Conference paper
Proceedings of International Conference on Artificial Intelligence, Smart Grid and Smart City Applications (AISGSC 2019)

Abstract

In today's digital world, most communication takes place over the Internet. Email is widely used to exchange information, both for personal communication and in business, because it is effective, fast, and inexpensive. Spam email is a serious problem on the Internet: when users click on a spam message, it can spread viruses on the user's system, consume a large amount of network bandwidth and email storage space, and steal the user's confidential data. Feature selection chooses the best features from a dataset, removing irrelevant, redundant, and noisy data. This paper surveys email spam detection work that incorporates feature selection approaches such as Information Gain, Correlation-Based Feature Selection, Genetic Algorithm, Ant Colony Optimization, Artificial Bee Colony, Particle Swarm Optimization, Cuckoo Search Algorithm, and Harmony Search Algorithm, among others. Performing classification after feature selection can improve the performance of spam filtering.
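
As a rough illustration of the selection step the abstract describes, the following minimal Python sketch ranks binary word-presence features by Information Gain and keeps the top k. The toy messages, the feature names, and the helper functions (`entropy`, `information_gain`, `select_top_k`) are assumptions made for this example and are not taken from the paper or from any of the surveyed systems.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on one discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def select_top_k(rows, labels, feature_names, k):
    """Rank features by information gain and return the names of the k best."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(feature_names)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 means the word occurs in the message.
FEATURES = ["free", "meeting", "the"]
ROWS = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
]
LABELS = ["spam", "spam", "ham", "ham", "spam", "ham"]

selected = select_top_k(ROWS, LABELS, FEATURES, k=2)
print(selected)
```

In this toy data the word "the" appears in every message, so splitting on it does not change the label entropy and it is ranked last. A classifier would then be trained only on the selected columns. The metaheuristic approaches listed in the abstract search over feature subsets rather than scoring each feature independently, which this sketch does not attempt.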

Abbreviations

ABC: Artificial Bee Colony Optimization
BoW: Bag-of-Words
CGA: Compact Genetic Algorithm
EA: Evolutionary Algorithm
F-GSO: Firefly-Group Search Optimizer
HKSVM: Hybrid Kernel based Support Vector Machine
HSA: Harmony Search Algorithm
KNN: K-Nearest Neighbors
LR: Logistic Regression
MLP: Multi-Layer Perceptron
MLP-NN: Multi-Layer Perceptron Neural Network
NB: Naïve Bayes
PCA: Principal Component Analysis
PNN: Probabilistic Neural Network
PoS: Part-of-Speech
PSO: Particle Swarm Optimization
PU: Positive Unlabeled
RF: Random Forests
SCS: Stepsize Cuckoo Search
SMO: Sequential Minimal Optimization
SVM: Support Vector Machine
TF-IDF: Term Frequency – Inverse Document Frequency
TREC: Text Retrieval Conference
UCI: UC Irvine

Acknowledgments

Our sincere thanks to the University Grants Commission (UGC), Hyderabad, for granting the funds to carry out this work.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Vinitha, V.S., Renuka, D.K. (2020). Feature Selection Techniques for Email Spam Classification: A Survey. In: Kumar, L., Jayashree, L., Manimegalai, R. (eds) Proceedings of International Conference on Artificial Intelligence, Smart Grid and Smart City Applications. AISGSC 2019 2019. Springer, Cham. https://doi.org/10.1007/978-3-030-24051-6_86

  • DOI: https://doi.org/10.1007/978-3-030-24051-6_86

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24050-9

  • Online ISBN: 978-3-030-24051-6

  • eBook Packages: Engineering, Engineering (R0)
