Neural Computing and Applications, Volume 31, Issue 3, pp 691–699

Improved email spam detection model based on support vector machines

  • Sunday Olusanya Olatunji
Original Article

Abstract

Email has become extremely popular. It has been reported to be the cheapest, most popular and fastest means of communication in recent times. Despite its benefits, email is plagued by unsolicited and sometimes fraudulent messages, which must be promptly detected and isolated by what is commonly called a spam detection system. Spam detection is needed to protect email users and to prevent the various abuses to which email has recently been subjected. Because unsolicited emails adapt through the use of mailing tools, spam detection tools have often been of limited effectiveness and are sometimes rendered ineffective, hence the need for tools with better detection accuracy. Several spam detection models have been proposed and tested in the literature, but the reported accuracies indicate that more work is needed in this direction. In this work, a support vector machine-based model is proposed for spam detection, with attention paid to searching appropriately for the optimal parameters to achieve better performance. Experimental results show that the proposed model outperformed all earlier proposed models on the same popular dataset employed in this work. Accuracies of 95.87% and 94.06% were obtained on the training and testing sets, respectively. The 94.06% testing accuracy is 2.84 percentage points above the 91.22% of the best previously reported model on the same dataset, a relative improvement of 3.11%.
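
The improvement quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 3.11% figure is a relative gain over the 91.22% baseline, and the function names are hypothetical rather than taken from the paper.

```python
def absolute_gain(baseline_pct, new_pct):
    """Difference between two accuracies, in percentage points."""
    return new_pct - baseline_pct


def relative_gain(baseline_pct, new_pct):
    """Gain expressed as a percentage of the baseline accuracy."""
    return 100.0 * (new_pct - baseline_pct) / baseline_pct


# Testing accuracies reported in the abstract.
BASELINE_ACC = 91.22  # best previously reported model
PROPOSED_ACC = 94.06  # proposed SVM-based model

abs_gain = round(absolute_gain(BASELINE_ACC, PROPOSED_ACC), 2)
rel_gain = round(relative_gain(BASELINE_ACC, PROPOSED_ACC), 2)
print(f"absolute: {abs_gain} points, relative: {rel_gain}%")
```

Under that reading, the two reported testing accuracies differ by 2.84 percentage points, which corresponds to a 3.11% gain relative to the baseline.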

Keywords

Support vector machines · Email · Spam · Non-spam · Spam detector · Computational intelligence

Notes

Acknowledgement

The author would like to acknowledge the University of Dammam, Dammam, Kingdom of Saudi Arabia for some of the facilities utilized during the course of this research.

Compliance with ethical standards

Conflict of interest

The author declares that he has no conflict of interest.

Copyright information

© The Natural Computing Applications Forum 2017

Authors and Affiliations

  1. Computer Science Department, College of Computer Sciences and Information Technology, University of Dammam, Dammam, Kingdom of Saudi Arabia
