MLSPD - Machine Learning Based Spam and Phishing Detection

  • Conference paper
Computational Data and Social Networks (CSoNet 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11280)

Abstract

Spam emails have been a global menace since the rise of the Internet era. By one estimate, around 50% of all emails are spam. Spam emails that are part of a phishing scam can be sent to the masses to steal information, commit identity theft, and carry out other malicious actions. Previous studies showed that 91% of cyber attacks start with phishing emails, which contain Uniform Resource Locators (URLs). Although these URLs have several characteristics that distinguish them from ordinary website links, the human eye cannot easily notice them. Previous research also showed that traditional systems, such as IP blacklisting/whitelisting and spam filters, could not efficiently detect phishing and spam emails. Machine Learning (ML) approaches, however, have shown promising results in combating spamming and phishing attacks. To identify these threats, we used several ML algorithms to train spam and phishing detectors. The proposed framework is based on several linguistic and URL-based features. Our proposed model detects spam and phishing emails with accuracies of 89.2% and 97.7%, respectively.
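As a rough illustration of what "URL-based features" can look like in practice, the following sketch computes a handful of lexical properties from a URL using only the Python standard library. The abstract does not list the features the authors actually used, so the function name `url_features` and every feature below are assumptions chosen for illustration; they are not the paper's feature set.

```python
import re
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Compute a few lexical URL features (hypothetical, for illustration only).

    These are assumptions and are not taken from the paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is None when the URL has no netloc
    return {
        # Total number of characters in the URL.
        "url_length": len(url),
        # True when the host is a dotted-quad IPv4 address instead of a name.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # True when the URL contains an '@' character anywhere.
        "has_at_symbol": "@" in url,
        # Dots in the host beyond the one separating domain and TLD.
        "num_subdomains": max(host.count(".") - 1, 0),
        # True when the scheme is https.
        "uses_https": parsed.scheme == "https",
        # True when the host name contains a hyphen.
        "has_hyphen_in_host": "-" in host,
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/", "http://192.168.0.1/login"):
        print(example, url_features(example))
```

A feature dictionary of this kind could be turned into a numeric vector and passed to any standard classifier; which classifier and which features perform best is an empirical question that this sketch does not address.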

Author information

Corresponding author

Correspondence to Sanjay Kumar.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Kumar, S., Faizan, A., Viinikainen, A., Hamalainen, T. (2018). MLSPD - Machine Learning Based Spam and Phishing Detection. In: Chen, X., Sen, A., Li, W., Thai, M. (eds) Computational Data and Social Networks. CSoNet 2018. Lecture Notes in Computer Science, vol 11280. Springer, Cham. https://doi.org/10.1007/978-3-030-04648-4_43

  • DOI: https://doi.org/10.1007/978-3-030-04648-4_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04647-7

  • Online ISBN: 978-3-030-04648-4

  • eBook Packages: Computer Science, Computer Science (R0)
