MLSPD - Machine Learning Based Spam and Phishing Detection

Kumar, Sanjay; Faizan, Azfar; Viinikainen, Ari; Hamalainen, Timo

doi:10.1007/978-3-030-04648-4_43

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11280)

Included in the following conference series:

International Conference on Computational Social Networks

Abstract

Spam email has been a global menace since the rise of the Internet era; by one estimate, around 50% of all email is spam. Spam sent as part of a phishing scam can reach large numbers of recipients and is used to steal information, commit identity theft, and carry out other malicious actions. Previous studies have shown that 91% of cyber attacks start with phishing emails, which contain Uniform Resource Locators (URLs). Although these URLs have several characteristics that distinguish them from ordinary website links, the human eye cannot easily spot them. Previous research has also shown that traditional defenses, such as IP blacklisting/whitelisting and spam filters, cannot efficiently detect phishing and spam emails. Machine Learning (ML) approaches, however, have shown promising results in combating spam and phishing attacks. To identify these threats, we used several ML algorithms to train spam and phishing detectors. The proposed framework is based on several linguistic and URL-based features. Our proposed model detects spam and phishing emails with accuracies of 89.2% and 97.7%, respectively.
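
To make the idea of URL-based features concrete, the following is a minimal sketch of how a few lexical properties of a URL might be computed before being passed to a classifier. The abstract does not list the features actually used, so the function name `url_features` and every feature below are illustrative assumptions, not the authors' feature set.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Compute a few illustrative lexical features of a URL.

    The feature names are hypothetical; they are commonly discussed
    phishing indicators, not the features used in the paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a common warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "length": len(url),
        # "@" makes everything before it user info, hiding the real host.
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        # Labels beyond "name.tld"; a rough count that ignores multi-part TLDs.
        "num_subdomains": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "has_hyphen_in_host": "-" in host,
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/", "http://192.168.0.1/login"):
        print(example, url_features(example))
```

Each dictionary could then be converted to a numeric vector and used to train any standard classifier; which algorithm performs best on such features is an empirical question that this sketch does not address.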

Author information

Authors and Affiliations

Faculty of Information Technology, University of Jyvaskyla, Jyvaskyla, Finland
Sanjay Kumar, Azfar Faizan, Ari Viinikainen & Timo Hamalainen

Corresponding author

Correspondence to Sanjay Kumar.

Editor information

Editors and Affiliations

Texas Southern University, Houston, TX, USA
Xuemin Chen
Ira A. Fulton School of Engineering, Tempe, AZ, USA
Arunabha Sen
Texas Southern University, Houston, TX, USA
Wei Wayne Li
University of Florida, Gainesville, FL, USA
My T. Thai

Cite this paper

Kumar, S., Faizan, A., Viinikainen, A., Hamalainen, T. (2018). MLSPD - Machine Learning Based Spam and Phishing Detection. In: Chen, X., Sen, A., Li, W., Thai, M. (eds) Computational Data and Social Networks. CSoNet 2018. Lecture Notes in Computer Science, vol 11280. Springer, Cham. https://doi.org/10.1007/978-3-030-04648-4_43

DOI: https://doi.org/10.1007/978-3-030-04648-4_43
Published: 18 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04647-7
Online ISBN: 978-3-030-04648-4
eBook Packages: Computer Science; Computer Science (R0)