Abstract
Spam emails have been a global menace since the rise of the Internet era. By one estimate, around 50% of all emails are spam. Spam emails sent as part of a phishing scam can reach large numbers of recipients with the aim of stealing information, committing identity theft, and performing other malicious actions. Previous studies showed that 91% of cyber attacks start with a phishing email, and such emails contain Uniform Resource Locators (URLs). Although these URLs have several characteristics that distinguish them from ordinary website links, the human eye cannot easily notice them. Previous research also showed that traditional systems such as blacklisting/whitelisting of IPs and spam filters could not efficiently detect phishing and spam emails. Machine Learning (ML) approaches, however, have shown promising results in combating spamming and phishing attacks. To identify these threats, we used several ML algorithms to train spam and phishing detectors. The proposed framework is based on several linguistic and URL-based features. Our proposed model detects spam and phishing emails with an accuracy of 89.2% and 97.7%, respectively.
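To make the notion of URL-based features concrete, the following is a minimal sketch of how a few lexical properties of a URL could be turned into classifier inputs. The abstract does not list the features used in the paper, so the function name `url_features` and every feature below are illustrative assumptions, not the authors' feature set.

```python
import re
from urllib.parse import urlparse

# Matches a host written as a dotted IPv4 address, e.g. "192.168.0.1".
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_features(url: str) -> dict:
    """Return a few lexical URL features (hypothetical, for illustration only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Total number of characters in the URL.
        "url_length": len(url),
        # True when the host is a raw IPv4 address instead of a domain name.
        "has_ip_host": bool(IPV4_HOST.match(host)),
        # True when an "@" appears anywhere in the URL.
        "has_at_symbol": "@" in url,
        # Number of dots in the host, a rough proxy for subdomain depth.
        "host_dots": host.count("."),
        # True when the host contains a hyphen.
        "has_hyphen_in_host": "-" in host,
        # True when the scheme is HTTPS.
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/", "http://192.168.0.1/login@secure"):
        print(example, url_features(example))
```

A dictionary like this could be converted to a numeric vector and passed to any standard classifier; which features and which algorithms work best is an empirical question that this sketch does not address.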
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Kumar, S., Faizan, A., Viinikainen, A., Hamalainen, T. (2018). MLSPD - Machine Learning Based Spam and Phishing Detection. In: Chen, X., Sen, A., Li, W., Thai, M. (eds) Computational Data and Social Networks. CSoNet 2018. Lecture Notes in Computer Science, vol 11280. Springer, Cham. https://doi.org/10.1007/978-3-030-04648-4_43
DOI: https://doi.org/10.1007/978-3-030-04648-4_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04647-7
Online ISBN: 978-3-030-04648-4
eBook Packages: Computer Science, Computer Science (R0)