Towards the Detection of Malicious URL and Domain Names Using Machine Learning

Ghalati, Nastaran Farhadi; Ghalaty, Nahid Farhady; Barata, José

doi:10.1007/978-3-030-45124-0_10

Nastaran Farhadi Ghalati, Nahid Farhady Ghalaty & José Barata

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 577)

Included in the following conference series:

Doctoral Conference on Computing, Electrical and Industrial Systems

Abstract

Malicious Uniform Resource Locators (URLs) are an important problem in web search and mining. Malicious URLs host unsolicited content (spam, phishing, drive-by downloads, etc.) and try to lure uninformed users into clicking on such links or downloading malware, which can result in critical data exfiltration. Traditional techniques for detecting such URLs rely on blacklists and rule-based methods. The main disadvantage of these techniques is that they are not resistant to 0-day attacks, meaning that there will be at least one victim for each URL before it is added to a blacklist. Other techniques test URLs in a sandbox before they are opened in the production or main environment. Such methods have two main drawbacks: the cost of sandboxing and the non-real-time response caused by the approval process in the test environment. In this paper, we propose a method that exploits semantic features in both domains and URLs. The method is adaptive, meaning that the model can change dynamically based on new feedback received on 0-day attacks. We extract features from all sections of a URL separately. We then apply three machine learning methods to three different sets of data. We provide an analysis of the most efficient value of N for applying N-grams to the domain names. The results show that Random Forest has the highest accuracy, over 96%, and at the same time provides more interpretability as well as performance benefits.
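As a rough illustration of the two feature-extraction ideas named in the abstract (handling each section of a URL separately, and applying character N-grams to the domain name), a minimal Python sketch might look like the following. The function names, the choice of sections, and the default N = 3 are assumptions made for this example only; the abstract does not specify the paper's actual feature set or the value of N it found most efficient.

```python
from collections import Counter
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Split a URL into sections so each can be featurized separately.

    The four sections chosen here are an assumption for illustration.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "domain": parts.hostname or "",  # hostname is lowercased by urlsplit
        "path": parts.path,
        "query": parts.query,
    }


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def domain_ngram_features(url: str, n: int = 3) -> Counter:
    """Character n-gram counts of the domain part of a URL (n=3 is arbitrary)."""
    return char_ngrams(split_url(url)["domain"], n)
```

Counts like these could then be vectorized and passed to a classifier such as a Random Forest; that step is omitted here because the abstract gives no details of the training setup.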

Acknowledgments

This work was supported in part by the FCT/MCTES (UNINOVA-CTS funding UID/EEA/00066/2019), UIDB/00066/2020 (CTS – Center of Technology and Systems), and the FCT/MCTES project CESME (Collaborative and Evolvable Smart Manufacturing Ecosystem), funding PRDC/EEI-AUT/32410/2017.

Author information

Authors and Affiliations

Universidade Nova de Lisboa (UNL), CTS-UNINOVA, Lisbon, Portugal
Nastaran Farhadi Ghalati & José Barata
George Mason University, Fairfax, VA, USA
Nahid Farhady Ghalaty

Corresponding author

Correspondence to Nastaran Farhadi Ghalati.

Editor information

Editors and Affiliations

NOVA University of Lisbon, Monte Caparica, Portugal
Luis M. Camarinha-Matos
NOVA University of Lisbon, Monte Caparica, Portugal
Nastaran Farhadi
NOVA University of Lisbon, Monte Caparica, Portugal
Fábio Lopes
NOVA University of Lisbon, Monte Caparica, Portugal
Helena Pereira

About this paper

Cite this paper

Ghalati, N.F., Ghalaty, N.F., Barata, J. (2020). Towards the Detection of Malicious URL and Domain Names Using Machine Learning. In: Camarinha-Matos, L., Farhadi, N., Lopes, F., Pereira, H. (eds) Technological Innovation for Life Improvement. DoCEIS 2020. IFIP Advances in Information and Communication Technology, vol 577. Springer, Cham. https://doi.org/10.1007/978-3-030-45124-0_10

DOI: https://doi.org/10.1007/978-3-030-45124-0_10
Published: 29 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45123-3
Online ISBN: 978-3-030-45124-0
eBook Packages: Computer Science, Computer Science (R0)
