Machine Learning Based Phishing Web Sites Detection

Nguyen, Huu Hieu; Nguyen, Duc Thai

doi:10.1007/978-3-319-27247-4_11

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 371)

Abstract

Phishing is a major problem in which fraudulent web sites and emails try to trick users into revealing important information such as financial data, emails, and other private information. Phishing activity is on the rise, and many unsuspecting users have fallen victim to these web sites and fraudulent emails. This paper analyzes the design and evaluation of the features used to detect phishing and reduce such fraudulent activity. The selected features depend not only on the characteristics of the URL (Uniform Resource Locator) but also on the web site content. The TF-IDF algorithm is used to compute the top keywords of the web site content, from which one of the important features is extracted. The technique was evaluated on a dataset of 4,420 legitimate URLs and 5,389 phishing URLs. Using these features with 5 classification algorithms, the resulting classifiers obtain 98.8% accuracy in detecting phishing web site URLs.
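The abstract names TF-IDF as the way top keywords are computed from page content but does not give the details. The following is a minimal sketch of that idea, not the authors' implementation: the function names `tokenize` and `top_keywords`, the reference corpus, the letter-only tokenizer, and the unsmoothed logarithmic IDF are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(target, corpus, k=5):
    """Return the k terms of `target` with the highest TF-IDF score.

    `corpus` is a hypothetical list of reference page texts used to
    estimate document frequencies; the target page is counted as one
    more document so every term has a document frequency of at least 1.
    """
    docs = [set(tokenize(d)) for d in corpus]
    tokens = tokenize(target)
    tf = Counter(tokens)
    n_docs = len(docs) + 1  # reference corpus plus the target page
    scores = {}
    for term, count in tf.items():
        df = 1 + sum(term in d for d in docs)  # the target itself contains the term
        idf = math.log(n_docs / df)
        scores[term] = (count / len(tokens)) * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Terms that appear on every reference page (such as "the") receive an IDF of zero and drop to the bottom of the ranking, while terms that are frequent on the target page but rare elsewhere rise to the top. How the resulting keywords are turned into a classifier feature is not stated in the abstract, so that step is left out here.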

Author information

Authors and Affiliations

Ho Chi Minh City University of Technology, 268 Ly Thuong Kiet Street, District 10, Ho Chi Minh City, Vietnam
Huu Hieu Nguyen & Duc Thai Nguyen

Corresponding author

Correspondence to Huu Hieu Nguyen.

Editor information

Editors and Affiliations

Faculty of Electrical and Electronics En, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Vo Hoang Duy
MERLIN, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Tran Trong Dao
Electrical Engineering and Computer Scie, VŠB—Technical University, Ostrava-Poruba, Czech Republic
Ivan Zelinka
Division of Mechanical, Korea Maritime and Ocean University, Busan, Korea (Republic of)
Hyeung-Sik Choi
Université de Picardie Jules Verne, Laboratoire MIS, Amiens Cedex 1, France
Mohammed Chadli

About this paper

Cite this paper

Nguyen, H.H., Nguyen, D.T. (2016). Machine Learning Based Phishing Web Sites Detection. In: Duy, V., Dao, T., Zelinka, I., Choi, HS., Chadli, M. (eds) AETA 2015: Recent Advances in Electrical Engineering and Related Sciences. Lecture Notes in Electrical Engineering, vol 371. Springer, Cham. https://doi.org/10.1007/978-3-319-27247-4_11

DOI: https://doi.org/10.1007/978-3-319-27247-4_11
Published: 10 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27245-0
Online ISBN: 978-3-319-27247-4
eBook Packages: Engineering, Engineering (R0)
