Machine Learning Based Phishing Web Sites Detection

  • Conference paper
AETA 2015: Recent Advances in Electrical Engineering and Related Sciences

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 371)

Abstract

Phishing is a major problem involving fraudulent websites and emails that aim to obtain users' sensitive information, such as financial data, emails, and other private information. Phishing activity has been increasing, and many unsuspecting users have fallen victim to these websites and emails. This paper analyzes the design and evaluation of features used to detect phishing and reduce such fraudulent activity. The selected features depend not only on the characteristics of the URL (Uniform Resource Locator) but also on the website content. The TF-IDF algorithm is used to compute the top keywords of the website content, from which one of the important features is extracted. The technique was evaluated on a dataset of 4,420 legitimate URLs and 5,389 phishing URLs. Using these features with five classification algorithms, the resulting classifiers achieve 98.8% accuracy in detecting phishing website URLs.
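
The abstract does not give the exact TF-IDF variant or preprocessing used, so the following is only a minimal sketch of how top keywords might be ranked for a page. The function names (`tokenize`, `top_keywords`), the smoothed IDF formula, the tiny background corpus, and the sample page text are all assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(page_text, corpus_texts, k=5):
    """Return the k terms of page_text with the highest TF-IDF weight.

    corpus_texts is a hypothetical background collection of page texts,
    used only to estimate document frequencies.
    """
    docs = [set(tokenize(t)) for t in corpus_texts]
    n_docs = len(docs)
    tokens = tokenize(page_text)
    if not tokens:
        return []
    counts = Counter(tokens)
    scores = {}
    for term, count in counts.items():
        tf = count / len(tokens)  # relative term frequency in the page
        df = sum(1 for d in docs if term in d)  # documents containing the term
        # Smoothed IDF: an assumed variant, chosen to avoid division by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = tf * idf
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Hypothetical data, for illustration only.
corpus = [
    "welcome to our online shop",
    "read the latest news online",
    "sign in to your account to continue",
]
page = "paypal account login verify your paypal account now"
print(top_keywords(page, corpus, k=3))
```

A detector could then, for example, check whether the extracted keywords are consistent with the domain in the URL, although the abstract does not state how the keyword feature is actually derived.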

Author information

Corresponding author

Correspondence to Huu Hieu Nguyen.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, H.H., Nguyen, D.T. (2016). Machine Learning Based Phishing Web Sites Detection. In: Duy, V., Dao, T., Zelinka, I., Choi, HS., Chadli, M. (eds) AETA 2015: Recent Advances in Electrical Engineering and Related Sciences. Lecture Notes in Electrical Engineering, vol 371. Springer, Cham. https://doi.org/10.1007/978-3-319-27247-4_11

  • DOI: https://doi.org/10.1007/978-3-319-27247-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27245-0

  • Online ISBN: 978-3-319-27247-4

  • eBook Packages: Engineering, Engineering (R0)
