Abstract
Phishing is an illegitimate method of collecting confidential information from a person or organization. Attackers use phishing sites to steal information such as debit and credit card details, PINs, OTPs, and passwords. Researchers have used various techniques to detect these sites, but no single technique remains sufficient because attackers keep devising new tactics. In this paper, URLs are classified as phishing or legitimate based on their lexical features. A feature selection technique is used to retain only the relevant features. To find the best possible combination, accuracy was evaluated for all combinations of features, varying the number of features each time. Performance is analyzed on different datasets with various parameters using four machine learning techniques.
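As a rough illustration of the two ideas named in the abstract, the following sketch extracts a few lexical features from a URL string and searches all feature combinations for the best-scoring one. The feature names, the `lexical_features` and `best_feature_subset` helpers, and the caller-supplied `score` function are assumptions made for this example; the abstract does not list the paper's actual features, classifiers, or selection procedure.

```python
from itertools import combinations
from urllib.parse import urlparse


def lexical_features(url):
    """Return simple lexical features of a URL.

    The feature set is illustrative only, not the paper's list.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
    }


def best_feature_subset(feature_names, score):
    """Score every non-empty combination of features and keep the best.

    `score` is a hypothetical callable that takes a tuple of feature
    names and returns a number, for example the accuracy of a
    classifier trained on only those features.
    """
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(feature_names) + 1):
        for subset in combinations(feature_names, k):
            current = score(subset)
            if current > best_score:
                best_subset, best_score = subset, current
    return best_subset, best_score
```

The exhaustive search evaluates 2^n - 1 subsets for n features, so it is only practical for small feature sets; in practice `score` would wrap training and evaluating a classifier on held-out data.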
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Banik, B., Sarma, A. (2020). Lexical Feature Based Feature Selection and Phishing URL Classification Using Machine Learning Techniques. In: Bhattacharjee, A., Borgohain, S., Soni, B., Verma, G., Gao, XZ. (eds) Machine Learning, Image Processing, Network Security and Data Sciences. MIND 2020. Communications in Computer and Information Science, vol 1241. Springer, Singapore. https://doi.org/10.1007/978-981-15-6318-8_9
DOI: https://doi.org/10.1007/978-981-15-6318-8_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6317-1
Online ISBN: 978-981-15-6318-8
eBook Packages: Computer Science (R0)