Web Scanner Detection Based on Behavioral Differences

  • Jianming Fu
  • Lin Li
  • Yingjun Wang
  • Jianwei Huang
  • Guojun Peng
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1095)


Web scanners not only consume server bandwidth but also collect sensitive information about websites and probe systems for vulnerabilities, which seriously threatens website security. Accurate detection of Web scanners can effectively mitigate this kind of threat. Existing scanner detection methods extract features from logs and use machine learning to distinguish scanners from legitimate users. However, because these methods lack information about client behavior, they are unable to block scanning. To solve this problem, a Web scanner detection method based on behavioral differences is proposed. It collects clients' request information and behavior information through three modules: Passive Detection, Active Injection, and Active Detection. Six kinds of features, including scanner fingerprints and the ability to execute JavaScript code, are then extracted to determine whether a client is a scanner. The method makes full use of clients' behavioral characteristics and of the behavioral differences between scanners and legitimate users. Experimental results show that the method detects scanners efficiently and quickly.
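The abstract names only two of the six features (a scanner fingerprint and the ability to execute JavaScript) and does not say how they are computed. As a rough illustration of how the three modules could fit together, the following Python sketch assumes: Passive Detection matches the User-Agent against a hypothetical list of scanner tokens; Active Injection appends a script that fetches a per-response beacon URL (the `/__beacon` path is made up); and Active Detection flags a client that has received a threshold number of injected pages without ever returning a valid beacon. All names, the token list, and the threshold are illustrative assumptions, not the authors' implementation.

```python
import secrets
from dataclasses import dataclass, field

# Hypothetical User-Agent substrings used as scanner fingerprints in this sketch.
# A real deployment would derive fingerprints from observed scanner traffic.
SCANNER_UA_TOKENS = ("sqlmap", "wpscan", "wfuzz")


def ua_matches_fingerprint(user_agent: str) -> bool:
    """Passive check: does the User-Agent contain a known scanner token?"""
    ua = user_agent.lower()
    return any(token in ua for token in SCANNER_UA_TOKENS)


@dataclass
class ClientState:
    user_agent: str = ""
    pages_injected: int = 0    # HTML responses that carried the injected script
    js_executed: bool = False  # set once the client returns a valid beacon token
    tokens: set = field(default_factory=set)


class ScannerDetector:
    """Toy combination of passive detection, active injection and active detection."""

    def __init__(self, min_pages_without_js: int = 5):
        self.min_pages_without_js = min_pages_without_js  # assumed threshold
        self.clients = {}  # client_id -> ClientState

    def _state(self, client_id: str) -> ClientState:
        return self.clients.setdefault(client_id, ClientState())

    def record_request(self, client_id: str, user_agent: str) -> bool:
        """Passive detection: store the User-Agent and report a fingerprint match."""
        self._state(client_id).user_agent = user_agent
        return ua_matches_fingerprint(user_agent)

    def inject(self, client_id: str, html: str) -> str:
        """Active injection: add a script that requests a per-response beacon URL."""
        state = self._state(client_id)
        token = secrets.token_hex(8)  # unguessable per-response token
        state.tokens.add(token)
        state.pages_injected += 1
        script = f'<script>fetch("/__beacon?t={token}");</script>'
        if "</body>" in html:
            return html.replace("</body>", script + "</body>", 1)
        return html + script

    def on_beacon(self, client_id: str, token: str) -> None:
        """Active detection: a valid beacon shows the client executed the injected JavaScript."""
        state = self._state(client_id)
        if token in state.tokens:
            state.js_executed = True

    def is_scanner(self, client_id: str) -> bool:
        state = self._state(client_id)
        if ua_matches_fingerprint(state.user_agent):
            return True
        # Behavioral signal: many injected pages but no JavaScript execution at all.
        return state.pages_injected >= self.min_pages_without_js and not state.js_executed
```

Keying state by a client ID (for example an IP address or session cookie) is another assumption here. The JavaScript signal only applies to clients that receive HTML pages, which is one reason a detector would combine it with other features rather than rely on it alone.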


Keywords: Scanner detection · Behavioral difference · Online detection · Browsing behavior



We sincerely thank the anonymous SocialSec reviewers for their valuable feedback. This research was supported in part by the National Natural Science Foundation of China (U1636107, 61373168).



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Jianming Fu (1), corresponding author
  • Lin Li (1)
  • Yingjun Wang (1)
  • Jianwei Huang (1)
  • Guojun Peng (1)

  1. Key Laboratory of Aerospace Information Security and Trusted Computing, School of Cyber Science and Engineering, Wuhan University, Wuhan, China
