Detecting and Characterizing Web Bot Traffic in a Large E-commerce Marketplace

  • Haitao Xu
  • Zhao Li
  • Chen Chu
  • Yuanmi Chen
  • Yifan Yang
  • Haifeng Lu
  • Haining Wang
  • Angelos Stavrou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11099)


A certain amount of web traffic on the Internet is attributed to web bots. Web bot traffic has raised serious concerns among website operators: bots usually consume considerable server resources, causing high workloads and longer response times, while bringing in no profit. Even worse, the content of the pages they crawl might later be used for other fraudulent activities. It is therefore important to detect and characterize web bot traffic. In this paper, we first propose an efficient approach to detect web bot traffic in a large e-commerce marketplace and then perform an in-depth analysis of its characteristics. Specifically, our bot detection approach consists of the following modules: (1) an Expectation Maximization (EM)-based feature selection method that extracts the most distinguishing features, (2) a gradient-based decision tree that calculates the likelihood that an IP address belongs to a bot, and (3) a threshold estimation mechanism that aims to recover a reasonable amount of non-bot traffic flow. The detection approach has been applied to the Taobao/Tmall platforms, and its detection capability has been demonstrated by identifying a considerable amount of web bot traffic. Based on data samples of traffic originating from web bots and from normal users, we conduct a comparative analysis to uncover the behavioral patterns that distinguish web bots from normal users. The analysis results reveal differences in active time, search queries, item and store preferences, and many other aspects. These findings provide new insights that public websites can use to further improve web bot traffic detection and protect valuable web content.
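
The abstract does not spell out how EM drives feature selection in module (1). One plausible reading, offered here only as a hypothetical sketch, is to fit a two-component Gaussian mixture to each per-IP feature with EM and rank features by how well the two fitted components separate, on the idea that a feature splitting cleanly into a "bot" mode and a "human" mode is more useful for detection. The function names (`fit_two_gaussians`, `separation`, `rank_features`), the separation score, and the example feature names are all assumptions, not the authors' implementation. Modules (2) and (3) are not shown.

```python
import math
from typing import Dict, List, Tuple

EPS = 1e-6  # hypothetical floor that keeps counts and variances strictly positive


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of the normal distribution N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(xs: List[float], iters: int = 100) -> Tuple[float, float, float, float]:
    """Fit a 1-D two-component Gaussian mixture to xs with EM.

    Returns (mu1, var1, mu2, var2).
    """
    n = len(xs)
    mean = sum(xs) / n
    var0 = sum((x - mean) ** 2 for x in xs) / n or 1.0  # fall back to 1.0 for constant features
    # Start the two components at the extremes of the data, sharing the overall variance.
    w, mu1, mu2, var1, var2 = 0.5, min(xs), max(xs), var0, var0
    for _ in range(iters):
        # E-step: responsibility of component 1 for each observation.
        resp = []
        for x in xs:
            p1 = w * normal_pdf(x, mu1, var1)
            p2 = (1.0 - w) * normal_pdf(x, mu2, var2)
            resp.append(p1 / (p1 + p2) if p1 + p2 > 0 else 0.5)
        # M-step: re-estimate the mixing weight, means and variances.
        n1 = max(sum(resp), EPS)
        n2 = max(n - sum(resp), EPS)
        w = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n2
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1 + EPS
        var2 = sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / n2 + EPS
    return mu1, var1, mu2, var2


def separation(mu1: float, var1: float, mu2: float, var2: float) -> float:
    """Hypothetical score: distance between component means in pooled standard deviations."""
    return abs(mu1 - mu2) / math.sqrt(var1 + var2)


def rank_features(features: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank features from most to least bimodal according to the separation score."""
    scores = [(name, separation(*fit_two_gaussians(xs))) for name, xs in features.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-IP features: one clearly bimodal, one roughly uniform.
    demo = {
        "requests_per_min": [1.0, 1.5, 2.0, 2.5, 3.0, 40.0, 42.0, 44.0, 46.0, 48.0],
        "session_length": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0],
    }
    for name, score in rank_features(demo):
        print(f"{name}: {score:.2f}")
```

In this toy input, `requests_per_min` should rank first because its two fitted components sit far apart relative to their spread. A production system would more likely use a library mixture model, such as a Gaussian mixture with a covariance regularizer, than this hand-rolled EM loop.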



We would like to thank the anonymous reviewers for their valuable feedback. This work was partially supported by the U.S. NSF grant CNS-1618117 and DARPA XD3 Project HR0011-16-C-0055.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Haitao Xu (1)
  • Zhao Li (2)
  • Chen Chu (2)
  • Yuanmi Chen (2)
  • Yifan Yang (2)
  • Haifeng Lu (2)
  • Haining Wang (3)
  • Angelos Stavrou (4)
  1. Arizona State University, Glendale, USA
  2. Alibaba Group, Hangzhou, China
  3. University of Delaware, Newark, USA
  4. George Mason University, Fairfax, USA
