An Approach Based on Contrast Patterns for Bot Detection on Web Log Files
Nowadays, companies invest resources in detecting non-human accesses in their web traffic. Usually, non-human accesses are few compared with human accesses, which constitutes a class imbalance problem; as a consequence, classifiers bias their results toward human accesses and overlook the non-human ones. In some classification problems, such as non-human traffic detection, high accuracy is not the only desired quality: the model provided by the classifier should also be understandable by experts. For that reason, in this paper we study the use of contrast pattern-based classifiers for building an understandable and accurate model for detecting non-human traffic in web log files. Our experiments over five databases show that the contrast pattern-based approach obtains significantly better AUC results than other state-of-the-art classifiers.
Keywords: Bot detection, Contrast pattern, Supervised classification
This research was partly supported by Google Inc. under the APRU project “AI for Everyone”. The authors are thankful to Robinson Mas del Risco and Fernando Gómez Herrera for providing bot software and for helping with bot execution throughout our experiments, respectively.