An Approach Based on Contrast Patterns for Bot Detection on Web Log Files

  • Octavio Loyola-González (Email author)
  • Raúl Monroy
  • Miguel Angel Medina-Pérez
  • Bárbara Cervantes
  • José Ernesto Grimaldo-Tijerina
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11288)

Abstract

Nowadays, companies invest resources in detecting non-human accesses in their web traffic. Non-human accesses are usually few compared with human accesses, which makes this a class imbalance problem; as a consequence, classifiers bias their results toward human accesses and tend to overlook the non-human ones. In some classification problems, such as non-human traffic detection, high accuracy is not the only desired quality: the model provided by the classifier should also be understandable by experts. For that reason, in this paper we study the use of contrast pattern-based classifiers for building an understandable and accurate model for detecting non-human traffic in web log files. Our experiments over five databases show that the contrast pattern-based approach obtains significantly better AUC results than other state-of-the-art classifiers.
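As a rough illustration of the two ideas in the abstract, the sketch below uses a handful of made-up session records to show (a) why plain accuracy is misleading under class imbalance and (b) what it means for a pattern to contrast two classes, namely that its support is much higher in one class than in the other. The feature names (`requests_per_min`, `has_referrer`), the threshold, and the data are all hypothetical; this is not the mining algorithm or the classifier evaluated in the paper.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]

# Toy, invented sessions: 8 human and 2 bot records (imbalanced on purpose).
# Feature names and values are hypothetical, not taken from the paper.
humans: List[Record] = [
    {"requests_per_min": 4, "has_referrer": 1},
    {"requests_per_min": 7, "has_referrer": 1},
    {"requests_per_min": 3, "has_referrer": 0},
    {"requests_per_min": 12, "has_referrer": 1},
    {"requests_per_min": 5, "has_referrer": 1},
    {"requests_per_min": 9, "has_referrer": 1},
    {"requests_per_min": 2, "has_referrer": 1},
    {"requests_per_min": 65, "has_referrer": 1},
]
bots: List[Record] = [
    {"requests_per_min": 80, "has_referrer": 0},
    {"requests_per_min": 120, "has_referrer": 0},
]


def high_rate(record: Record) -> bool:
    """Hypothetical pattern: a single condition on the request rate."""
    return record["requests_per_min"] > 60


def support(pattern: Callable[[Record], bool], records: List[Record]) -> float:
    """Fraction of the records in one class that the pattern covers."""
    return sum(1 for r in records if pattern(r)) / len(records)


# (a) A classifier that always answers "human" looks accurate
#     but never detects a single bot.
baseline_accuracy = len(humans) / (len(humans) + len(bots))
baseline_bot_recall = 0 / len(bots)

# (b) The pattern covers every bot but only one of the eight humans,
#     so it contrasts the two classes and is readable by an expert.
bot_support = support(high_rate, bots)
human_support = support(high_rate, humans)

print(f"always-human baseline: accuracy={baseline_accuracy}, bot recall={baseline_bot_recall}")
print(f"pattern support: bots={bot_support}, humans={human_support}")
```

A pattern-based classifier would combine many such patterns rather than rely on one; the single condition here only serves to make the notion of per-class support concrete.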

Keywords

Bot detection, Contrast pattern, Supervised classification

Notes

Acknowledgment

This research was partly supported by Google incorporation under the APRU project “AI for Everyone”. The authors thank Robinson Mas del Risco and Fernando Gómez Herrera for providing bot software and for helping with bot execution throughout our experiments, respectively.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Octavio Loyola-González (1, Email author)
  • Raúl Monroy (2)
  • Miguel Angel Medina-Pérez (2)
  • Bárbara Cervantes (2)
  • José Ernesto Grimaldo-Tijerina (3)

  1. School of Science and Engineering, Tecnologico de Monterrey, Puebla, Mexico
  2. School of Science and Engineering, Tecnologico de Monterrey, Atizapán, Mexico
  3. Network Information Center Mexico, Tecnologico de Monterrey, Monterrey, Mexico
