Abstract
Web logs are an important source of information for describing and understanding the traffic of Web servers and its characteristics. Analyzing these logs is challenging because of the large volume of data and the complex relationships hidden in them. Our investigation focuses on the logs of two Web servers and identifies the main characteristics of their workload and the navigation profiles of the crawlers and human users visiting the sites. The classification of these visitors has shown some interesting similarities and differences in terms of traffic intensity and its temporal distribution. In general, crawlers tend to revisit the sites rather often, but they seldom send bursts of requests, so as to reduce their impact on server resources. The other clients are also characterized by periodic patterns that can be effectively represented by a few principal components.
© 2011 IFIP International Federation for Information Processing
About this chapter
Cite this chapter
Calzarossa, M.C., Massari, L. (2011). Analysis of Web Logs: Challenges and Findings. In: Hummel, K.A., Hlavacs, H., Gansterer, W. (eds) Performance Evaluation of Computer and Communication Systems. Milestones and Future Challenges. PERFORM 2010. Lecture Notes in Computer Science, vol 6821. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25575-5_19
DOI: https://doi.org/10.1007/978-3-642-25575-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25574-8
Online ISBN: 978-3-642-25575-5
eBook Packages: Computer Science, Computer Science (R0)