Abstract
Web mining is the application of data mining to discover useful knowledge from the Web. It currently focuses on four main research directions, each tied to a category of Web data: Web content mining, Web usage mining, Web structure mining, and Web user profile mining. Web content mining discovers what Web pages are about and reveals new knowledge from them. Web usage mining identifies patterns in user navigation through Web pages and is performed for service personalization, system improvement, and usage characterization. Web structure mining investigates how Web documents are structured and discovers the model underlying the link structures of the WWW. Web user profile mining discovers user profiles based on users’ behavior on the Web. We present the application of data mining to Web performance analysis, an approach we call Web performance mining (WPM). WPM characterizes performance from the perspective of Web clients, in the sense of the data transfer throughput in Web transactions. It adds a new dimension to Web mining research: using data mining techniques to analyze Web performance measurements and find interesting patterns that support decision-making in the use of the Web, for example, predicting future states of good or poor performance in access to particular Web servers. WPM is based on measurements that are planned and performed using specific measurement tools and platforms. We developed the multi-agent distributed system MWING to support the required active measurements.
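The idea of predicting good or poor performance from client-side throughput measurements can be illustrated with a minimal sketch. It is not the method used in the chapter or in MWING: the record layout, the median cutoff for "good" versus "poor", and the hour-of-day majority vote are all assumptions chosen only to make the idea concrete.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical active-measurement records for one Web server:
# (hour_of_day, bytes_transferred, transfer_time_in_seconds)
measurements = [
    (9, 1_000_000, 2.0),
    (9, 1_000_000, 2.5),
    (9, 1_000_000, 1.0),
    (21, 1_000_000, 0.5),
    (21, 1_000_000, 0.4),
    (21, 1_000_000, 2.0),
]


def throughput(nbytes, seconds):
    """Data transfer throughput of one Web transaction, in bytes per second."""
    return nbytes / seconds


def label_measurements(records):
    """Label each record 'good' or 'poor' relative to the median throughput.

    The median cutoff is an assumption made for this sketch.
    """
    rates = [throughput(nbytes, secs) for _, nbytes, secs in records]
    cutoff = median(rates)
    return [
        (hour, "good" if rate >= cutoff else "poor")
        for (hour, _, _), rate in zip(records, rates)
    ]


def fit_hourly_majority(labelled):
    """Map each hour of day to the label observed most often at that hour."""
    votes = defaultdict(Counter)
    for hour, label in labelled:
        votes[hour][label] += 1
    return {hour: counts.most_common(1)[0][0] for hour, counts in votes.items()}


# A naive "prediction": the expected performance class for each hour of day.
model = fit_hourly_majority(label_measurements(measurements))
print(model)
```

A realistic study would replace the majority vote with a proper data mining model (for example a decision tree or clustering step) and would use many more features than the hour of day.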
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Borzemski, L. (2009). Towards Web Performance Mining. In: Ting, IH., Wu, HJ. (eds) Web Mining Applications in E-commerce and E-services. Studies in Computational Intelligence, vol 172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88081-3_5
DOI: https://doi.org/10.1007/978-3-540-88081-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88080-6
Online ISBN: 978-3-540-88081-3
eBook Packages: Engineering, Engineering (R0)