Part of the book series: Studies in Computational Intelligence (SCI, volume 172)

Abstract

Web mining is the application of data mining to discover useful knowledge from the Web. It currently focuses on four main research directions, each tied to a category of Web data: Web content mining, Web usage mining, Web structure mining, and Web user profile mining. Web content mining discovers what Web pages are about and reveals new knowledge from them. Web usage mining identifies patterns in user navigation through Web pages and is performed for service personalization, system improvement, and usage characterization. Web structure mining investigates how Web documents are structured and discovers the model underlying the link structures of the WWW. Web user profile mining discovers user profiles based on users' behavior on the Web. We present the application of data mining to Web performance analysis and call our approach Web performance mining (WPM). It is defined to characterize performance from the perspective of Web clients, in terms of the data transfer throughput in Web transactions. WPM adds a new dimension to Web mining research: it uses data mining techniques to analyze Web performance measurements and find interesting patterns that support decision-making in the use of the Web, for example, predicting future states of good or poor performance in access to particular Web servers. WPM is based on measurements that are planned and performed using specific measurement tools and platforms. We developed the multi-agent distributed system MWING to support the required active measurements.
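
The idea of predicting good or poor performance from throughput measurements can be illustrated with a minimal sketch. This is not the chapter's method or MWING's data format: the record layout, the sample values, the median-based cutoff, and the per-(server, hour) majority vote are all assumptions chosen only to make the concept concrete.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical active-measurement records:
# (server, hour_of_day, bytes_transferred, seconds).
# The values are invented for illustration; they are not MWING output.
MEASUREMENTS = [
    ("server-a", 9, 500_000, 1.0),
    ("server-a", 9, 500_000, 1.25),
    ("server-a", 21, 500_000, 5.0),
    ("server-a", 21, 500_000, 4.0),
    ("server-b", 9, 500_000, 2.0),
    ("server-b", 9, 500_000, 8.0),
    ("server-b", 9, 500_000, 10.0),
]


def throughput(nbytes, seconds):
    """Data transfer throughput of one Web transaction, in bytes per second."""
    return nbytes / seconds


def label_measurements(records):
    """Label each record 'good' if its throughput is strictly above the median.

    The median cutoff is an assumption of this sketch, not a rule from the source.
    """
    rates = [throughput(b, s) for _, _, b, s in records]
    cutoff = median(rates)
    return [
        (server, hour, "good" if rate > cutoff else "poor")
        for (server, hour, _, _), rate in zip(records, rates)
    ]


def fit_majority_model(labeled):
    """Map each (server, hour) pair to the class observed most often for it."""
    counts = defaultdict(Counter)
    for server, hour, label in labeled:
        counts[(server, hour)][label] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def predict(model, server, hour, default="unknown"):
    """Predict the performance class, or `default` for an unseen (server, hour)."""
    return model.get((server, hour), default)


MODEL = fit_majority_model(label_measurements(MEASUREMENTS))
```

Calling `predict(MODEL, "server-a", 21)` looks up the class most often seen for that server at that hour. A real study would presumably use richer features and a proper data mining algorithm in place of the majority vote.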

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Borzemski, L. (2009). Towards Web Performance Mining. In: Ting, IH., Wu, HJ. (eds) Web Mining Applications in E-commerce and E-services. Studies in Computational Intelligence, vol 172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88081-3_5

  • DOI: https://doi.org/10.1007/978-3-540-88081-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88080-6

  • Online ISBN: 978-3-540-88081-3

  • eBook Packages: Engineering, Engineering (R0)