
Part of the book series: Studies in Computational Intelligence (SCI, volume 172)

Abstract

In recent years, web usage mining techniques have helped online service providers to enhance their services and to restructure and redesign their websites in line with the insights gained. The application of these techniques is essential in building intelligent, personalised online services. More recently, it has been recognised that the shift from traditional to online services, together with the growing number of online customers and the increasing traffic they generate, brings new challenges to the field. Highly demanding real-world E-commerce and E-services applications, where rapid, possibly changing, large-volume data streams do not allow offline processing, motivate the development of new, highly efficient real-time web usage mining techniques. This chapter provides an introduction to online web usage mining and presents an overview of the latest developments. In addition, it outlines the major, and as yet mostly unsolved, challenges in the field.




© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hofgesang, P.I. (2009). Online Mining of Web Usage Data: An Overview. In: Ting, IH., Wu, HJ. (eds) Web Mining Applications in E-commerce and E-services. Studies in Computational Intelligence, vol 172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88081-3_1


  • DOI: https://doi.org/10.1007/978-3-540-88081-3_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88080-6

  • Online ISBN: 978-3-540-88081-3

  • eBook Packages: Engineering, Engineering (R0)
