Ontology-Based Partitioning of Data Stream for Web Mining: A Case Study of Web Logs

  • Jason J. Jung
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3036)


This paper presents a novel method for partitioning streaming data based on an ontology. A web directory service is used to enrich web logs with semantics by categorizing each log entry under all possible hierarchical paths. To detect the candidate set of session identifiers, semantic factors such as the semantic mean, deviation, and distance matrix are established. Each semantic session is then obtained by nested repetition of a top-down partitioning and evaluation process. In the experiment, we applied this ontology-oriented heuristic to sessionize one week of access log files from IRCache. Compared with time-oriented heuristics, more than 48% more sessions were detected by semantic outlier analysis.
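
The abstract names the semantic factors but does not define them, so the following is only a minimal sketch under stated assumptions, not the paper's actual formulation. It assumes that each request has already been mapped to a single directory category path (the paper categorizes entries under all possible paths), that the distance between two paths is the number of tree edges separating them, and that a session boundary is an outlier gap between consecutive requests, that is, a gap larger than the mean gap plus `k` deviations. The names `path_distance`, `distance_matrix`, `partition`, and the threshold `k` are hypothetical.

```python
from statistics import mean, pstdev


def path_distance(a, b):
    """Assumed semantic distance: tree edges between two category paths."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Climb from each path up to the deepest shared category.
    return (len(pa) - common) + (len(pb) - common)


def distance_matrix(paths):
    """Pairwise distances between all categorized requests."""
    return [[path_distance(a, b) for b in paths] for a in paths]


def partition(paths, k=1.0):
    """Top-down partitioning: split at the widest gap if it is an outlier."""
    if len(paths) < 2:
        return [paths] if paths else []
    # Gaps between consecutive requests in arrival order.
    gaps = [path_distance(paths[i], paths[i + 1]) for i in range(len(paths) - 1)]
    mu, sigma = mean(gaps), pstdev(gaps)  # assumed "semantic mean" and "deviation"
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    # Evaluation step: keep the segment whole unless the widest gap is an outlier.
    if gaps[cut] <= mu + k * sigma:
        return [paths]
    # Nested repetition: partition each side again.
    return partition(paths[: cut + 1], k) + partition(paths[cut + 1 :], k)


# Hypothetical categorized log stream (illustrative data, not from IRCache).
logs = [
    "Computers/Internet/Search",
    "Computers/Internet/Browsers",
    "Computers/Internet/Email",
    "Recreation/Travel/Asia",
    "Recreation/Travel/Europe",
]
sessions = partition(logs)
```

With this illustrative input, the only outlier gap lies between the `Computers` and `Recreation` requests, so the stream is split there into two candidate sessions. A time-oriented heuristic would not see this boundary if the requests arrived in quick succession.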


Keywords: Semantic Distance, Semantic Factor, Cache Server, Data Stream, Semantic Session



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Jason J. Jung, School of Computer and Information Engineering, Inha University, Incheon, Korea
