A Web User Profiling Approach

  • Younes Hafri
  • Chabane Djeraba
  • Peter Stanchev
  • Bruno Bachimont
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2642)


People display regularities in almost everything they do. This paper proposes the characteristics of an idealized algorithm for automatically extracting web user profiles from user navigation paths. We describe a simple predictive approach with these characteristics and demonstrate its predictive accuracy on a large dataset of KDD-Cup web logs (from a commercial web site), while using fewer computational and memory resources. The approach rests on three notions: (1) probabilistic exploration using Markov models; (2) avoiding the high dimensionality and sparsity of Markov models by clustering web documents according to their content before applying the Markov analysis; and (3) clustering the Markov models and extracting their gravity centers. Together, these three notions make it possible to predict the states that will be visited in k steps and to monitor navigation sessions, based on both content and traversed paths.
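The three notions in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that pages have already been mapped to integer content-cluster ids, that each user is modeled by a first-order Markov chain estimated from transition counts, that unseen states fall back to a uniform row, and that a "gravity center" is the element-wise mean of a group's transition matrices. All function names are hypothetical.

```python
def estimate_transitions(sessions, n_states):
    """Estimate a first-order transition matrix from sessions.

    Each session is a list of content-cluster ids in [0, n_states).
    Rows are normalized counts; a state with no outgoing transitions
    gets a uniform row (an assumption of this sketch).
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            matrix.append([1.0 / n_states] * n_states)
        else:
            matrix.append([c / total for c in row])
    return matrix


def k_step_distribution(matrix, start, k):
    """Distribution over states k steps after leaving `start`."""
    n = len(matrix)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(k):
        # One step of the chain: multiply the row vector by the matrix.
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


def gravity_center(matrices):
    """Element-wise mean of transition matrices of equal size.

    The mean of row-stochastic matrices is itself row-stochastic,
    so the result can be used as a representative model for a group.
    """
    n = len(matrices[0])
    m = len(matrices)
    return [[sum(mat[i][j] for mat in matrices) / m for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy sessions over three content clusters (made-up data).
    user_a = estimate_transitions([[0, 1, 2], [0, 1, 1, 2], [0, 2]], 3)
    user_b = estimate_transitions([[2, 1, 0], [2, 0]], 3)
    print(k_step_distribution(user_a, start=0, k=2))
    print(gravity_center([user_a, user_b]))
```

Clustering the pages first keeps the matrix at `n_states × n_states` for a small number of content clusters rather than one state per page, which is the dimensionality and sparsity concern the abstract raises.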


Markov Model, Hidden Markov Model, Speech Recognition, Page Number, Gravity Center
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Younes Hafri (1, 2)
  • Chabane Djeraba (2)
  • Peter Stanchev (3)
  • Bruno Bachimont (1)
  1. Institut National de l’Audiovisuel, Bry-sur-Marne Cedex, France
  2. Institut de Recherche en Informatique de Nantes, Nantes Cedex, France
  3. Kettering University, Flint, USA
