Abstract
With the emergence of the World Wide Web, analyzing and improving Web communication has become essential for adapting Web content to visitors’ expectations. Web communication analysis is traditionally performed by Web analytics software, which produces long lists of page-based audience metrics. These results suffer from page synonymy, page polysemy, page temporality, and page volatility. In addition, the metrics carry little semantic information and are too detailed to be exploited by organization managers and chief editors, who need summarized and conceptual information to make high-level decisions. To obtain such metrics, we propose to classify the Web site pages into categories representing the Web site topics and to aggregate the page hits accordingly. In this paper, we show how to compute and visualize these metrics using OLAP tools. To solve the page-temporality issue, we propose to classify the versions of the pages using automatic classifiers.
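The core idea (classify each page version into a topic category, then aggregate hits per category rather than per page) can be illustrated with a small sketch. This is not the authors' implementation: the keyword classifier is a hypothetical stand-in for the automatic classifiers the abstract mentions, the hit log is assumed to be already reduced to (URL, timestamp) pairs, and the names `PageVersion`, `CATEGORY_KEYWORDS`, `classify`, `build_version_index`, and `category_hits` are illustrative only. Each hit is attributed to the category of the page version that was live at the time of the hit, which is how version classification addresses page temporality.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass

# Hypothetical topic vocabulary; a stand-in for a trained automatic classifier.
CATEGORY_KEYWORDS = {
    "admissions": {"enrol", "application", "deadline"},
    "research": {"lab", "publication", "project"},
}


@dataclass
class PageVersion:
    url: str
    valid_from: int  # version is live from this timestamp until the next version of the URL
    text: str


def classify(text):
    """Toy classifier: pick the category sharing the most keywords with the text."""
    words = set(text.lower().split())  # naive whitespace tokenizer, no punctuation handling
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


def build_version_index(versions):
    """Classify every page version and index versions per URL, sorted by start time."""
    index = {}
    for v in sorted(versions, key=lambda v: v.valid_from):
        starts, cats = index.setdefault(v.url, ([], []))
        starts.append(v.valid_from)
        cats.append(classify(v.text))
    return index


def category_hits(hits, index):
    """Aggregate (url, timestamp) hits into per-category hit counts."""
    counts = Counter()
    for url, ts in hits:
        if url not in index:
            counts["unknown"] += 1
            continue
        starts, cats = index[url]
        i = bisect_right(starts, ts) - 1  # latest version that started at or before ts
        counts[cats[i] if i >= 0 else "unknown"] += 1
    return counts


if __name__ == "__main__":
    versions = [
        PageVersion("/news", 0, "Application deadline to enrol"),
        PageVersion("/news", 100, "New lab publication"),
        PageVersion("/team", 0, "Our research project members"),
    ]
    hits = [("/news", 10), ("/news", 50), ("/news", 150), ("/team", 20), ("/team", 5)]
    print(category_hits(hits, build_version_index(versions)))
```

In the usage example, the same URL `/news` contributes to "admissions" before its content changes at timestamp 100 and to "research" afterwards, so its hits are split between two categories. In an OLAP setting, counts like these would presumably populate a fact table whose category dimension can then be rolled up or drilled down; the sketch stops at the aggregation step.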
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Norguet, JP., Tshibasu-Kabeya, B., Bontempi, G., Zimányi, E. (2006). Category-Based Audience Metrics for Web Site Content Improvement Using Ontologies and Page Classification. In: Kop, C., Fliedl, G., Mayr, H.C., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2006. Lecture Notes in Computer Science, vol 3999. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11765448_21
DOI: https://doi.org/10.1007/11765448_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34616-6
Online ISBN: 978-3-540-34617-3
eBook Packages: Computer Science, Computer Science (R0)