Abstract
With the emergence of the World Wide Web, analyzing and improving Web communication has become essential to adapt the Web content to the visitors’ expectations. Web communication analysis is traditionally performed by Web analytics software, which produce long lists of page-based audience metrics. These results suffer from page synonymy, page polysemy, page temporality, and page volatility. In addition, the metrics contain little semantics and are too detailed to be exploited by organization managers and chief editors, who need summarized and conceptual information to take high-level decisions. To obtain such metrics, we propose a method based on output page mining. Output page mining is a new kind of Web usage mining, between Web usage mining and Web content mining. In our method, we first collect the Web pages output by the Web server. Then, for a given taxonomy covering the Web site knwoledge domain, we aggregate the term weights in the output pages using OLAP tools, in order to obtain topic-based metrics representing the audience of the Web site topics. To demonstrate how our approach solves the cited problems, we compute topic-based metrics with SQL Server OLAP Analysis Service and our prototype WASA for real Web sites. Finally, we compare our results against those obtained with Google Analytics, a popular Web analytics tool.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Srivastava, J., Cooley, R., Deshpande, M., Pang-Ning, T.: Web usage mining: Discovery and applications of usage patterns from web data, SIGKDD Explorations 1(2)
March, J., Simon, H., Guetzkow, H.: Organizations, 2nd edn. Blackwell, Cambridge (1983)
Wahli, U., Norguet, J., Andersen, J., Hargrove, N., Meser, M.: Websphere Version 5 Application Development Handbook. IBM Press (2003), http://www.redbooks.ibm.com/redpieces/pdfs/sg246993.pdf
Chen, M.-S., Han, J., Yu, P.S.: Data mining: An overview from a database perspective. IEEE Trans. Knowl. Data Eng. 8(6), 866–883 (1996)
Mobasher, B., Cooley, R., Srivastava, J.: Automatic personalization based on Web usage mining. Communications of the ACM 43(8), 142–151 (2000)
Aggarwal, C.C., Yu, P.S.: On disk caching of web objects in proxy servers. In: Proc. of the 6th Int. Conf. on Information and Knowledge Management, CIKM, pp. 238–245 (1997)
Perkowitz, M., Etzioni, O.: Towards adaptive web sites: Conceptual framework and case study. J. of Artif. Intell. 118(1-2), 245–275 (2000)
Büchner, A.G., Mulvenna, M.D.: Discovering internet marketing intelligence through online analytical web usage mining. SIGMOD Record 27(4), 54–61 (1998)
Pirolli, P., Pitkow, J.E.: Distributions of surfers’ paths through the world wide web: Empirical characterizations. J. of the World Wide Web 2(1-2), 29–45 (1999)
Ríos, S.A., Velásquez, J.D., Vera, E.S., Yasuda, H., Aoki, T.: Using SOFM to improve web site text content. In: Proc. of the 1st Int. Conf. on Advances in Natural Computation, ICNC, Part II, pp. 622–626 (2005)
Chi, E.H., Pirolli, P., Chen, K., Pitkow, J.E.: Using information scent to model user information needs and actions and the web. In: Proc. of the SIGCHI on Human Factors in Computing Systems, pp. 490–497 (2001)
Facca, F.M., Lanzi, P.L.: Mining interesting knowledge from weblogs: a survey. Data Knowl. Eng. 53(3), 225–241 (2005)
Materna, G.: Extraction par déformattage du contenu de pages Web dynamiques semi-structurées, travail de fin d’études d’Ingénieur civil informaticien, Faculté des Sciences Appliquées, Université Libre de Bruxelles (2002)
Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Reading (1999)
Stumme, G., Maedche, A.: FCA-MERGE: Bottom-up merging of ontologies. In: Proc. of the 17th Int. Joint Conf. on Artificial Intelligence, IJCAI, pp. 225–234 (2001)
Sweiger, M., Madsen, M., Langston, J., Lombard, H.: Clickstream Data Warehousing. John Wiley & Sons, Chichester (2002)
Malinowski, E., Zimányi, E.: OLAP hierarchies: A conceptual perspective. In: Persson, A., Stirna, J. (eds.) CAiSE 2004. LNCS, vol. 3084, pp. 477–491. Springer, Heidelberg (2004)
Norguet, J.P., Zimányi, E., Steinberger, R.: Improving web sites with web usage mining, web content mining, and semantic analysis. In: Wiedermann, J., Tel, G., Pokorný, J., Bieliková, M., Štuller, J. (eds.) SOFSEM 2006. LNCS, vol. 3831, pp. 430–439. Springer, Heidelberg (2006)
Steinberger, R., Pouliquen, B., Ignat, C.: Exploiting multilingual nomenclatures and language-independent text features as an interlingua for cross-lingual text analysis applications. In: Proc. B of the 7th Int. Multiconference on Language Technologies, IS 2004 (2004)
Maedche, A., Staab, S.: Ontology learning for the semantic web. IEEE Intelligent Systems 16(2), 72–79 (2001)
Steinberger, R., Pouliquen, B., Ignat, C.: Navigating multilingual news collection using automatically extracted information. In: Proc. of the 27th Int. Conf. on Information Technology Interfaces, ITI (2005)
Lozano-Tello, A., Gómez-Pérez, A.: ONTOMETRIC: A method to choose the appropriate ontology. J. of Database Manag. 15(2), 1–18 (2004)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Norguet, JP., Zimányi, E., Steinberger, R. (2009). Semantic Analysis of Web Site Audience by Integrating Web Usage Mining and Web Content Mining. In: Ting, IH., Wu, HJ. (eds) Web Mining Applications in E-commerce and E-services. Studies in Computational Intelligence, vol 172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88081-3_4
Download citation
DOI: https://doi.org/10.1007/978-3-540-88081-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88080-6
Online ISBN: 978-3-540-88081-3
eBook Packages: EngineeringEngineering (R0)