Semantic Analysis of Web Site Audience by Integrating Web Usage Mining and Web Content Mining

  • Chapter

Part of the book series: Studies in Computational Intelligence ((SCI,volume 172))

Abstract

With the emergence of the World Wide Web, analyzing and improving Web communication has become essential for adapting Web content to visitors’ expectations. Web communication analysis is traditionally performed by Web analytics software, which produces long lists of page-based audience metrics. These results suffer from page synonymy, page polysemy, page temporality, and page volatility. In addition, the metrics carry little semantics and are too detailed to be exploited by organization managers and chief editors, who need summarized, conceptual information to make high-level decisions. To obtain such metrics, we propose a method based on output page mining, a new kind of Web usage mining that sits between Web usage mining and Web content mining. In our method, we first collect the Web pages output by the Web server. Then, for a given taxonomy covering the Web site's knowledge domain, we aggregate the term weights in the output pages using OLAP tools, in order to obtain topic-based metrics representing the audience of the Web site topics. To demonstrate how our approach solves the cited problems, we compute topic-based metrics with SQL Server OLAP Analysis Service and our prototype WASA for real Web sites. Finally, we compare our results against those obtained with Google Analytics, a popular Web analytics tool.
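The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' WASA prototype and does not use OLAP tools: the taxonomy, the relative-frequency term weighting, and the function names `term_weights` and `topic_metrics` are all assumptions made for illustration. The sketch sums term weights per topic over the collected output pages, then rolls each topic's total up to its ancestors in the taxonomy, which loosely mimics an OLAP roll-up along a hierarchy.

```python
from collections import Counter
import re

# Hypothetical taxonomy (an assumption, not from the chapter):
# topic -> (parent topic or None, terms attached to that topic)
TAXONOMY = {
    "science": (None, {"science"}),
    "computing": ("science", {"database", "olap"}),
    "biology": ("science", {"cell", "gene"}),
}


def term_weights(text):
    """Relative frequency of each lowercase word in one output page."""
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()} if total else {}


def topic_metrics(output_pages, taxonomy):
    """Sum term weights per topic, then roll each total up to its ancestors."""
    direct = Counter()
    for page in output_pages:
        weights = term_weights(page)
        for topic, (_parent, terms) in taxonomy.items():
            direct[topic] += sum(weights.get(t, 0.0) for t in terms)

    # Roll-up: every ancestor also receives the weight of its descendants.
    rolled = Counter(direct)
    for topic, value in direct.items():
        parent = taxonomy[topic][0]
        while parent is not None:
            rolled[parent] += value
            parent = taxonomy[parent][0]
    return dict(rolled)
```

Because each served page contributes once per delivery, pages that are output more often weigh more in the resulting topic metrics. A real deployment would likely use a richer weighting scheme and a much larger taxonomy than this toy example.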

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Norguet, JP., Zimányi, E., Steinberger, R. (2009). Semantic Analysis of Web Site Audience by Integrating Web Usage Mining and Web Content Mining. In: Ting, IH., Wu, HJ. (eds) Web Mining Applications in E-commerce and E-services. Studies in Computational Intelligence, vol 172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88081-3_4

  • DOI: https://doi.org/10.1007/978-3-540-88081-3_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88080-6

  • Online ISBN: 978-3-540-88081-3

  • eBook Packages: Engineering, Engineering (R0)
