Abstract
The paper describes an alternative method of website analysis and optimization that combines web usage mining and web structure mining: the discovery of web users’ behaviour patterns and the discovery of knowledge from the website structure. Its primary objective is to identify web pages whose importance, as estimated by the website developers, does not correspond to the real behaviour of the website visitors. It has previously been shown that the expected visit rate correlates with the observed visit rate of web pages. Consequently, the expected probabilities of a visitor visiting individual web pages were calculated using the PageRank method, and the observed probabilities were obtained from the web server log files using web usage mining. The observed and expected probabilities were compared using residual analysis. While sequence rule analysis can only uncover potential problems in web pages with a higher visit rate, the proposed residual analysis can also take into account web pages with a smaller visit rate. The obtained results can be used for website optimization and restructuring, for improving website navigation, and for the realisation of adaptive websites.
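The comparison of expected and observed probabilities described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the link graph, the visit counts, the damping factor of 0.85, the `tolerance` threshold and all function names are hypothetical, and the labelling of pages as underestimated or overestimated by the sign of the residual is an assumption made for illustration only.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for target in links[p]:
                new_rank[target] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank


def observed_probabilities(visit_counts):
    """Turn per-page visit counts (e.g. from a cleaned server log) into shares."""
    total = sum(visit_counts.values())
    return {p: count / total for p, count in visit_counts.items()}


def residuals(observed, expected):
    """Residual = observed probability minus expected probability, per page."""
    return {p: observed[p] - expected[p] for p in expected}


def classify(residual_by_page, tolerance=0.05):
    """Label pages by residual sign; `tolerance` is an assumed cut-off."""
    labels = {}
    for page, r in residual_by_page.items():
        if r > tolerance:
            labels[page] = "underestimated"  # visited more than the structure suggests
        elif r < -tolerance:
            labels[page] = "overestimated"   # visited less than the structure suggests
        else:
            labels[page] = "as expected"
    return labels


# Hypothetical three-page site and hypothetical visit counts.
links = {
    "home": ["courses", "contact"],
    "courses": ["home", "contact"],
    "contact": ["home"],
}
visits = {"home": 500, "courses": 100, "contact": 400}

expected = pagerank(links)
observed = observed_probabilities(visits)
res = residuals(observed, expected)
labels = classify(res)

for page in sorted(res):
    print(f"{page}: expected={expected[page]:.3f} "
          f"observed={observed[page]:.3f} residual={res[page]:+.3f} -> {labels[page]}")
```

Because both sets of probabilities sum to one, the residuals sum to zero, so a page that attracts more visits than its link position suggests is always balanced by pages that attract fewer. A fixed threshold is the simplest possible rule; a statistical cut-off based on the spread of the residuals would be a natural alternative.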
Acknowledgements
This paper is published with the financial support of the Scientific Grant Agency (VEGA), project number VEGA 1/0392/13.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kapusta, J., Munk, M., Drlík, M. (2015). Identification of Underestimated and Overestimated Web Pages Using PageRank and Web Usage Mining Methods. In: Nguyen, N. (ed.) Transactions on Computational Collective Intelligence XVIII. Lecture Notes in Computer Science, vol 9240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48145-5_7
DOI: https://doi.org/10.1007/978-3-662-48145-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48144-8
Online ISBN: 978-3-662-48145-5
eBook Packages: Computer Science, Computer Science (R0)