Abstract
The amount of information accumulating on the World Wide Web is growing exponentially. This makes relevant information harder to reach, because it becomes difficult for users to find what they need in the minimum amount of time. A single query submitted to a search engine returns a large number of results, and digging out the most relevant web link becomes a cumbersome task for the user, which can reduce the user's trust in the search engine. This paper proposes an approach to web structure and web usage mining based on the iterative improvement algorithm. Iterative improvement is a randomized algorithm used for solving combinatorial optimization problems. The technique helps select the top T web pages and prioritize them in order of relevance. An experimental evaluation shows a significant improvement in performance. The parameters used are access frequency, time duration, number of visitors, hubs, and authorities, which together cover both web structure and web usage mining.
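The abstract names the inputs (access frequency, time duration, number of visitors, hubs, authorities) and the search technique (randomized iterative improvement selecting the top T pages), but not the scoring formula. The Python sketch below shows one way these pieces could fit together under explicit assumptions: hub and authority scores come from Kleinberg's HITS, each feature is max-normalized and combined with equal, hypothetical weights, and an ordered top-T list is scored with a DCG-style objective. The names `relevance_scores`, `fitness`, `neighbour`, and `iterative_improvement`, the swap/replace neighbourhood, and the sample pages are illustrative only and are not the authors' method.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    access_frequency: float  # number of accesses recorded in the usage log
    time_duration: float     # total time users spent on the page (seconds)
    visitors: float          # number of distinct visitors


def hits(links, iterations=50):
    """Kleinberg's HITS; links maps a page URL to the URLs it points to."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority of n = sum of hub scores of pages linking to n.
        auth = {n: sum(hub[u] for u, outs in links.items() if n in outs) for n in nodes}
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub of n = sum of authority scores of pages n links to.
        hub = {n: sum(auth[v] for v in links.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


def relevance_scores(pages, links, weights=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Hypothetical score: weighted sum of max-normalized usage and link features."""
    hub, auth = hits(links)
    rows = {p.url: (p.access_frequency, p.time_duration, p.visitors,
                    hub.get(p.url, 0.0), auth.get(p.url, 0.0)) for p in pages}
    maxima = [max(col) or 1.0 for col in zip(*rows.values())]
    return {url: sum(w * x / m for w, x, m in zip(weights, feats, maxima))
            for url, feats in rows.items()}


def fitness(order, scores):
    """Hypothetical DCG-style objective: high scores at early ranks count most."""
    return sum(scores[url] / math.log2(rank + 2) for rank, url in enumerate(order))


def neighbour(order, pool, rng):
    """Random move: replace a selected page with an unselected one, or swap two ranks."""
    new = list(order)
    outside = [u for u in pool if u not in new]
    if outside and rng.random() < 0.5:
        new[rng.randrange(len(new))] = rng.choice(outside)
    elif len(new) > 1:
        i, j = rng.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]
    return new


def iterative_improvement(scores, t, iterations=2000, restarts=5, seed=0):
    """Randomized iterative improvement: keep a random neighbour only if it is better."""
    rng = random.Random(seed)
    pool = list(scores)
    best = None
    for _ in range(restarts):
        current = rng.sample(pool, t)  # random initial top-t ordering
        for _ in range(iterations):
            candidate = neighbour(current, pool, rng)
            if fitness(candidate, scores) > fitness(current, scores):
                current = candidate
        if best is None or fitness(current, scores) > fitness(best, scores):
            best = current
    return best


# Hypothetical sample data for illustration only.
pages = [
    Page("a.html", 120, 900, 40),
    Page("b.html", 30, 200, 25),
    Page("c.html", 75, 1500, 60),
    Page("d.html", 10, 60, 5),
    Page("e.html", 95, 400, 35),
]
links = {"a.html": ["c.html", "e.html"], "b.html": ["a.html", "c.html"],
         "d.html": ["a.html"], "e.html": ["c.html"]}
scores = relevance_scores(pages, links)
top = iterative_improvement(scores, t=3)
print(top)
```

With this simplified objective the optimum is just the T highest-scoring pages in descending order, so a plain sort would suffice; the randomized search is shown only to illustrate the technique, which becomes more useful when the objective does not decompose page by page.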
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chaudhary, K., Gupta, N., Kumar, S. (2018). Web Documents Prioritization Using Iterative Improvement. In: Bhattacharyya, P., Sastry, H., Marriboyina, V., Sharma, R. (eds) Smart and Innovative Trends in Next Generation Computing Technologies. NGCT 2017. Communications in Computer and Information Science, vol 827. Springer, Singapore. https://doi.org/10.1007/978-981-10-8657-1_35
DOI: https://doi.org/10.1007/978-981-10-8657-1_35
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8656-4
Online ISBN: 978-981-10-8657-1
eBook Packages: Computer Science; Computer Science (R0)