A Study on Different Types of Web Crawlers
The World Wide Web is a global information medium through which people around the world explore information. A search engine is where internet users search for the content they need, and results are returned to them as websites, images, or videos. Web crawlers emerged to support this: they browse the web to gather and download pages relevant to user topics and store them in a large repository, which makes the search engine more efficient. Web crawlers are becoming more important by the day. This paper presents the various types of web crawlers and their architectures, and compares them.
Keywords: Web crawler, Focused crawler, Incremental crawler, Distributed crawler, Parallel crawler, Hidden web crawler
The authors express their gratitude for the assistance provided by Accendere Knowledge Management Services Pvt. Ltd. in preparing the manuscripts. We also thank our mentors and faculty members, who guided us throughout the research and helped us achieve the desired results.