Fuzzy Based Efficient Mechanism for URL Assignment in Dynamic Web Crawler

  • Raghav Sharma
  • Rajesh Bhatia
  • Sahil Garg
  • Gagangeet Singh Aujla
  • Ravinder Singh Mann
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 712)

Abstract

The World Wide Web (WWW) is a huge collection of unorganized documents. Web crawlers are often used to build a database from this unorganized network. A crawler that interacts with millions of web pages must be efficient if the search engine built on it is to be powerful, and this requirement necessitates the parallelization of web crawlers. In this work, a fuzzy-based technique for uniform resource locator (URL) assignment in a dynamic web crawler is proposed that utilizes the task-splitting property of the processor. To optimize crawler performance, the proposed scheme addresses two important aspects: (i) creating a crawling framework with load balancing among parallel crawlers, and (ii) making the crawling process faster by using parallel crawlers with efficient network access. Several experiments are conducted to monitor the performance of the proposed scheme. The results indicate that the proposed scheme is effective.
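
The abstract does not give the membership functions or rule base used in the paper, so the following is only an illustrative sketch of how fuzzy URL assignment with load balancing could look, not the authors' method. It assumes two hypothetical inputs per crawler, both normalized to [0, 1]: queue fill and network latency. The piecewise-linear memberships, the rule consequents, and the names `memberships`, `suitability`, and `pick_crawler` are all assumptions made for this example.

```python
from itertools import product


def memberships(x):
    """Degrees of membership of x in (low, medium, high); x is clamped to [0, 1].

    Assumed piecewise-linear sets that always sum to 1.
    """
    x = min(max(x, 0.0), 1.0)
    low = max(0.0, 1.0 - 2.0 * x)
    high = max(0.0, 2.0 * x - 1.0)
    medium = 1.0 - low - high
    return (low, medium, high)


def suitability(queue_fill, latency):
    """Score in [0, 1] for how suitable a crawler is to receive the next URL.

    Zero-order Sugeno-style inference: each (queue level, latency level)
    pair is one rule, fired with strength min(mu_queue, mu_latency).
    """
    mu_q = memberships(queue_fill)
    mu_l = memberships(latency)
    num = den = 0.0
    for (i, q), (j, l) in product(enumerate(mu_q), enumerate(mu_l)):
        w = min(q, l)            # firing strength (fuzzy AND)
        z = 1.0 - (i + j) / 4.0  # assumed consequent: lower load scores higher
        num += w * z
        den += w
    return num / den             # den > 0 because the memberships sum to 1


def pick_crawler(crawlers):
    """Return the name of the crawler with the highest suitability score.

    crawlers maps a name to a (queue_fill, latency) pair.
    """
    return max(crawlers, key=lambda name: suitability(*crawlers[name]))


if __name__ == "__main__":
    state = {"c1": (0.9, 0.2), "c2": (0.1, 0.3), "c3": (0.5, 0.5)}
    print(pick_crawler(state))
```

In this sketch a dispatcher would call `pick_crawler` once per discovered URL and then update the chosen crawler's queue fill, so lightly loaded crawlers with good network access receive more work.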

Keywords

Web crawlers; Uniform resource locator; Static parallel crawler; Dynamic parallel crawler; Search engines; Fuzzy logic

Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  • Raghav Sharma (1)
  • Rajesh Bhatia (1)
  • Sahil Garg (2)
  • Gagangeet Singh Aujla (2)
  • Ravinder Singh Mann (3)

  1. PEC University of Technology, Chandigarh, India
  2. Thapar University, Patiala, India
  3. Lyallpur Khalsa College of Engineering, Jalandhar, India
