Abstract
Distributed crawling has been shown to overcome important limitations of the centralized crawling paradigm. However, current distributed crawlers do not fully exploit their distributed nature: the optimal benefits of the approach are usually limited to the sites hosting the crawler. In this work we describe IPMicra, a distributed, location-aware web crawler that uses an IP address hierarchy to crawl links in a near-optimal, location-aware manner. The crawler outperforms earlier distributed crawling approaches without significant overhead.
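As a rough illustration of how an IP address hierarchy could steer a link toward a nearby crawler, the sketch below picks the crawler registered for the most specific subnet that contains a target host's IP address. This is not the IPMicra algorithm from the paper, which the abstract does not detail. The subnet table, the crawler names, and the `pick_crawler` function are all hypothetical, and longest-prefix matching is assumed here as a simple stand-in for "closest in the hierarchy".

```python
import ipaddress

# Hypothetical subnet-to-crawler table (documentation address ranges);
# none of these entries come from the paper.
SUBNET_TO_CRAWLER = {
    "0.0.0.0/0": "crawler-default",
    "192.0.2.0/24": "crawler-a",
    "192.0.2.128/25": "crawler-b",
    "198.51.100.0/24": "crawler-c",
}


def pick_crawler(ip, table=SUBNET_TO_CRAWLER):
    """Return the crawler for the most specific subnet containing ip.

    Returns None when no subnet in the table contains the address.
    """
    addr = ipaddress.ip_address(ip)
    best_net = None
    best_crawler = None
    for cidr, crawler in table.items():
        net = ipaddress.ip_network(cidr)
        # A longer prefix means a smaller, more specific subnet.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net = net
            best_crawler = crawler
    return best_crawler
```

Under these assumptions, a host at 192.0.2.200 falls inside both 192.0.2.0/24 and 192.0.2.128/25, so the lookup prefers the narrower subnet and returns its crawler. An address matched by no specific subnet falls back to the catch-all entry.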
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Papapetrou, O., Samaras, G. (2004). Minimizing the Network Distance in Distributed Web Crawling. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2004: CoopIS, DOA, and ODBASE. OTM 2004. Lecture Notes in Computer Science, vol 3290. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30468-5_36
DOI: https://doi.org/10.1007/978-3-540-30468-5_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23663-4
Online ISBN: 978-3-540-30468-5
eBook Packages: Springer Book Archive