Web Crawler Architecture

Najork, Marc

doi:10.1007/978-1-4614-8265-9_457

Recommended Reading

Boldi P, Codenotti B, Santini M, Vigna S. UbiCrawler: a scalable fully distributed web crawler. Software Pract Exper. 2004;34(8):711–26.
Brin S, Page L. The anatomy of a large-scale hypertextual search engine. In: Proceedings of the 7th International World Wide Web Conference; 1998. p. 107–17.
Burner M. Crawling towards eternity: building an archive of the world wide web. Web Tech Mag. 1997;2(5):37–40.
Cho J, Garcia-Molina H. Parallel crawlers. In: Proceedings of the 11th International World Wide Web Conference; 2002. p. 124–35.
Eichmann D. The RBSE spider – balancing effective search against web load. In: Proceedings of the 3rd International World Wide Web Conference; 1994.
Gray M. Internet growth and statistics: credits and background. http://www.mit.edu/people/mkgray/net/background.html
Hafri Y, Djeraba C. High performance crawling system. In: Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval; 2004. p. 299–306.
Heydon A, Najork M. Mercator: a scalable, extensible web crawler. World Wide Web. 1999;2(4):219–29.
Najork M, Heydon A. High-performance web crawling. Compaq SRC Research Report 173, Sept 2001.
Raghavan S, Garcia-Molina H. Crawling the hidden web. In: Proceedings of the 27th International Conference on Very Large Data Bases; 2001. p. 129–38.
Shkapenyuk V, Suel T. Design and implementation of a high-performance distributed web crawler. In: Proceedings of the 18th International Conference on Data Engineering; 2002. p. 357–68.

Author information

Authors and Affiliations

Google, Inc., Mountain View, CA, USA
Marc Najork

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Section Editor information

Google Research, New York, NY, USA
Cong Yu

About this entry

Cite this entry

Najork, M. (2018). Web Crawler Architecture. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_457

DOI: https://doi.org/10.1007/978-1-4614-8265-9_457
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering