Web Structure in 2005

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4936)

Abstract

Multiplying the average number of pages per web server, 200 pages based on the results of three previous studies, by the estimated number of web servers on the Internet, 101.4 million, gives an estimate of about 20.3 billion static web pages in Oct. 2005. However, based on an analysis of the 8.5 billion web pages that we had crawled by Oct. 2005, we estimate the total number of web pages to be 53.7 billion. We attribute the difference to the rapid increase in the number of dynamic web pages in recent years. We also analyzed the web structure using 3 billion of the 8.5 billion crawled pages. Our results indicate that the size of the "CORE," the central component of the bow-tie structure, has increased in recent years, especially in the Chinese and Japanese web.
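The server-based estimate in the abstract is a single multiplication, and it can be checked directly. The sketch below is only an illustration of that arithmetic using the figures quoted above; the constant and function names are hypothetical and do not come from the paper, and the crawl-based figure of 53.7 billion is taken as given rather than derived.

```python
# Figures quoted in the abstract (Oct. 2005). All names are illustrative.
PAGES_PER_SERVER = 200                  # average from three previous studies
WEB_SERVERS = 101_400_000               # estimated web servers on the Internet
CRAWL_BASED_ESTIMATE = 53_700_000_000   # authors' crawl-based estimate (taken as given)
CRAWLED_PAGES = 8_500_000_000           # pages crawled by Oct. 2005


def server_based_estimate(pages_per_server: int, servers: int) -> int:
    """Static-page estimate: average pages per server times server count."""
    return pages_per_server * servers


static_estimate = server_based_estimate(PAGES_PER_SERVER, WEB_SERVERS)

# How much larger the crawl-based estimate is than the server-based one.
ratio = CRAWL_BASED_ESTIMATE / static_estimate

# Share of the crawl-based estimate that the crawl itself covers.
crawl_coverage = CRAWLED_PAGES / CRAWL_BASED_ESTIMATE

print(f"server-based estimate: {static_estimate / 1e9:.2f} billion")
print(f"crawl-based / server-based: {ratio:.2f}x")
print(f"crawl coverage of crawl-based estimate: {crawl_coverage:.1%}")
```

The product is 20.28 billion, which rounds to the 20.3 billion quoted in the abstract. On these figures the crawl-based estimate is roughly 2.65 times the server-based one, and the 8.5 billion crawled pages correspond to about 15.8% of the crawl-based total.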

Editor information

William Aiello, Andrei Broder, Jeannette Janssen, Evangelos Milios

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hirate, Y., Kato, S., Yamana, H. (2008). Web Structure in 2005. In: Aiello, W., Broder, A., Janssen, J., Milios, E. (eds) Algorithms and Models for the Web-Graph. WAW 2006. Lecture Notes in Computer Science, vol 4936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78808-9_4

  • DOI: https://doi.org/10.1007/978-3-540-78808-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78807-2

  • Online ISBN: 978-3-540-78808-9

  • eBook Packages: Computer Science, Computer Science (R0)
