Article Outline
Glossary
Definition of the Subject
Introduction
Graph Structure
Algorithms
Subsets of the Web Graph
Future Directions
Bibliography
Glossary
- Graph:
-
A set of nodes (vertices) connected by links (edges, arcs). In the Web graph, the nodes are webpages, and the edges are the hyperlinks between them.
- Indegree:
-
The number of incoming edges to a node; in the case of the Web, it is the number of webpages pointing to a page.
- Outdegree:
-
The number of outgoing edges; in the case of the Web, it is the number of webpages a webpage points to.
- Strongly connected component:
-
A set of webpages such that any page can be reached from any other page by following hyperlinks.
- Weakly connected component:
-
A set of webpages such that any page can be reached from any other page by treating the hyperlinks as undirected.
- URL:
-
A uniform resource locator: an address that corresponds to an online information source.
- Hypertext transfer protocol (HTTP):
-
The communications protocol that allows web clients to communicate with web servers.
- Web server:
-
A program that responds to requests for webpages.
- Webpage:
-
An information resource, identified by a URL, that is usually but not necessarily in the HTML (Hypertext Markup Language) format. A webpage may be static, meaning that it is stored on the server as a document, or dynamic, meaning that it is generated at the point that it is requested by the browser, using scripts and/or back-end databases.
- Domain:
-
A name that identifies one or more IP addresses, e.g., umich.edu.
- Top level domain:
-
The suffix of the domain name, sometimes corresponding to the purpose of the website or the country of origin for the website, e.g., “edu”, “com”, “gov”, “uk”, “cn”, etc.
- Website:
-
A collection of webpages that are hosted on one or more web servers and that share a common root URL, e.g., “www.springer.com”.
- Randomized network:
-
A network in which every node keeps the same degree it has in the original network, but the edges themselves are rewired.
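The graph terms in this glossary can be made concrete with a minimal Python sketch. It is illustrative only: the toy edge list and the function names (`degrees`, `reachable`, `components`, `rewire`) are hypothetical and do not come from the entry, and the edge-swap procedure is just one common way to build a randomized network, assumed here rather than taken from the text.

```python
import random
from collections import defaultdict

# Hypothetical toy Web graph: each pair (u, v) is a hyperlink from page u to page v.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "d")]
nodes = sorted({n for edge in edges for n in edge})


def degrees(edges):
    """Return (indegree, outdegree) dictionaries for a list of directed edges."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1  # u points to one more page
        indeg[v] += 1   # v is pointed to by one more page
    return indeg, outdeg


def reachable(start, adj):
    """Set of nodes reachable from start by following the adjacency sets in adj."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def components(nodes, edges, strong):
    """Strongly (strong=True) or weakly (strong=False) connected components."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for u, v in edges:
        fwd[u].add(v)
        bwd[v].add(u)
    if not strong:
        # Treat hyperlinks as undirected: every edge can be followed both ways.
        both = defaultdict(set)
        for n in nodes:
            both[n] = fwd[n] | bwd[n]
        fwd = bwd = both
    result, assigned = [], set()
    for n in nodes:
        if n not in assigned:
            # Pages that n can reach and that can also reach n.
            comp = reachable(n, fwd) & reachable(n, bwd)
            result.append(comp)
            assigned |= comp
    return result


def rewire(edges, swaps, seed=0):
    """Degree-preserving randomization: repeatedly swap the targets of two edges."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[i], edges[j]
        # Skip swaps that would create a self-link or a duplicate link.
        if u == y or x == v or (u, y) in present or (x, v) in present:
            continue
        present -= {(u, v), (x, y)}
        present |= {(u, y), (x, v)}
        edges[i], edges[j] = (u, y), (x, v)
    return edges
```

On the toy graph, `components(nodes, edges, strong=True)` separates the cycle of pages a, b, and c from pages d and e, while `components(nodes, edges, strong=False)` puts all five pages in a single weakly connected component. Because `rewire` only exchanges the targets of two edges, every page keeps its indegree and outdegree.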
© 2012 Springer-Verlag
Cite this entry
Adamic, L.A. (2012). World Wide Web, Graph Structure. In: Meyers, R. (eds) Computational Complexity. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1800-9_208
DOI: https://doi.org/10.1007/978-1-4614-1800-9_208
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1799-6
Online ISBN: 978-1-4614-1800-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering