Webgraph

Last updated November 18, 2024

The webgraph describes the directed links between pages of the World Wide Web. In general, a graph consists of a set of vertices, some pairs of which are connected by edges; in a directed graph, each edge has a direction and is also called an arc. The webgraph is a directed graph whose vertices correspond to the pages of the WWW, with a directed edge from page X to page Y if page X contains a hyperlink referring to page Y.
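As a minimal sketch of this definition, the following Python builds a tiny webgraph from a mapping of pages to the pages they link to. The page names and the helper names `build_webgraph` and `in_degrees` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: each page maps to the pages it links to.
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}


def build_webgraph(links):
    """Return (vertices, edges); an edge (x, y) means page x links to page y."""
    vertices = set(links)
    edges = set()
    for source, targets in links.items():
        for target in targets:
            vertices.add(target)          # a linked page is a vertex too
            edges.add((source, target))   # directed: source -> target
    return vertices, edges


def in_degrees(vertices, edges):
    """Count the incoming hyperlinks of every page."""
    counts = {v: 0 for v in vertices}
    for _, target in edges:
        counts[target] += 1
    return counts


vertices, edges = build_webgraph(links)
print(sorted(edges))
print(in_degrees(vertices, edges))
```

Because edges are ordered pairs, the link from `a.html` to `b.html` does not imply a link in the opposite direction.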

Properties

The degree distribution of the webgraph differs strongly from that of the classical random graph model, the Erdős–Rényi model:^[1] the Erdős–Rényi model produces very few high-degree nodes compared with the webgraph. The precise distribution is unclear,^[2] however: it is described relatively well by a lognormal distribution, and also by the power law produced by the Barabási–Albert model.^[3]^[4]
The webgraph is an example of a scale-free network.
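The contrast in high-degree nodes can be illustrated with a small simulation. This is a rough sketch under simplifying assumptions: both graphs are undirected (the webgraph itself is directed), the sizes are arbitrary, and the function names are hypothetical. The second generator uses linear preferential attachment in the style of the Barabási–Albert model and is not a reference implementation of it.

```python
import random


def erdos_renyi_degrees(n, p, rng):
    """Degrees of an undirected G(n, p) graph: each pair is linked with probability p."""
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                degree[u] += 1
                degree[v] += 1
    return degree


def preferential_attachment_degrees(n, m, rng):
    """Degrees of a graph grown by linear preferential attachment.

    Each new node attaches to m distinct existing nodes, picked with
    probability proportional to their current degree.
    """
    degree = [0] * n
    endpoints = []  # every node appears once per incident edge
    # Seed graph: a clique on m + 1 nodes, so each seed node has degree m.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            endpoints += [u, v]
            degree[u] += 1
            degree[v] += 1
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional pick
        for t in targets:
            endpoints += [new, t]
            degree[new] += 1
            degree[t] += 1
    return degree


rng = random.Random(0)
n, m = 2000, 2
# p is chosen so that both graphs have a mean degree of about 2 * m.
er = erdos_renyi_degrees(n, 2 * m / n, rng)
ba = preferential_attachment_degrees(n, m, rng)
print("max degree, Erdős–Rényi:", max(er))
print("max degree, preferential attachment:", max(ba))
```

With comparable mean degree, the preferential-attachment graph develops a few hubs whose degree far exceeds anything seen in the Erdős–Rényi graph.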

Applications

The webgraph is used for:

computing the PageRank^[5] of the World Wide Web's pages;
computing the personalized PageRank;^[6]
detecting webpages on similar topics using only graph-theoretical properties, such as co-citation;^[7]
and identifying hubs and authorities in the web for the HITS algorithm.
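To make the first application concrete, here is a minimal power-iteration sketch of PageRank on a toy webgraph. The damping factor 0.85, the fixed iteration count, the uniform treatment of pages without outgoing links, and the page names are all assumptions made for illustration; production systems differ in many details.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page without outgoing links: spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy webgraph.
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}
rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this toy graph `c.html` is linked from three pages and ends up with the highest score, while `d.html`, which nothing links to, keeps only the random-jump share.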

Related Research Articles

A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. That is, the fraction P(k) of nodes in the network having k connections to other nodes goes for large values of k as $P(k) \sim k^{-\gamma}$, where $\gamma$ is a positive exponent.

In mathematics, random graph is the general term for probability distributions over graphs. Random graphs may be described simply by a probability distribution, or by a random process which generates them. The theory of random graphs lies at the intersection of graph theory and probability theory. From a mathematical perspective, random graphs are used to answer questions about the properties of typical graphs. Their practical applications are found in all areas in which complex networks need to be modeled; many random graph models are thus known, mirroring the diverse types of complex networks encountered in different areas. In a mathematical context, random graph refers almost exclusively to the Erdős–Rényi random graph model. In other contexts, any graph model may be referred to as a random graph.

A small-world network is a graph characterized by a high clustering coefficient and low distances. In a social network, for example, high clustering means a high probability that two friends of one person are themselves friends. Low distances, on the other hand, mean that there is a short chain of social connections between any two people. Specifically, a small-world network is defined as a network in which the typical distance L between two randomly chosen nodes grows proportionally to the logarithm of the number of nodes N in the network, that is, $L \propto \log N$.

In the context of network theory, a complex network is a graph (network) with non-trivial topological features—features that do not occur in simple networks such as lattices or random graphs but often occur in networks representing real systems. The study of complex networks is a young and active area of scientific research inspired largely by empirical findings of real-world networks such as computer networks, biological networks, technological networks, brain networks, climate networks and social networks.

In the study of graphs and networks, the degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network.
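A short sketch of how a degree distribution can be computed for an undirected graph given as an edge list. The function name `degree_distribution` and the star-shaped example are assumptions made for illustration.

```python
from collections import Counter


def degree_distribution(nodes, edges):
    """Return {k: fraction of nodes with degree k} for an undirected graph."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())  # how many nodes have each degree
    return {k: c / len(degree) for k, c in sorted(counts.items())}


# Star graph: node 0 is connected to four leaves.
nodes = range(5)
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
dist = degree_distribution(nodes, edges)
print(dist)
```

Here four of the five nodes have degree 1 and one node has degree 4, so the distribution assigns 0.8 to k = 1 and 0.2 to k = 4.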

In network theory, a giant component is a connected component of a given random graph that contains a significant fraction of the entire graph's vertices.
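The idea can be explored numerically with a breadth-first search that measures the largest connected component. The sketch below assumes an undirected G(n, p) random graph with arbitrary sizes and hypothetical helper names. It compares a mean degree below 1 with a mean degree above 1, the regimes in which a giant component is respectively absent and present in this model.

```python
import random
from collections import deque


def largest_component_size(n, edges):
    """Size of the largest connected component of an undirected graph on 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    best = 0
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search over the component containing `start`.
        seen[start] = True
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        best = max(best, size)
    return best


def gnp_edges(n, p, rng):
    """Edges of a G(n, p) graph: each pair is present independently with probability p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


rng = random.Random(1)
n = 1000
for mean_degree in (0.5, 3.0):
    edges = gnp_edges(n, mean_degree / n, rng)
    fraction = largest_component_size(n, edges) / n
    print("mean degree", mean_degree, "-> largest component fraction", fraction)
```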

A preferential attachment process is any of a class of processes in which some quantity, typically some form of wealth or credit, is distributed among a number of individuals or objects according to how much they already have, so that those who are already wealthy receive more than those who are not. "Preferential attachment" is only the most recent of many names that have been given to such processes. They are also referred to under the names Yule process, cumulative advantage, the rich get richer, and the Matthew effect. They are also related to Gibrat's law. The principal reason for scientific interest in preferential attachment is that it can, under suitable circumstances, generate power law distributions. If preferential attachment is non-linear, measured distributions may deviate from a power law. These mechanisms may generate distributions which are approximately power law over transient periods.

The Barabási–Albert (BA) model is an algorithm for generating random scale-free networks using a preferential attachment mechanism. Several natural and human-made systems, including the Internet, the World Wide Web, citation networks, and some social networks are thought to be approximately scale-free and certainly contain few nodes with unusually high degree as compared to the other nodes of the network. The BA model tries to explain the existence of such nodes in real networks. The algorithm is named for its inventors Albert-László Barabási and Réka Albert.

The Watts–Strogatz model is a random graph generation model that produces graphs with small-world properties, including short average path lengths and high clustering. It was proposed by Duncan J. Watts and Steven Strogatz in their article published in 1998 in the Nature scientific journal. The model also became known as the (Watts) beta model after Watts used $\beta$ to formulate it in his popular science book Six Degrees.

In the mathematical field of graph theory, the Erdős–Rényi model refers to one of two closely related models for generating random graphs or the evolution of a random network. These models are named after Hungarian mathematicians Paul Erdős and Alfréd Rényi, who introduced one of the models in 1959. Edgar Gilbert introduced the other model contemporaneously with and independently of Erdős and Rényi. In the model of Erdős and Rényi, all graphs on a fixed vertex set with a fixed number of edges are equally likely. In the model introduced by Gilbert, also called the Erdős–Rényi–Gilbert model, each edge has a fixed probability of being present or absent, independently of the other edges. These models can be used in the probabilistic method to prove the existence of graphs satisfying various properties, or to provide a rigorous definition of what it means for a property to hold for almost all graphs.
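The two variants can be sketched side by side. The function names `gnm` and `gnp` are hypothetical, and the code enumerates all vertex pairs, which is only reasonable for small graphs.

```python
import random
from itertools import combinations


def gnm(n, m, rng):
    """Erdős–Rényi variant: choose exactly m of the possible edges uniformly at random."""
    all_pairs = list(combinations(range(n), 2))
    return rng.sample(all_pairs, m)


def gnp(n, p, rng):
    """Gilbert variant: include each possible edge independently with probability p."""
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]


rng = random.Random(42)
fixed_count = gnm(10, 15, rng)
fixed_probability = gnp(10, 15 / 45, rng)  # 45 possible edges on 10 vertices
print(len(fixed_count), "edges in the fixed-count sample")
print(len(fixed_probability), "edges in the fixed-probability sample")
```

The first function always returns exactly the requested number of edges, whereas the edge count of the second varies from sample to sample around its expected value.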

Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

In mathematics and social science, a collaboration graph is a graph modeling some social network where the vertices represent participants of that network and where two distinct participants are joined by an edge whenever there is a collaborative relationship between them of a particular kind. Collaboration graphs are used to measure the closeness of collaborative relationships between the participants of the network.

In graph theory, a k-degenerate graph is an undirected graph in which every subgraph has at least one vertex of degree at most k: that is, some vertex in the subgraph touches k or fewer of the subgraph's edges. The degeneracy of a graph is the smallest value of k for which it is k-degenerate. The degeneracy of a graph is a measure of how sparse it is, and is within a constant factor of other sparsity measures such as the arboricity of a graph.
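One way to compute the degeneracy is to repeatedly delete a vertex of minimum degree and record the largest degree seen at the moment of deletion. The following sketch uses a simple quadratic-time version of that idea; the function names are hypothetical, and faster bucket-based implementations exist.

```python
def adjacency_from_edges(edges):
    """Build a symmetric adjacency mapping from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degeneracy(adjacency):
    """Smallest k such that every subgraph has a vertex of degree at most k."""
    adj = {v: set(ns) for v, ns in adjacency.items()}  # work on a copy
    k = 0
    while adj:
        # Remove a vertex of minimum degree in the remaining graph.
        v = min(adj, key=lambda x: len(adj[x]))
        k = max(k, len(adj[v]))
        for w in adj[v]:
            adj[w].discard(v)
        del adj[v]
    return k


tree = adjacency_from_edges([(0, 1), (1, 2), (1, 3)])
cycle = adjacency_from_edges([(0, 1), (1, 2), (2, 3), (3, 0)])
complete4 = adjacency_from_edges([(u, v) for u in range(4) for v in range(u + 1, 4)])
print(degeneracy(tree), degeneracy(cycle), degeneracy(complete4))
```

A tree always has a leaf, a cycle has only degree-2 vertices, and in the complete graph on four vertices every vertex has degree 3, which is what the three example values reflect.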

Evolving networks are networks that change as a function of time. They are a natural extension of network science since almost all real world networks evolve over time, either by adding or removing nodes or links over time. Often all of these processes occur simultaneously, such as in social networks where people make and lose friends over time, thereby creating and destroying edges, and some people become part of new social networks or leave their networks, changing the nodes in the network. Evolving network concepts build on established network theory and are now being introduced into studying networks in many diverse fields.

In social network analysis, the co-stardom network represents the collaboration graph of film actors, i.e. movie stars. The co-stardom network can be represented by an undirected graph. Nodes correspond to the actors, and two nodes are linked if the actors co-starred (performed) in the same movie. The links are undirected and can be weighted or not, depending on the goals of the study. If the number of movies in which two actors appeared together is needed, links are assigned weights. Initially, the network was found to have a small-world property. Afterwards, it was discovered that, more precisely, it exhibits scale-free (power-law) behavior.

Hierarchical network models are iterative algorithms for creating networks which are able to reproduce the unique properties of the scale-free topology and the high clustering of the nodes at the same time. These characteristics are widely observed in nature, from biology to language to some social networks.

Réka Albert is a Romanian-Hungarian scientist. She is a distinguished professor of physics and adjunct professor of biology at Pennsylvania State University and is noted for the Barabási–Albert model and research into scale-free networks and Boolean modeling of biological systems.

In a scale-free network the degree distribution follows a power-law function. In some empirical examples this power law fits the degree distribution well only in the high-degree region; for low-degree nodes the empirical degree distribution deviates from it. See, for example, the network of scientific citations. This deviation of the observed degree distribution from the theoretical prediction in the low-degree region is often referred to as low-degree saturation. The empirical degree distribution typically deviates downward from the power-law function fitted to the higher-degree nodes, which means that low-degree nodes are less frequent in real data than predicted by the Barabási–Albert model.

Initial attractiveness is a possible extension of the Barabási–Albert model. The Barabási–Albert model generates scale-free networks in which the degree distribution can be described by a pure power law. However, the degree distribution of most real-life networks cannot be described by a power law alone. The most common discrepancies in the degree distributions of real networks are the high-degree cut-off and low-degree saturation. Including initial attractiveness in the Barabási–Albert model addresses the low-degree saturation phenomenon.

In network science, the configuration model is a method for generating random networks from a given degree sequence. It is widely used as a reference model for real-life social networks, because it allows the modeler to incorporate arbitrary degree distributions.
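A common way to realize the configuration model is stub matching: give every node as many half-edges ("stubs") as its target degree, shuffle the stubs, and pair them up. The sketch below assumes this approach; the function name is hypothetical, and the result may contain self-loops and repeated edges, which some applications remove or resample.

```python
import random


def configuration_model(degrees, rng):
    """Pair up shuffled stubs so that node i ends up with degrees[i] edge endpoints."""
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sequence must have an even sum")
    # One stub per unit of degree, labelled with its node.
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive stubs form an edge; self-loops and multi-edges are possible.
    return list(zip(stubs[0::2], stubs[1::2]))


rng = random.Random(5)
degrees = [3, 2, 2, 1, 1, 1]
edges = configuration_model(degrees, rng)
print(edges)
```

Counting each edge endpoint (a self-loop contributes two) recovers exactly the requested degree sequence, whatever the shuffle produced.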

References

[1] Erdős, Paul; Rényi, Alfréd (1960). "On the evolution of random graphs". Publication of the Mathematical Institute of the Hungarian Academy of Sciences. 5: 17–61.
[2] Meusel, R.; Vigna, S.; Lehmberg, O.; Bizer, C. (2015). "The Graph Structure in the Web - Analyzed on Different Aggregation Levels". Journal of Web Science. 1 (1): 33–47. doi:10.1561/106.00000003. hdl:2434/372411.
[3] Clauset, A.; Shalizi, C. R.; Newman, M. E. J. (2009). "Power-law distributions in empirical data". SIAM Rev. 51 (4): 661–703. arXiv:0706.1062. Bibcode:2009SIAMR..51..661C. doi:10.1137/070710111. S2CID 9155618.
[4] Barabási, Albert-László; Albert, Réka (October 1999). "Emergence of scaling in random networks". Science. 286 (5439): 509–512. arXiv:cond-mat/9910332. Bibcode:1999Sci...286..509B. doi:10.1126/science.286.5439.509. PMID 10521342. S2CID 524106.
[5] Brin, Sergey; Page, Lawrence (1998-04-01). "The anatomy of a large-scale hypertextual Web search engine". Computer Networks and ISDN Systems. Proceedings of the Seventh International World Wide Web Conference. 30 (1): 107–117. doi:10.1016/S0169-7552(98)00110-X. ISSN 0169-7552.
[6] Jeh, Glen; Widom, Jennifer (2003). "Scaling personalized web search". Proceedings of the 12th International Conference on World Wide Web (WWW '03). New York, NY, USA: ACM. pp. 271–279. doi:10.1145/775152.775191.
[7] Kumar, Ravi; Raghavan, Prabhakar; Rajagopalan, Sridhar; Tomkins, Andrew (1999). "Trawling the Web for emerging cyber-communities". Computer Networks. 31 (11–16): 1481–1493. CiteSeerX 10.1.1.89.4025. doi:10.1016/S1389-1286(99)00040-7. S2CID 7069190.

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
