Random geometric graph

Last updated

In graph theory, a random geometric graph (RGG) is the mathematically simplest spatial network, namely an undirected graph constructed by randomly placing N nodes in some metric space (according to a specified probability distribution) and connecting two nodes by a link if and only if their distance is in a given range, e.g. smaller than a certain neighborhood radius, r.

Contents

Random geometric graphs resemble real human social networks in a number of ways. For instance, they spontaneously demonstrate community structure - clusters of nodes with high modularity. Other random graph generation algorithms, such as those generated using the Erdős–Rényi model or Barabási–Albert (BA) model do not create this type of structure. Additionally, random geometric graphs display degree assortativity according to their spatial dimension: [1] "popular" nodes (those with many links) are particularly likely to be linked to other popular nodes.

A real-world application of RGGs is the modeling of ad hoc networks. [2] Furthermore they are used to perform benchmarks for graph algorithms.

Definition

The generation of a random geometric graph for different connectivity parameters r. Random Geometric Graph.gif
The generation of a random geometric graph for different connectivity parameters r.

In the following, let  G = (V, E) denote an undirected Graph with a set of vertices V and a set of edges E ⊆ V × V. The set sizes are denoted by |V| = n and |E| = m. Additionally, if not noted otherwise, the metric space [0,1)d with the euclidean distance is considered, i.e. for any points the euclidean distance of x and y is defined as

.

A random geometric graph (RGG) is an undirected geometric graph with nodes randomly sampled from the uniform distribution of the underlying space [0,1)d. [3] Two vertices p, q ∈ V are connected if, and only if, their distance is less than a previously specified parameter r ∈ (0,1), excluding any loops. Thus, the parameters r and n fully characterize a RGG.

Algorithms

Naive algorithm

The naive approach is to calculate the distance of every vertex to every other vertex. As there are possible connections that are checked, the time complexity of the naive algorithm is . The samples are generated by using a random number generator (RNG) on . Practically, one can implement this using d random number generators on , one RNG for every dimension.

Pseudocode

V := generateSamples(n)  // Generates n samples in the unit cube.for eachpVdofor eachqV\{p} doif distance(p, q) ≤ rthen             addConnection(p, q) // Add the edge (p, q) to the edge data structure.end ifend forend for

As this algorithm is not scalable (every vertex needs information of every other vertex), Holtgrewe et al. and Funke et al. have introduced new algorithms for this problem.

Distributed algorithms

Holtgrewe et al.

This algorithm, which was proposed by Holtgrewe et al., was the first distributed RGG generator algorithm for dimension 2. [4] It partitions the unit square into equal sized cells with side length of at least . For a given number of processors, each processor is assigned cells, where For simplicity, is assumed to be a square number, but this can be generalized to any number of processors. Each processor then generates vertices, which are then distributed to their respective owners. Then the vertices are sorted by the cell number they fall into, for example with Quicksort. Next, each processor then sends their adjacent processors the information about the vertices in the border cells, such that each processing unit can calculate the edges in their partition independent of the other units. The expected running time is . An upper bound for the communication cost of this algorithm is given by , where denotes the time for an all-to-all communication with messages of length l bits to c communication partners. is the time taken for a point-to-point communication for a message of length l bits.

Since this algorithm is not communication free, Funke et al. proposed [4] a scalable distributed RGG generator for higher dimensions, which works without any communication between the processing units.

Funke et al.

The approach used in this algorithm [4] is similar to the approach in Holtgrewe: Partition the unit cube into equal sized chunks with side length of at least r. So in d = 2 this will be squares, in d = 3 this will be cubes. As there can only fit at most chunks per dimension, the number of chunks is capped at . As before, each processor is assigned chunks, for which it generates the vertices. To achieve a communication free process, each processor then generates the same vertices in the adjacent chunks by exploiting pseudorandomization of seeded hash functions. This way, each processor calculates the same vertices and there is no need for exchanging vertex information.

For dimension 3, Funke et al. showed that the expected running time is , without any cost for communication between processing units.

Properties

Isolated vertices and connectivity

The probability that a single vertex is isolated in a RGG is . [5] Let be the random variable counting how many vertices are isolated. Then the expected value of is . The term provides information about the connectivity of the RGG. For , the RGG is asymptotically almost surely connected. For , the RGG is asymptotically almost surely disconnected. And for , the RGG has a giant component that covers more than vertices and is Poisson distributed with parameter . It follows that if , the probability that the RGG is connected is and the probability that the RGG is not connected is .

For any -Norm ( ) and for any number of dimensions , a RGG possesses a sharp threshold of connectivity at with constant . In the special case of a two-dimensional space and the euclidean norm ( and ) this yields .

Hamiltonicity

It has been shown, that in the two-dimensional case, the threshold also provides information about the existence of a Hamiltonian cycle (Hamiltonian Path). [6] For any , if , then the RGG has asymptotically almost surely no Hamiltonian cycle and if for any , then the RGG has asymptotically almost surely a Hamiltonian cycle.

Clustering coefficient

The clustering coefficient of RGGs only depends on the dimension d of the underlying space [0,1)d. The clustering coefficient is [7]

for even and for odd where

For large , this simplifies to .

Generalized random geometric graphs

In 1988 Waxman [8] generalised the standard RGG by introducing a probabilistic connection function as opposed to the deterministic one suggested by Gilbert. The example introduced by Waxman was a stretched exponential where two nodes and connect with probability given by where is the euclidean separation and , are parameters determined by the system. This type of RGG with probabilistic connection function is often referred to a soft random geometric Graph, which now has two sources of randomness; the location of nodes (vertices) and the formation of links (edges). This connection function has been generalized further in the literature which is often used to study wireless networks without interference. The parameter represents how the signal decays with distance, when is free space, models a more cluttered environment like a town (= 6 models cities like New York) whilst models highly reflective environments. We notice that for is the Waxman model, whilst as and we have the standard RGG. Intuitively these type of connection functions model how the probability of a link being made decays with distance.

Overview of some results for Soft RGG

In the high density limit for a network with exponential connection function the number of isolated nodes is Poisson distributed, and the resulting network contains a unique giant component and isolated nodes only. [9] Therefore by ensuring there are no isolated nodes, in the dense regime, the network is a.a.s fully connected; similar to the results shown in [10] for the disk model. Often the properties of these networks such as betweenness centrality [11] and connectivity [9] are studied in the limit as the density which often means border effects become negligible. However, in real life where networks are finite, although can still be extremely dense, border effects will impact on full connectivity; in fact [12] showed that for full connectivity, with an exponential connection function, is greatly impacted by boundary effects as nodes near the corner/face of a domain are less likely to connect compared with those in the bulk. As a result full connectivity can be expressed as a sum of the contributions from the bulk and the geometries boundaries. A more general analysis of the connection functions in wireless networks has shown that the probability of full connectivity can be well approximated expressed by a few moments of the connection function and the regions geometry. [13]

Related Research Articles

<span class="mw-page-title-main">Logarithm</span> Inverse of the exponential function

In mathematics, the logarithm is the inverse function to exponentiation. That means that the logarithm of a number x to the base b is the exponent to which b must be raised to produce x. For example, since 1000 = 103, the logarithm base 10 of 1000 is 3, or log10 (1000) = 3. The logarithm of x to base b is denoted as logb (x), or without parentheses, logbx, or even without the explicit base, log x, when no confusion is possible, or when the base does not matter such as in big O notation.

<span class="mw-page-title-main">Normal distribution</span> Probability distribution

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is

The number π is a mathematical constant that is the ratio of a circle's circumference to its diameter, approximately equal to 3.14159. The number π appears in many formulae across mathematics and physics. It is an irrational number, meaning that it cannot be expressed exactly as a ratio of two integers, although fractions such as are commonly used to approximate it. Consequently, its decimal representation never ends, nor enters a permanently repeating pattern. It is a transcendental number, meaning that it cannot be a solution of an equation involving only sums, products, powers, and integers. The transcendence of π implies that it is impossible to solve the ancient challenge of squaring the circle with a compass and straightedge. The decimal digits of π appear to be randomly distributed, but no proof of this conjecture has been found.

<span class="mw-page-title-main">Log-normal distribution</span> Probability distribution

In probability theory, a log-normal (or lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Equivalently, if Y has a normal distribution, then the exponential function of Y, X = exp(Y), has a log-normal distribution. A random variable which is log-normally distributed takes only positive real values. It is a convenient and useful model for measurements in exact and engineering sciences, as well as medicine, economics and other topics (e.g., energies, concentrations, lengths, prices of financial instruments, and other metrics).

<span class="mw-page-title-main">Scale-free network</span> Network whose degree distribution follows a power law

A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. That is, the fraction P(k) of nodes in the network having k connections to other nodes goes for large values of k as

In mathematics, the isoperimetric inequality is a geometric inequality involving the perimeter of a set and its volume. In -dimensional space the inequality lower bounds the surface area or perimeter of a set by its volume ,

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topological sorting has many applications especially in ranking problems such as feedback arc set. Topological sorting is possible even when the DAG has disconnected components.

<span class="mw-page-title-main">Stable distribution</span> Distribution of variables which satisfies a stability property under linear combinations

In probability theory, a distribution is said to be stable if a linear combination of two independent random variables with this distribution has the same distribution, up to location and scale parameters. A random variable is said to be stable if its distribution is stable. The stable distribution family is also sometimes referred to as the Lévy alpha-stable distribution, after Paul Lévy, the first mathematician to have studied it.

Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables as well as unknown parameters and latent variables, with various sorts of relationships among the three types of random variables, as might be described by a graphical model. As typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes:

  1. To provide an analytical approximation to the posterior probability of the unobserved variables, in order to do statistical inference over these variables.
  2. To derive a lower bound for the marginal likelihood of the observed data. This is typically used for performing model selection, the general idea being that a higher marginal likelihood for a given model indicates a better fit of the data by that model and hence a greater probability that the model in question was the one that generated the data.

In the mathematical field of graph theory, the Laplacian matrix, also called the graph Laplacian, admittance matrix, Kirchhoff matrix or discrete Laplacian, is a matrix representation of a graph. Named after Pierre-Simon Laplace, the graph Laplacian matrix can be viewed as a matrix form of the negative discrete Laplace operator on a graph approximating the negative continuous Laplacian obtained by the finite difference method.

<span class="mw-page-title-main">Centrality</span> Degree of connectedness within a graph

In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, super-spreaders of disease, and brain networks. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.

von Mises distribution Probability distribution on the circle

In probability theory and directional statistics, the von Mises distribution is a continuous probability distribution on the circle. It is a close approximation to the wrapped normal distribution, which is the circular analogue of the normal distribution. A freely diffusing angle on a circle is a wrapped normally distributed random variable with an unwrapped variance that grows linearly in time. On the other hand, the von Mises distribution is the stationary distribution of a drift and diffusion process on the circle in a harmonic potential, i.e. with a preferred orientation. The von Mises distribution is the maximum entropy distribution for circular data when the real and imaginary parts of the first circular moment are specified. The von Mises distribution is a special case of the von Mises–Fisher distribution on the N-dimensional sphere.

In directional statistics, the von Mises–Fisher distribution, is a probability distribution on the -sphere in . If the distribution reduces to the von Mises distribution on the circle.

In the mathematical discipline of graph theory, the expander walk sampling theorem intuitively states that sampling vertices in an expander graph by doing relatively short random walk can simulate sampling the vertices independently from a uniform distribution. The earliest version of this theorem is due to Ajtai, Komlós & Szemerédi (1987), and the more general version is typically attributed to Gillman (1998).

<span class="mw-page-title-main">Erdős–Rényi model</span> Two closely related models for generating random graphs

In the mathematical field of graph theory, the Erdős–Rényi model refers to one of two closely related models for generating random graphs or the evolution of a random network. These models are named after Hungarian mathematicians Paul Erdős and Alfréd Rényi, who introduced one of the models in 1959. Edgar Gilbert introduced the other model contemporaneously with and independently of Erdős and Rényi. In the model of Erdős and Rényi, all graphs on a fixed vertex set with a fixed number of edges are equally likely. In the model introduced by Gilbert, also called the Erdős–Rényi–Gilbert model, each edge has a fixed probability of being present or absent, independently of the other edges. These models can be used in the probabilistic method to prove the existence of graphs satisfying various properties, or to provide a rigorous definition of what it means for a property to hold for almost all graphs.

<span class="mw-page-title-main">Karger's algorithm</span> Randomized algorithm for minimum cuts

In computer science and graph theory, Karger's algorithm is a randomized algorithm to compute a minimum cut of a connected graph. It was invented by David Karger and first published in 1993.

<span class="mw-page-title-main">Conductance (graph)</span> How quickly a random walk on a graph converges to steady state

In graph theory the conductance of a graph G = (V, E) measures how "well-knit" the graph is: it controls how fast a random walk on G converges to its stationary distribution. The conductance of a graph is often called the Cheeger constant of a graph as the analog of its counterpart in spectral geometry. Since electrical networks are intimately related to random walks with a long history in the usage of the term "conductance", this alternative name helps avoid possible confusion.

<span class="mw-page-title-main">Network science</span> Academic field

Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

In applied mathematics, lambda-connectedness deals with partial connectivity for a discrete space.

<span class="mw-page-title-main">Hyperbolic geometric graph</span>

A hyperbolic geometric graph (HGG) or hyperbolic geometric network (HGN) is a special type of spatial network where (1) latent coordinates of nodes are sprinkled according to a probability density function into a hyperbolic space of constant negative curvature and (2) an edge between two nodes is present if they are close according to a function of the metric (typically either a Heaviside step function resulting in deterministic connections between vertices closer than a certain threshold distance, or a decaying function of hyperbolic distance yielding the connection probability). A HGG generalizes a random geometric graph (RGG) whose embedding space is Euclidean.

References

  1. Antonioni, Alberto; Tomassini, Marco (28 September 2012). "Degree correlations in random geometric graphs". Physical Review E. 86 (3): 037101. arXiv: 1207.2573 . Bibcode:2012PhRvE..86c7101A. doi:10.1103/PhysRevE.86.037101. PMID   23031054. S2CID   14750415.
  2. Nekovee, Maziar (28 June 2007). "Worm epidemics in wireless ad hoc networks". New Journal of Physics. 9 (6): 189. arXiv: 0707.2293 . Bibcode:2007NJPh....9..189N. doi:10.1088/1367-2630/9/6/189. S2CID   203944.
  3. Penrose, Mathew. (2003). Random geometric graphs. Oxford: Oxford University Press. ISBN   0198506260. OCLC   51316118.
  4. 1 2 3 von Looz, Moritz; Strash, Darren; Schulz, Christian; Penschuck, Manuel; Sanders, Peter; Meyer, Ulrich; Lamm, Sebastian; Funke, Daniel (2017-10-20). "Communication-free Massively Distributed Graph Generation". arXiv: 1710.07565v3 [cs.DC].
  5. Perez, Xavier; Mitsche, Dieter; Diaz, Josep (2007-02-13). "Dynamic Random Geometric Graphs". arXiv: cs/0702074 . Bibcode:2007cs........2074D.{{cite journal}}: Cite journal requires |journal= (help)
  6. Perez, X.; Mitsche, D.; Diaz, J. (2006-07-07). "Sharp threshold for hamiltonicity of random geometric graphs". arXiv: cs/0607023 . Bibcode:2006cs........7023D.{{cite journal}}: Cite journal requires |journal= (help)
  7. Christensen, Michael; Dall, Jesper (2002-03-01). "Random Geometric Graphs". Physical Review E. 66 (1 Pt 2): 016121. arXiv: cond-mat/0203026 . Bibcode:2002PhRvE..66a6121D. doi:10.1103/PhysRevE.66.016121. PMID   12241440. S2CID   15193516.
  8. Waxman, B.M (1988). "Routing of multipoint connections". IEEE Journal on Selected Areas in Communications. 6 (9): 1617–1622. doi:10.1109/49.12889.
  9. 1 2 Mao, G; Anderson, B.D (2013). "Connectivity of large wireless networks under a general connection model". IEEE Transactions on Information Theory. 59 (3): 1761–1772. doi:10.1109/tit.2012.2228894. S2CID   3027610.
  10. Penrose, Mathew D (1997). "The longest edge of the random minimal spanning tree". The Annals of Applied Probability: 340361.
  11. Giles, Alexander P.; Georgiou, Orestis; Dettmann, Carl P. (2015). "Betweenness centrality in dense random geometric networks". 2015 IEEE International Conference on Communications (ICC). pp. 6450–6455. arXiv: 1410.8521 . Bibcode:2014arXiv1410.8521K. doi:10.1109/ICC.2015.7249352. ISBN   978-1-4673-6432-4. S2CID   928409.
  12. Coon, J; Dettmann, C P; Georgiou, O (2012). "Full connectivity: corners, edges and faces". Journal of Statistical Physics. 147 (4): 758–778. arXiv: 1201.3123 . Bibcode:2012JSP...147..758C. doi:10.1007/s10955-012-0493-y. S2CID   18794396.
  13. Dettmann, C.P; Georgiou, O (2016). "Random geometric graphs with general connection functions". Physical Review E. 93 (3): 032313. arXiv: 1411.3617 . Bibcode:2016PhRvE..93c2313D. doi:10.1103/physreve.93.032313. PMID   27078372. S2CID   124506496.