Closeness centrality

Last updated

In a connected graph, closeness centrality (or closeness) of a node is a measure of centrality in a network, calculated as the reciprocal of the sum of the length of the shortest paths between the node and all other nodes in the graph. Thus, the more central a node is, the closer it is to all other nodes.

Contents

The number next to each node is the distance from that node to the square red node as measured by the length of the shortest path. The green edges illustrate one of the two shortest paths between the red square node and the red circle node. The closeness of the red square node is therefore 5/(1+1+1+2+2) = 5/7. Pathdegreeclosenessexampleedit.svg
The number next to each node is the distance from that node to the square red node as measured by the length of the shortest path. The green edges illustrate one of the two shortest paths between the red square node and the red circle node. The closeness of the red square node is therefore 5/(1+1+1+2+2) = 5/7.

Closeness was defined by Bavelas (1950) as the reciprocal of the farness, [1] [2] that is:

where is the distance (length of the shortest path) between vertices and . This unnormalised version of closeness is sometimes known as status. [3] [4] [5] When speaking of closeness centrality, people usually refer to its normalized form which represents the average length of the shortest paths instead of their sum. It is generally given by the previous formula multiplied by , where is the number of nodes in the graph resulting in:

The normalization of closeness simplifies the comparison of nodes in graphs of different sizes. For large graphs, the minus one in the normalisation becomes inconsequential and it is often dropped.

As one of the oldest centrality measures, closeness is often given in general discussions of network centrality measures in introductory texts [6] [7] [8] or in articles comparing different centrality measures. [9] [10] [11] [12] The values produced by many centrality measures can be highly correlated. [9] [13] [11] In particular, closeness and degree have been shown [12] to be related in many networks through an approximate relationship

where is the degree of vertex while and β are parameters found by fitting closeness and degree to this formula. The z parameter represents the branching factor, the average degree of nodes (excluding the root node and leaves) of the shortest-path trees used to approximate networks when demonstrating this relationship. [12] This is never an exact relationship but it captures a trend seen in many real-world networks.

Closeness is related to other length scales used in network science. For instance, the average shortest path length , the average distance between vertices in a network, is simply the average of the inverse closeness values

.

Taking distances from or to all other nodes is irrelevant in undirected graphs, whereas it can produce totally different results in directed graphs (e.g. a website can have a high closeness centrality from outgoing links, but low closeness centrality from incoming links).

Applications

Closeness is used in many different contexts. In bibliometrics closeness has been used to look at the way academics choose their journals and bibliographies in different fields [14] or to measure the impact of an author on a field and their social capital. [15] When used to select potential leads in customer data, closeness has been seen to lead to a significant gain in the success rate. [16] The closeness of a city in an air transport network has been shown to be highly correlated with socio-economic indicators such as gross regional domestic product. [17] Closeness has also been applied to biological networks [5] where, for instance, this was used to identify more than 50% of the global regulators within the top 2% of the ranked genes [18] or essential genes were found to have higher closeness than nonessential genes in protein-interaction networks. [19] In a metabolic network the closeness of nodes can identify the most important metabolites. [20]

In disconnected graphs

When a graph is not strongly connected, Beauchamp introduced in 1965 the idea of using the sum of reciprocal of distances, [21] instead of the reciprocal of the sum of distances, with the convention :

Beauchamp's modification follows the (much later in time) general principle proposed by Marchiori and Latora (2000) [22] that in graphs with infinite distances the harmonic mean behaves better than the arithmetic mean. Indeed, Bavelas's closeness can be described as the denormalized reciprocal of the arithmetic mean of distances, whereas Beauchamp's centrality is the reciprocal of the harmonic mean of distances.

This idea has resurfaced several time in the literature, often without the normalization factor : for undirected graphs under the name valued centrality by Dekker (2005) [23] and under the name harmonic centrality by Rochat (2009); [24] if was axiomatized by Garg (2009) [25] and proposed once again later by Opsahl (2010). [26] It was studied on general directed graphs by Boldi and Vigna (2014). [27] This idea is also quite similar to market potential proposed in Harris (1954) [28] which now often goes by the term market access. [29]

Variants

Dangalchev (2006), [30] in a work on network vulnerability proposes for undirected graphs a different definition:

This definition is used effectively for disconnected graphs and allows to create convenient formulae for graph operations. For example:

If graph is created by linking node of graph to node of graph then the combined closeness is:

if graph is created by collapsing node of graph and node of graph into one node then the closeness is: [31]

If graph is the shadow graph of graph , which has nodes, then closeness is: [32]

If graph is the thorn graph of graph , which has nodes, then closeness is: [33]

The natural generalization of this definition is: [34]

where belongs to (0,1). As increases from 0 to 1, the generalized closeness changes from local characteristic (degree) to global (number of connected nodes).

The information centrality of Stephenson and Zelen (1989) is another closeness measure, which computes the harmonic mean of the resistance distances towards a vertex x, which is smaller if x has many paths of small resistance connecting it to other vertices. [35]

In the classic definition of the closeness centrality, the spread of information is modeled by the use of shortest paths. This model might not be the most realistic for all types of communication scenarios. Thus, related definitions have been discussed to measure closeness, like the random walk closeness centrality introduced by Noh and Rieger (2004). It measures the speed with which randomly walking messages reach a vertex from elsewhere in the graph. [36] Hierarchical closeness of Tran and Kwon (2014) [37] is an extended closeness centrality to deal still in another way with the limitation of closeness in graphs that are not strongly connected. The hierarchical closeness explicitly includes information about the range of other nodes that can be affected by the given node.

See also

Related Research Articles

<span class="mw-page-title-main">Travelling salesman problem</span> NP-hard problem in combinatorial optimization

In the theory of computational complexity, the travelling salesman problem (TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" It is an NP-hard problem in combinatorial optimization, important in theoretical computer science and operations research.

A* is a graph traversal and pathfinding algorithm that is used in many fields of computer science due to its completeness, optimality, and optimal efficiency. Given a weighted graph, a source node and a goal node, the algorithm finds the shortest path from source to goal.

Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of observations.

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge (u,v) from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topological sorting has many applications, especially in ranking problems such as feedback arc set. Topological sorting is also possible when the DAG has disconnected components.

In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes.

<span class="mw-page-title-main">Centrality</span> Degree of connectedness within a graph

In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, super-spreaders of disease, and brain networks. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.

<span class="mw-page-title-main">Degree distribution</span>

In the study of graphs and networks, the degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network.

<span class="mw-page-title-main">Assortativity</span> Tendency for similar nodes to be connected

Assortativity, or assortative mixing, is a preference for a network's nodes to attach to others that are similar in some way. Though the specific measure of similarity may vary, network theorists often examine assortativity in terms of a node's degree. The addition of this characteristic to network models more closely approximates the behaviors of many real world networks.

<span class="mw-page-title-main">Random geometric graph</span> In graph theory, the mathematically simplest spatial network

In graph theory, a random geometric graph (RGG) is the mathematically simplest spatial network, namely an undirected graph constructed by randomly placing N nodes in some metric space and connecting two nodes by a link if and only if their distance is in a given range, e.g. smaller than a certain neighborhood radius, r.

<span class="mw-page-title-main">Network science</span> Academic field

Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

<span class="mw-page-title-main">Friendship paradox</span> Phenomenon that most people have fewer friends than their friends have, on average

The friendship paradox is the phenomenon first observed by the sociologist Scott L. Feld in 1991 that on average, an individual's friends have more friends than that individual. It can be explained as a form of sampling bias in which people with more friends are more likely to be in one's own friend group. In other words, one is less likely to be friends with someone who has very few friends. In contradiction to this, most people believe that they have more friends than their friends have.

<span class="mw-page-title-main">Betweenness centrality</span> Measure of a graphs centrality, based on shortest paths

In graph theory, betweenness centrality is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices such that either the number of edges that the path passes through or the sum of the weights of the edges is minimized. The betweenness centrality for each vertex is the number of these shortest paths that pass through the vertex.

<span class="mw-page-title-main">Exponential family random graph models</span>

Exponential Random Graph Models (ERGMs) are a family of statistical models for analyzing data from social and other networks. Examples of networks examined using ERGM include knowledge networks, organizational networks, colleague networks, social media networks, networks of scientific development, and others.

<span class="mw-page-title-main">Dependency network</span>

The dependency network approach provides a system level analysis of the activity and topology of directed networks. The approach extracts causal topological relations between the network's nodes, and provides an important step towards inference of causal activity relations between the network nodes. This methodology has originally been introduced for the study of financial data, it has been extended and applied to other systems, such as the immune system, and semantic networks.

Hierarchical closeness (HC) is a structural centrality measure used in network theory or graph theory. It is extended from closeness centrality to rank how centrally located a node is in a directed network. While the original closeness centrality of a directed network considers the most important node to be that with the least total distance from all other nodes, hierarchical closeness evaluates the most important node as the one which reaches the most nodes by the shortest paths. The hierarchical closeness explicitly includes information about the range of other nodes that can be affected by the given node. In a directed network where is the set of nodes and is the set of interactions, hierarchical closeness of a node called was proposed by Tran and Kwon as follows:

<span class="mw-page-title-main">Temporal network</span> Network whose links change over time

A temporal network, also known as a time-varying network, is a network whose links are active only at certain points in time. Each link carries information on when it is active, along with other possible characteristics such as a weight. Time-varying networks are of particular relevance to spreading processes, like the spread of information and disease, since each link is a contact opportunity and the time ordering of contacts is included.

<span class="mw-page-title-main">Efficiency (network science)</span>

In network science, the efficiency of a network is a measure of how efficiently it exchanges information and it is also called communication efficiency. The underlying idea is that the more distant two nodes are in the network, the less efficient their communication will be. The concept of efficiency can be applied to both local and global scales in a network. On a global scale, efficiency quantifies the exchange of information across the whole network where information is concurrently exchanged. The local efficiency quantifies a network's resistance to failure on a small scale. That is the local efficiency of a node characterizes how well information is exchanged by its neighbors when it is removed.

<span class="mw-page-title-main">Configuration model</span>

In network science, the configuration model is a method for generating random networks from a given degree sequence. It is widely used as a reference model for real-life social networks, because it allows the modeler to incorporate arbitrary degree distributions.

A graph neural network (GNN) belongs to a class of artificial neural networks for processing data that can be represented as graphs.

<span class="mw-page-title-main">Brandes' algorithm</span> Algorithm for finding important nodes in a graph

In network theory, Brandes' algorithm is an algorithm for calculating the betweenness centrality of vertices in a graph. The algorithm was first published in 2001 by Ulrik Brandes. Betweenness centrality, along with other measures of centrality, is an important measure in many real-world networks, such as social networks and computer networks.

References

  1. Bavelas, Alex (1950). "Communication Patterns in Task‐Oriented Groups". The Journal of the Acoustical Society of America. 22 (6): 725–730. Bibcode:1950ASAJ...22..725B. doi:10.1121/1.1906679.
  2. Sabidussi, G (1966). "The centrality index of a graph". Psychometrika. 31 (4): 581–603. doi:10.1007/bf02289527. hdl: 10338.dmlcz/101401 . PMID   5232444. S2CID   119981743.
  3. Harary, Frank (1959). "Status and Contrastatus". Sociometry. 22 (1): 23–43. doi:10.2307/2785610. JSTOR   2785610.
  4. Hage, Per; Harary, Frank (1995). "Eccentricity and centrality in networks". Social Networks. 17 (1): 57–63. doi:10.1016/0378-8733(94)00248-9.
  5. 1 2 Wuchty, Stefan; Stadler, Peter F. (2003). "Centers of complex networks". Journal of Theoretical Biology. 223 (1): 45–53. Bibcode:2003JThBi.223...45W. doi:10.1016/S0022-5193(03)00071-7. PMID   12782116.
  6. Newman, M. E. J. (2010). Networks : an introduction. Oxford: Oxford University Press. ISBN   978-0-19-920665-0. OCLC   456837194.
  7. Latora, Vito (2017). Complex Networks : principles, methods and applications. Vincenzo Nicosia, Giovanni Russo. Cambridge, United Kingdom. ISBN   978-1-316-21600-2. OCLC   1004620089.{{cite book}}: CS1 maint: location missing publisher (link)
  8. Cosia, Michele (2021). The Atlas for the Aspiring Network Scientist. arXiv: 2101.00863 . ISBN   9788797282403.
  9. 1 2 Bolland, John M (1988). "Sorting out centrality: An analysis of the performance of four centrality models in real and simulated networks". Social Networks. 10 (3): 233–253. doi:10.1016/0378-8733(88)90014-7.
  10. Brandes, Ulrik; Hildenbrand, Jan (2014). "Smallest graphs with distinct singleton centers". Network Science. 2 (3): 416–418. doi:10.1017/nws.2014.25. ISSN   2050-1242. S2CID   3841410.
  11. 1 2 Schoch, David; Valente, Thomas W.; Brandes, Ulrik (2017). "Correlations among centrality indices and a class of uniquely ranked graphs". Social Networks. 50: 46–54. doi:10.1016/j.socnet.2017.03.010. S2CID   10932381.
  12. 1 2 3 Evans, Tim S.; Chen, Bingsheng (2022). "Linking the network centrality measures closeness and degree". Communications Physics. 5 (1): 172. arXiv: 2108.01149 . Bibcode:2022CmPhy...5..172E. doi:10.1038/s42005-022-00949-5. ISSN   2399-3650. S2CID   236881169.
  13. Valente, Thomas W.; Coronges, Kathryn; Lakon, Cynthia; Costenbader, Elizabeth (2008-01-01). "How Correlated Are Network Centrality Measures?". Connections (Toronto, Ont.). 28 (1): 16–26. ISSN   0226-1766. PMC   2875682 . PMID   20505784.
  14. Ni, Chaoqun; Sugimoto, Cassidy; Jiang, Jiepu (2011). Noyons, ED; Ngulube, Patrick; Leta, Jacqueline (eds.). "Degree, Closeness, and Betweenness: Application of group centrality measurements to explore macro-disciplinary evolution diachronically" (PDF): 605.{{cite journal}}: Cite journal requires |journal= (help)
  15. Yan, Erjia; Ding, Ying (2009). "Applying centrality measures to impact analysis: A coauthorship network analysis". Journal of the American Society for Information Science and Technology. 60 (10): 2107–2118. arXiv: 1012.4862 . doi:10.1002/asi.21128. S2CID   261294843.
  16. Kiss, Christine; Bichler, Martin (2008). "Identification of influencers — Measuring influence in customer networks". Decision Support Systems. 46 (1): 233–253. doi:10.1016/j.dss.2008.06.007. S2CID   9783337.
  17. Wang, Jiaoe; Mo, Huihui; Wang, Fahui; Jin, Fengjun (2011). "Exploring the network structure and nodal centrality of China's air transport network: A complex network approach". Journal of Transport Geography. 19 (4): 712–721. doi:10.1016/j.jtrangeo.2010.08.012.
  18. Koschützki, Dirk; Schreiber, Falk (2008). "Centrality Analysis Methods for Biological Networks and Their Application to Gene Regulatory Networks". Gene Regulation and Systems Biology. 2: 193–201. doi:10.4137/GRSB.S702. ISSN   1177-6250. PMC   2733090 . PMID   19787083.
  19. Hahn, Matthew W.; Kern, Andrew D. (2005). "Comparative Genomics of Centrality and Essentiality in Three Eukaryotic Protein-Interaction Networks". Molecular Biology and Evolution. 22 (4): 803–806. doi: 10.1093/molbev/msi072 . ISSN   1537-1719. PMID   15616139.
  20. Ma, H.-W.; Zeng, A.-P. (2003-07-22). "The connectivity structure, giant strong component and centrality of metabolic networks". Bioinformatics. 19 (11): 1423–1430. doi: 10.1093/bioinformatics/btg177 . ISSN   1367-4803. PMID   12874056.
  21. Beauchamp, Murray (1965). "An Improved Index of Centrality". Behavioral Science. 10 (2): 161–163. doi:10.1002/bs.3830100205. PMID   14284290.
  22. Marchiori, Massimo; Latora, Vito (2000), "Harmony in the small-world", Physica A, 285 (3–4): 539–546, arXiv: cond-mat/0008357 , Bibcode:2000PhyA..285..539M, doi:10.1016/s0378-4371(00)00311-3, S2CID   10523345
  23. Dekker, Anthony (2005). "Conceptual Distance in Social Network Analysis". Journal of Social Structure. 6 (3).
  24. Yannick Rochat. Closeness centrality extended to unconnected graphs: The harmonic centrality index (PDF). Applications of Social Network Analysis, ASNA 2009.
  25. Manuj Garg (2009), Axiomatic Foundations of Centrality in Networks, doi:10.2139/ssrn.1372441, S2CID   117717919
  26. Tore Opsahl (2010-03-20). "Closeness centrality in networks with disconnected components".
  27. Boldi, Paolo; Vigna, Sebastiano (2014), "Axioms for Centrality", Internet Mathematics, 10 (3–4): 222–262, doi: 10.1080/15427951.2013.865686
  28. Harris, Chauncy D. (1954). "The Market as a Factor in the Localization of Industry in the United States". Annals of the Association of American Geographers. 44 (4): 315–348. doi:10.2307/2561395. JSTOR   2561395.
  29. Gutberlet, Theresa. Cheap Coal versus Market Access: The Role of Natural Resources and Demand in Germany's Industrialization. Working Paper. 2014.
  30. Dangalchev, Ch (2006). "Residual Closeness in Networks". Physica A. 365 (2): 556. Bibcode:2006PhyA..365..556D. doi:10.1016/j.physa.2005.12.020.
  31. Dangalchev, Ch (2020). "Additional Closeness and Networks Growth". Fundamenta Informaticae. 176 (1): 1–15. doi:10.3233/FI-2020-1960. S2CID   226300861.
  32. Dangalchev, Ch (2024). "Closeness of Some Graph Operations". Computing Open. 2. doi:10.1142/S2972370124500090.
  33. Dangalchev, Ch (2018). "Residual Closeness of Generalized Thorn Graphs". Fundamenta Informaticae. 162 (1): 1–15. doi:10.3233/FI-2018-1710. S2CID   52073138.
  34. Dangalchev, Ch (2011). "Residual Closeness and Generalized Closeness". International Journal of Foundations of Computer Science. 22 (8): 1939–1948. doi:10.1142/s0129054111009136.
  35. Stephenson, K. A.; Zelen, M. (1989). "Rethinking centrality: Methods and examples". Social Networks. 11: 1–37. doi:10.1016/0378-8733(89)90016-6.
  36. Noh, J. D.; Rieger, H. (2004). "Random Walks on Complex Networks". Phys. Rev. Lett. 92 (11): 118701. arXiv: cond-mat/0307719 . Bibcode:2004PhRvL..92k8701N. doi:10.1103/physrevlett.92.118701. PMID   15089179. S2CID   14767557.
  37. Tran, Tien-Dzung; Kwon, Yung-Keun (2014). "Hierarchical closeness efficiently predicts disease genes in a directed signaling network". Computational Biology and Chemistry. 53: 191–197. doi:10.1016/j.compbiolchem.2014.08.023. PMID   25462327.