Friendship paradox

Last updated
Diagram of a social network of 7-8-year-old children, mapped by asking each child to indicate two others they would like to sit next to in class. The majority of children have fewer connections than the average of those they are connected to. Moreno Sociogram 2nd Grade.png
Diagram of a social network of 7-8-year-old children, mapped by asking each child to indicate two others they would like to sit next to in class. The majority of children have fewer connections than the average of those they are connected to.

The friendship paradox is the phenomenon first observed by the sociologist Scott L. Feld in 1991 that on average, an individual's friends have more friends than that individual. [1] It can be explained as a form of sampling bias in which people with more friends are more likely to be in one's own friend group. In other words, one is less likely to be friends with someone who has very few friends. In contradiction to this, most people believe that they have more friends than their friends have. [2] [3] [4] [5]

Contents

The same observation can be applied more generally to social networks defined by other relations than friendship: for instance, most people's sexual partners have had (on the average) a greater number of sexual partners than they have. [6] [7]

The friendship paradox is an example of how network structure can significantly distort an individual's local observations. [8] [9]

Mathematical explanation

In spite of its apparently paradoxical nature, the phenomenon is real, and can be explained as a consequence of the general mathematical properties of social networks. The mathematics behind this are directly related to the arithmetic-geometric mean inequality and the Cauchy–Schwarz inequality. [10]

Formally, Feld assumes that a social network is represented by an undirected graph G = (V, E), where the set V of vertices corresponds to the people in the social network, and the set E of edges corresponds to the friendship relation between pairs of people. That is, he assumes that friendship is a symmetric relation: if x is a friend of y, then y is a friend of x. The friendship between x and y is therefore modeled by the edge {x, y}, and the number of friends an individual has corresponds to a vertex's degree. The average number of friends of a person in the social network is therefore given by the average of the degrees of the vertices in the graph. That is, if vertex v has d(v) edges touching it (representing a person who has d(v) friends), then the average number μ of friends of a random person in the graph is

The average number of friends that a typical friend has can be modeled by choosing a random person (who has at least one friend), and then calculating how many friends their friends have on average. This amounts to choosing, uniformly at random, an edge of the graph (representing a pair of friends) and an endpoint of that edge (one of the friends), and again calculating the degree of the selected endpoint. The probability of a certain vertex to be chosen is

The first factor corresponds to how likely it is that the chosen edge contains the vertex, which increases when the vertex has more friends. The halving factor simply comes from the fact that each edge has two vertices. So the expected value of the number of friends of a (randomly chosen) friend is

We know from the definition of variance that

where is the variance of the degrees in the graph. This allows us to compute the desired expected value as

For a graph that has vertices of varying degrees (as is typical for social networks), is strictly positive, which implies that the average degree of a friend is strictly greater than the average degree of a random node.

Another way of understanding how the first term came is as follows. For each friendship (u, v), a node u mentions that v is a friend and v has d(v) friends. There are d(v) such friends who mention this. Hence the square of d(v) term. We add this for all such friendships in the network from both the u's and v's perspective, which gives the numerator. The denominator is the number of total such friendships, which is twice the total edges in the network (one from the u's perspective and the other from the v's).

After this analysis, Feld goes on to make some more qualitative assumptions about the statistical correlation between the number of friends that two friends have, based on theories of social networks such as assortative mixing, and he analyzes what these assumptions imply about the number of people whose friends have more friends than they do. Based on this analysis, he concludes that in real social networks, most people are likely to have fewer friends than the average of their friends' numbers of friends. However, this conclusion is not a mathematical certainty; there exist undirected graphs (such as the graph formed by removing a single edge from a large complete graph) that are unlikely to arise as social networks but in which most vertices have higher degree than the average of their neighbors' degrees.

The Friendship Paradox may be restated in graph theory terms as "the average degree of a randomly selected node in a network is less than the average degree of neighbors of a randomly selected node", but this leaves unspecified the exact mechanism of averaging (i.e., macro vs micro averaging). Let be an undirected graph with and , having no isolated nodes. Let the set of neighbors of node be denoted . The average degree is then . Let the number of "friends of friends" of node be denoted . Note that this can count 2-hop neighbors multiple times, but so does Feld's analysis. We have . Feld considered the following "micro average" quantity.

However, there is also the (equally legitimate) "macro average" quantity, given by

The computation of MacroAvg can be expressed as the following pseudocode.

Algorithm MacroAvg
  1. for each node
    1. initialize
  2. for each edge
  3. return

Each edge contributes to MacroAvg the quantity , because . We thus get

.

Thus, we have both and , but no inequality holds between them. [11]

In a 2023 paper, a parallel paradox, but for negative, antagonistic, or animosity ties, termed the "enmity paradox," was defined and demonstrated by Ghasemian and Christakis. [12] In brief, one's enemies have more enemies than one does, too. This paper also documented diverse phenomena is "mixed worlds" of both hostile and friendly ties.

Applications

The analysis of the friendship paradox implies that the friends of randomly selected individuals are likely to have higher than average centrality. This observation has been used as a way to forecast and slow the course of epidemics, by using this random selection process to choose individuals to immunize or monitor for infection while avoiding the need for a complex computation of the centrality of all nodes in the network. [13] [14] [15] In a similar manner, in polling and election forecasting, friendship paradox has been exploited in order to reach and query well-connected individuals who may have knowledge about how numerous other individuals are going to vote. [16] However, when utilized in such contexts, the friendship paradox inevitably introduces bias by over-representing individuals with many friends, potentially skewing resulting estimates. [17] [18]

A study in 2010 by Christakis and Fowler showed that flu outbreaks can be detected almost two weeks before traditional surveillance measures would do so by using the friendship paradox in monitoring the infection in a social network. [19] They found that using the friendship paradox to analyze the health of central friends is "an ideal way to predict outbreaks, but detailed information doesn't exist for most groups, and to produce it would be time-consuming and costly." [20] This extends to the spread of ideas as well, with evidence that the friendship paradox can be used to track and predict the spread of ideas and misinformation through networks. [21] [13] [22] This observation has been explained with the argument that individuals with more social connections may be the driving forces behind the spread of these ideas and beliefs, and as such can be used as early-warning signals. [18]

Friendship paradox based sampling (i.e., sampling random friends) has been theoretically and empirically shown to outperform classical uniform sampling for the purpose of estimating the power-law degree distributions of scale-free networks. [23] [24] The reason is that sampling the network uniformly will not collect enough samples from the characteristic heavy tail part of the power-law degree distribution to properly estimate it. However, sampling random friends incorporates more nodes from the tail of the degree distribution (i.e., more high degree nodes) into the sample. Hence, friendship paradox based sampling captures the characteristic heavy tail of a power-law degree distribution more accurately and reduces the bias and variance of the estimation. [24]

The "generalized friendship paradox" states that the friendship paradox applies to other characteristics as well. For example, one's co-authors are on average likely to be more prominent, with more publications, more citations and more collaborators, [25] [26] [27] or one's followers on Twitter have more followers. [28] The same effect has also been demonstrated for Subjective Well-Being by Bollen et al. (2017), [29] who used a large-scale Twitter network and longitudinal data on subjective well-being for each individual in the network to demonstrate that both a Friendship and a "happiness" paradox can occur in online social networks.

The friendship paradox has also been used as a means to identify structurally influential nodes within social networks, so as to magnify social contagion of diverse practices relevant to human welfare and public health. This has been shown to be possible in large-scale field trials related to the adoption of multivitamins [30] or maternal and child health practices [31] in Honduras or of iron-fortified salt in India. [32] This technique is valuable because, by exploiting the friendship paradox, one can identify such influential nodes without the expense and delay of actually mapping the whole network.

See also

Related Research Articles

<span class="mw-page-title-main">Log-normal distribution</span> Probability distribution

In probability theory, a log-normal (or lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Equivalently, if Y has a normal distribution, then the exponential function of Y, X = exp(Y), has a log-normal distribution. A random variable which is log-normally distributed takes only positive real values. It is a convenient and useful model for measurements in exact and engineering sciences, as well as medicine, economics and other topics (e.g., energies, concentrations, lengths, prices of financial instruments, and other metrics).

<span class="mw-page-title-main">Scale-free network</span> Network whose degree distribution follows a power law

A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. That is, the fraction P(k) of nodes in the network having k connections to other nodes goes for large values of k as

<span class="mw-page-title-main">Polylogarithm</span> Special mathematical function

In mathematics, the polylogarithm (also known as Jonquière's function, for Alfred Jonquière) is a special function Lis(z) of order s and argument z. Only for special values of s does the polylogarithm reduce to an elementary function such as the natural logarithm or a rational function. In quantum statistics, the polylogarithm function appears as the closed form of integrals of the Fermi–Dirac distribution and the Bose–Einstein distribution, and is also known as the Fermi–Dirac integral or the Bose–Einstein integral. In quantum electrodynamics, polylogarithms of positive integer order arise in the calculation of processes represented by higher-order Feynman diagrams.

Belief propagation, also known as sum–product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes. Belief propagation is commonly used in artificial intelligence and information theory, and has demonstrated empirical success in numerous applications, including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability.

In particle theory, the skyrmion is a topologically stable field configuration of a certain class of non-linear sigma models. It was originally proposed as a model of the nucleon by Tony Skyrme in 1961. As a topological soliton in the pion field, it has the remarkable property of being able to model, with reasonable accuracy, multiple low-energy properties of the nucleon, simply by fixing the nucleon radius. It has since found application in solid-state physics, as well as having ties to certain areas of string theory.

In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes.

<span class="mw-page-title-main">Centrality</span> Degree of connectedness within a graph

In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, super-spreaders of disease, and brain networks. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.

<span class="mw-page-title-main">Degree distribution</span>

In the study of graphs and networks, the degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network.

<span class="mw-page-title-main">Assortativity</span> Tendency for similar nodes to be connected

Assortativity, or assortative mixing, is a preference for a network's nodes to attach to others that are similar in some way. Though the specific measure of similarity may vary, network theorists often examine assortativity in terms of a node's degree. The addition of this characteristic to network models more closely approximates the behaviors of many real world networks.

<span class="mw-page-title-main">Random geometric graph</span> In graph theory, the mathematically simplest spatial network

In graph theory, a random geometric graph (RGG) is the mathematically simplest spatial network, namely an undirected graph constructed by randomly placing N nodes in some metric space and connecting two nodes by a link if and only if their distance is in a given range, e.g. smaller than a certain neighborhood radius, r.

<span class="mw-page-title-main">Network science</span> Academic field

Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

<span class="mw-page-title-main">Modularity (networks)</span> Measure of network community structure

Modularity is a measure of the structure of networks or graphs which measures the strength of division of a network into modules. Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules. Modularity is often used in optimization methods for detecting community structure in networks. Biological networks, including animal brains, exhibit a high degree of modularity. However, modularity maximization is not statistically consistent, and finds communities in its own null model, i.e. fully random graphs, and therefore it cannot be used to find statistically significant community structures in empirical networks. Furthermore, it has been shown that modularity suffers a resolution limit and, therefore, it is unable to detect small communities.

<span class="mw-page-title-main">Closeness centrality</span>

In a connected graph, closeness centrality of a node is a measure of centrality in a network, calculated as the reciprocal of the sum of the length of the shortest paths between the node and all other nodes in the graph. Thus, the more central a node is, the closer it is to all other nodes.

<span class="mw-page-title-main">Degree-preserving randomization</span>

Degree Preserving Randomization is a technique used in Network Science that aims to assess whether or not variations observed in a given graph could simply be an artifact of the graph's inherent structural properties rather than properties unique to the nodes, in an observed network.

<span class="mw-page-title-main">Louvain method</span> Clustering and community detection algorithm

The Louvain method for community detection is a method to extract non-overlapping communities from large networks created by Blondel et al. from the University of Louvain. The method is a greedy optimization method that appears to run in time where is the number of nodes in the network.

<span class="mw-page-title-main">Lancichinetti–Fortunato–Radicchi benchmark</span> Algorithm

Lancichinetti–Fortunato–Radicchibenchmark is an algorithm that generates benchmark networks. They have a priori known communities and are used to compare different community detection methods. The advantage of the benchmark over other methods is that it accounts for the heterogeneity in the distributions of node degrees and of community sizes.

<span class="mw-page-title-main">Configuration model</span>

In network science, the configuration model is a method for generating random networks from a given degree sequence. It is widely used as a reference model for real-life social networks, because it allows the modeler to incorporate arbitrary degree distributions.

<span class="mw-page-title-main">Copying network models</span>

Copying network models are network generation models that use a copying mechanism to form a network, by repeatedly duplicating and mutating existing nodes of the network. Such a network model has first been proposed in 1999 to explain the network of links between web pages, but since has been used to model biological and citation networks as well.

<span class="mw-page-title-main">Random graph theory of gelation</span>

Random graph theory of gelation is a mathematical theory for sol–gel processes. The theory is a collection of results that generalise the Flory–Stockmayer theory, and allow identification of the gel point, gel fraction, size distribution of polymers, molar mass distribution and other characteristics for a set of many polymerising monomers carrying arbitrary numbers and types of reactive functional groups.

A graph neural network (GNN) belongs to a class of artificial neural networks for processing data that can be represented as graphs.

References

  1. Feld, Scott L. (1991), "Why your friends have more friends than you do", American Journal of Sociology , 96 (6): 1464–1477, doi:10.1086/229693, JSTOR   2781907, S2CID   56043992 .
  2. Zuckerman, Ezra W.; Jost, John T. (2001), "What makes you think you're so popular? Self evaluation maintenance and the subjective side of the "friendship paradox"" (PDF), Social Psychology Quarterly, 64 (3): 207–223, doi:10.2307/3090112, JSTOR   3090112 .
  3. McRaney, David (2012), You are Not So Smart, Oneworld Publications, p. 160, ISBN   978-1-78074-104-8
  4. Felmlee, Diane; Faris, Robert (2013), "Interaction in social networks", in DeLamater, John; Ward, Amanda (eds.), Handbook of Social Psychology (2nd ed.), Springer, pp. 439–464, ISBN   978-9400767720 . See in particular "Friendship ties", p. 452.
  5. Lau, J. Y. F. (2011), An Introduction to Critical Thinking and Creativity: Think More, Think Better, John Wiley & Sons, p. 191, ISBN   978-1-118-03343-2
  6. Kanazawa, Satoshi (2009), "The Scientific Fundamentalist: A Look at the Hard Truths About Human Nature – Why your friends have more friends than you do", Psychology Today , archived from the original on 2009-11-07.
  7. Burkeman, Oliver (30 January 2010), "This column will change your life: Ever wondered why your friends seem so much more popular than you are? There's a reason for that", The Guardian .
  8. Lerman, Kristina; Yan, Xiaoran; Wu, Xin-Zeng (2016-02-17). "The "Majority Illusion" in Social Networks". PLOS ONE. 11 (2): e0147617. arXiv: 1506.03022 . Bibcode:2016PLoSO..1147617L. doi: 10.1371/journal.pone.0147617 . ISSN   1932-6203. PMC   4757419 . PMID   26886112.
  9. Alipourfard, Nazanin; Nettasinghe, Buddhika; Abeliuk, Andrés; Krishnamurthy, Vikram; Lerman, Kristina (2020-02-05). "Friendship paradox biases perceptions in directed networks". Nature Communications. 11 (1): 707. arXiv: 1905.05286 . Bibcode:2020NatCo..11..707A. doi:10.1038/s41467-020-14394-x. ISSN   2041-1723. PMC   7002371 . PMID   32024843.
  10. Ben Sliman, Malek; Kohli, Rajeev (2019), "The extended directed friendship paradox", SSRN, doi:10.2139/ssrn.3395317, S2CID   219376223
  11. Gupta, Yash; Chakrabarti, Soumen (2021), Friends of friends (PDF)
  12. Ghasemian, Amir; Christakis, Nicholas A. (2023-11-16). "The enmity paradox". Scientific Reports. 13 (1): 20040. arXiv: 2304.10076 . Bibcode:2023NatSR..1320040G. doi:10.1038/s41598-023-47167-9. ISSN   2045-2322. PMC   10654772 . PMID   37973933.
  13. 1 2 Cohen, Reuven; Havlin, Shlomo; ben-Avraham, Daniel (2003), "Efficient immunization strategies for computer networks and populations", Phys. Rev. Lett., 91 (24), 247901, arXiv: cond-mat/0207387 , Bibcode:2003PhRvL..91x7901C, doi:10.1103/PhysRevLett.91.247901, PMID   14683159, S2CID   919625 .
  14. Christakis, N. A.; Fowler, J. H. (2010), "Social network sensors for early detection of contagious outbreaks", PLOS ONE, 5 (9), e12948, arXiv: 1004.4792 , Bibcode:2010PLoSO...512948C, doi: 10.1371/journal.pone.0012948 , PMC   2939797 , PMID   20856792 .
  15. Wilson, Mark (November 2010), "Using the friendship paradox to sample a social network", Physics Today, 63 (11): 15–16, Bibcode:2010PhT....63k..15W, doi:10.1063/1.3518199 .
  16. Nettasinghe, Buddhika; Krishnamurthy, Vikram (2019). ""What Do Your Friends Think?": Efficient Polling Methods for Networks Using Friendship Paradox". IEEE Transactions on Knowledge and Data Engineering: 1. arXiv: 1802.06505 . doi:10.1109/tkde.2019.2940914. ISSN   1041-4347. S2CID   3335133.
  17. Feld, Scott L.; McGail, Alec (September 2020). "Egonets as systematically biased windows on society". Network Science. 8 (3): 399–417. doi:10.1017/nws.2020.5. ISSN   2050-1242. S2CID   216301650.
  18. 1 2 Galesic, Mirta; Bruine de Bruin, Wändi; Dalege, Jonas; Feld, Scott L.; Kreuter, Frauke; Olsson, Henrik; Prelec, Drazen; Stein, Daniel L.; van der Does, Tamara (July 2021). "Human social sensing is an untapped resource for computational social science". Nature. 595 (7866): 214–222. Bibcode:2021Natur.595..214G. doi:10.1038/s41586-021-03649-2. ISSN   1476-4687. PMID   34194037. S2CID   235697772.
  19. Christakis, Nicholas A.; Fowler, James H. (September 15, 2010). "Social Network Sensors for Early Detection of Contagious Outbreaks". PLOS ONE. 5 (9): e12948. arXiv: 1004.4792 . Bibcode:2010PLoSO...512948C. doi: 10.1371/journal.pone.0012948 . PMC   2939797 . PMID   20856792.
  20. Schnirring, Lisa (Sep 16, 2010). "Study: Friend 'sentinels' provide early flu warning". CIDRAP News. Archived from the original on May 6, 2013. Retrieved August 14, 2012.
  21. Garcia-Herranz, Manuel; Moro, Esteban; Cebrian, Manuel; Christakis, Nicholas A.; Fowler, James H. (2014-04-09). "Using Friends as Sensors to Detect Global-Scale Contagious Outbreaks". PLOS ONE. 9 (4): e92413. Bibcode:2014PLoSO...992413G. doi: 10.1371/journal.pone.0092413 . ISSN   1932-6203. PMC   3981694 . PMID   24718030.
  22. Kumar, Vineet; Krackhardt, David; Feld, Scott (2021-05-18). "Interventions with Inversity in Unknown Networks Can Help Regulate Contagion". arXiv: 2105.08758 [cs.SI].
  23. Eom, Young-Ho; Jo, Hang-Hyun (2015-05-11). "Tail-scope: Using friends to estimate heavy tails of degree distributions in large-scale complex networks". Scientific Reports. 5 (1): 9752. arXiv: 1411.6871 . Bibcode:2015NatSR...5E9752E. doi:10.1038/srep09752. ISSN   2045-2322. PMC   4426729 . PMID   25959097.
  24. 1 2 Nettasinghe, Buddhika; Krishnamurthy, Vikram (2021-05-19). "Maximum Likelihood Estimation of Power-law Degree Distributions via Friendship Paradox-based Sampling". ACM Transactions on Knowledge Discovery from Data. 15 (6): 1–28. arXiv: 1908.00310 . doi:10.1145/3451166. ISSN   1556-4681. S2CID   199064540.
  25. Eom, Young-Ho; Jo, Hang-Hyun (2014), "Generalized friendship paradox in complex networks: The case of scientific collaboration", Scientific Reports, 4, 4603, arXiv: 1401.1458 , Bibcode:2014NatSR...4E4603E, doi:10.1038/srep04603, PMC   3980335 , PMID   24714092
  26. Grund, Thomas U. (2014), "Why Your Friends Are More Important And Special Than You Think" (PDF), Sociological Science, 1: 128–140, doi: 10.15195/v1.a10
  27. Dickerson, Kelly (16 January 2014). "Why Your Friends Are Probably More Popular, Richer, and Happier Than You". Slate Magazine . The Slate Group . Retrieved 17 January 2014.
  28. Hodas, Nathan; Kooti, Farshad; Lerman, Kristina (May 2013). "Friendship Paradox Redux: Your Friends are More Interesting than You". arXiv: 1304.3480 [cs.SI].
  29. Bollen, Johan; Goncalves, Bruno; Van de Leemput, Ingrid; Guanchen, Ruan (2017), "The happiness paradox: your friends are happier than you", EPJ Data Science, 6, arXiv: 1602.02665 , doi:10.1140/epjds/s13688-017-0100-1, S2CID   2044182
  30. Kim, David A.; Hwong, Alison R.; Stafford, Derek; Hughes, D. Alex; O'Malley, A. James; Fowler, James H.; Christakis, Nicholas A. (2015-07-11). "Social network targeting to maximise population behaviour change: a cluster randomised controlled trial". Lancet. 386 (9989): 145–153. doi:10.1016/S0140-6736(15)60095-2. ISSN   1474-547X. PMC   4638320 . PMID   25952354.
  31. Shakya, Holly B.; Stafford, Derek; Hughes, D. Alex; Keegan, Thomas; Negron, Rennie; Broome, Jai; McKnight, Mark; Nicoll, Liza; Nelson, Jennifer; Iriarte, Emma; Ordonez, Maria; Airoldi, Edo; Fowler, James H.; Christakis, Nicholas A. (2017-03-01). "Exploiting social influence to magnify population-level behaviour change in maternal and child health: study protocol for a randomised controlled trial of network targeting algorithms in rural Honduras". BMJ Open. 7 (3): e012996. doi:10.1136/bmjopen-2016-012996. ISSN   2044-6055. PMC   5353315 . PMID   28289044.
  32. Alexander, Marcus; Forastiere, Laura; Gupta, Swati; Christakis, Nicholas A. (2022-07-26). "Algorithms for seeding social networks can enhance the adoption of a public health intervention in urban India". Proceedings of the National Academy of Sciences. 119 (30): e2120742119. Bibcode:2022PNAS..11920742A. doi: 10.1073/pnas.2120742119 . ISSN   0027-8424. PMC   9335263 . PMID   35862454.