Network homophily

Network homophily refers to the theory in network science which states that, based on node attributes, similar nodes may be more likely to attach to each other than dissimilar ones. The hypothesis is linked to the model of preferential attachment and draws on the phenomenon of homophily in the social sciences; much of the scientific analysis of how social ties form on the basis of similarity comes from network science. [1] Empirical research suggests that homophily occurs frequently in real networks. Homophily in social relations may produce network distances commensurate with social distance, leading to the clusters that have been observed in social networking services. [2] Homophily is a key topic in network science because it can determine how fast information and ideas diffuse.
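
One simple way to make "similar nodes are more likely to attach" concrete is to compare the share of links that join nodes with the same attribute against the share expected if links ignored attributes. The sketch below assumes an undirected edge list and a dictionary mapping each node to one categorical attribute; the function names and the toy network are hypothetical, and this is only one of several possible homophily measures, not necessarily the one used in the cited works.

```python
from collections import Counter


def same_type_share(edges, types):
    """Fraction of undirected edges whose two endpoints share an attribute value."""
    same = sum(1 for u, v in edges if types[u] == types[v])
    return same / len(edges)


def random_mixing_baseline(types):
    """Expected same-type share if links ignored attributes.

    Equals the probability that two distinct nodes drawn uniformly at random
    have the same attribute value.
    """
    n = len(types)
    counts = Counter(types.values())
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Hypothetical toy network: two groups of three nodes and one cross-group link.
types = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (2, 3)]

observed = same_type_share(edges, types)   # 5 of the 6 links stay within a group
expected = random_mixing_baseline(types)   # (3*2 + 3*2) / (6*5) = 0.4
print(f"observed={observed:.3f} expected={expected:.3f}")
```

An observed share well above the baseline, as in this toy case, is what the homophily hypothesis predicts.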

Node attributes and homophily

The existence of network homophily may call for a closer examination of node attributes, in contrast to other theories of network evolution that focus on network properties. These theories often assume that nodes are identical and that the evolution of a network is determined by characteristics of the broader network, such as node degree. Degree heterogeneity, in which a large number of nodes have few links and a few nodes have many, is also a prevalent phenomenon. [3] It may be linked to homophily because the two can produce similar patterns in networks: a large number of excess links caused by degree heterogeneity might be mistaken for homophily. [4]
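
The confusion between degree heterogeneity and homophily can be illustrated by comparing two attribute-blind baselines for the share of same-type links: one based on how many nodes each group has, and one based on how many link endpoints ("stubs") each group holds, as if endpoints were paired at random. This is a hedged illustration under a large-network approximation, not the econometric model of Graham (2014); the toy graph and all function names are hypothetical.

```python
from collections import Counter


def same_type_share(edges, types):
    """Fraction of undirected edges whose endpoints share an attribute value."""
    return sum(types[u] == types[v] for u, v in edges) / len(edges)


def node_share_baseline(types):
    """Attribute-blind baseline from group sizes: sum of squared node shares."""
    n = len(types)
    return sum((c / n) ** 2 for c in Counter(types.values()).values())


def degree_share_baseline(edges, types):
    """Attribute-blind baseline from link endpoints ("stubs"): sum of squared
    stub shares, as if endpoints were paired at random regardless of type."""
    stubs = Counter()
    for u, v in edges:
        stubs[types[u]] += 1
        stubs[types[v]] += 1
    total = sum(stubs.values())
    return sum((s / total) ** 2 for s in stubs.values())


# Hypothetical toy graph: group "a" nodes are hubs (a clique plus one link each
# to a group "b" node); every group "b" node has a single link.
types = {0: "a", 1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b", 7: "b"}
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (0, 4), (1, 5), (2, 6), (3, 7)]

observed = same_type_share(edges, types)         # 6 / 10 = 0.6
by_nodes = node_share_baseline(types)            # 0.5**2 + 0.5**2 = 0.5
by_degree = degree_share_baseline(edges, types)  # 0.8**2 + 0.2**2 = 0.68
print(observed, by_nodes, by_degree)
```

Against the node-count baseline the toy graph looks homophilous (0.6 versus 0.5), yet its same-type share is below what random pairing of link endpoints would give (about 0.68), so the apparent excess can be accounted for by the uneven degrees alone.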

Influence on network evolution

Kim and Altmann (2017) find that homophily may affect the evolution of the degree distribution of scale-free networks. More specifically, homophily may bias the cumulative degree distribution towards a convex shape rather than the often hypothesised concave one. [5] Thus, homophily can significantly (and uniformly) affect the emergence of scale-free networks shaped by preferential attachment, regardless of the type of seed network (e.g. whether it is centralized or decentralized), although the size of clusters might affect the magnitude of relative homophily. [5] A higher level of homophily is associated with a more convex, rather than concave, cumulative degree distribution. Although less salient, the link density of the network might also cause short-term, localized deviations in the shape of the distribution.
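
To experiment with how a similarity bias interacts with preferential attachment, one can grow a network in which a newcomer's choice of targets is weighted by both degree and type match. The sketch below is a hypothetical toy model, not the model of Kim and Altmann (2017): the homophily weight `h`, the uniform type assignment, and the complete-graph seed are all assumptions. Setting `h = 1` gives plain preferential attachment.

```python
import random


def homophilous_pa(n_nodes, m=2, h=1.0, n_types=2, seed=None):
    """Grow a network by preferential attachment weighted by type similarity.

    Hypothetical toy model: each newcomer gets a uniformly random type and
    links to m distinct existing nodes, each chosen with probability
    proportional to degree * (h if the types match else 1).
    """
    if h <= 0 or m < 1 or n_nodes <= m:
        raise ValueError("need h > 0, m >= 1 and n_nodes > m")
    rng = random.Random(seed)
    types = [rng.randrange(n_types) for _ in range(n_nodes)]
    degree = [0] * n_nodes
    edges = []
    # Seed network: a complete graph on the first m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            edges.append((u, v))
            degree[u] += 1
            degree[v] += 1
    for new in range(m + 1, n_nodes):
        older = range(new)
        weights = [degree[v] * (h if types[v] == types[new] else 1.0) for v in older]
        targets = set()
        while len(targets) < m:  # resample until m distinct targets are found
            targets.add(rng.choices(older, weights=weights)[0])
        for t in sorted(targets):
            edges.append((new, t))
            degree[new] += 1
            degree[t] += 1
    return degree, types, edges


plain, _, _ = homophilous_pa(1000, m=2, h=1.0, seed=42)
biased, _, _ = homophilous_pa(1000, m=2, h=5.0, seed=42)
print(max(plain), max(biased))
```

The two degree sequences can then be compared through their cumulative distributions. The loop is quadratic in the number of nodes, which is acceptable for a few thousand nodes; whether this toy reproduces the convexity reported by Kim and Altmann depends on its parameters and is not claimed here.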

In the development of the shape of the cumulative degree distribution curve, the link structure of existing nodes (among themselves and with new nodes) and homophily work against each other: the former leads to concavity and homophily to convexity. Consequently, there is a level of homophily at which the two effects cancel out and the cumulative degree distribution becomes a straight line on a log-log scale. [5] These phenomena may explain the large variety of shapes observed in empirical studies of real complex networks. A low level of homophily could then be linked to the concave cumulative degree distributions observed in networks of Facebook wall posts, Flickr users, and message boards, while linear shapes have been noted in networks of software class dependencies, Yahoo adverts, and YouTube users. [5] Compared with these two shapes, convexity seems to be much less prevalent, with examples in the Google Plus and Filmtipset networks. One explanation is that high levels of homophily may significantly decrease the viability of networks, making convexity less frequent in complex networks.
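
A crude way to tell whether a cumulative degree distribution is convex, linear, or concave on log-log axes is to fit a least-squares slope to its lower and upper halves and compare them: a flattening curve is convex, a steepening one concave. This is an assumption-laden sketch, not the procedure used by Kim and Altmann (2017); the function names are hypothetical, and a real analysis would need more careful fitting and handling of noise in the tail.

```python
import math
from collections import Counter


def ccdf(degrees):
    """Points (k, P(K >= k)) for each distinct positive degree k."""
    positive = [d for d in degrees if d > 0]
    counts = Counter(positive)
    n = at_least = len(positive)
    points = []
    for k in sorted(counts):
        points.append((k, at_least / n))
        at_least -= counts[k]
    return points


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def loglog_slope_change(points):
    """Slope of the upper half minus slope of the lower half on log-log axes.

    > 0: the curve flattens (convex); < 0: it steepens (concave);
    close to 0: roughly a straight line. Needs at least four points.
    """
    if len(points) < 4:
        raise ValueError("need at least four points")
    xs = [math.log(k) for k, _ in points]
    ys = [math.log(p) for _, p in points]
    half = len(points) // 2
    return ols_slope(xs[half:], ys[half:]) - ols_slope(xs[:half], ys[:half])


# Synthetic curves (not empirical data) with a known shape on log-log axes.
ks = range(1, 51)
straight = [(k, k ** -1.5) for k in ks]
convex = [(k, k ** -2 * math.exp(0.2 * math.log(k) ** 2)) for k in ks]
concave = [(k, k ** -1 * math.exp(-0.2 * math.log(k) ** 2)) for k in ks]
for name, pts in [("straight", straight), ("convex", convex), ("concave", concave)]:
    print(name, round(loglog_slope_change(pts), 3))
```

For a measured network, `ccdf(degrees)` produces the points to pass to `loglog_slope_change`.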

Long-run convergence

In the long run, if network-based search is unbiased, the type composition of sufficiently old nodes' neighbourhoods tends to converge to the type distribution of the network as a whole; younger nodes' connections, however, may still show some bias. [6] Bias may arise during network development through random meetings and network-based search, the two main processes through which new agents connect to established nodes. Bramoullé et al. (2012) illustrate this with a study of the citation network of physics journals from the American Physical Society (APS) between 1985 and 2003. [6] In this context, the two stages through which a new node forms links are the random, but potentially type-biased, finding of an article or reference by authors, and the discovery of further references through the citations in popular articles. Because similar articles are likely to cite similar references, bias may arise in the formation of connections. Convergence is described by three notions of integration: weak integration, long-run integration, and partial integration. [6]
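
The two link-formation processes can be simulated in a few lines. Bramoullé et al. (2012) analyse a model built on random meetings and network-based search; the sketch below is only a simplified stand-in with hypothetical parameter names (`m_random`, `m_search`, `bias`, `share_a`), not their exact specification. Newcomers "cite" older nodes: first through type-biased random meetings, then by picking, without regard to type, among the references of the nodes they met.

```python
import random


def grow_with_search(n_nodes, m_random=1, m_search=2, bias=4.0, share_a=0.5, seed=None):
    """Simplified two-process growth model (hypothetical parameters).

    Each newcomer (type 0 with probability share_a, else type 1) first meets
    m_random existing nodes at random, a same-type node being `bias` times as
    likely to be met as an other-type node. It then adds up to m_search nodes
    by network-based search: a uniform pick, regardless of type, among the
    references (out-links) of the nodes it met.
    """
    if bias <= 0:
        raise ValueError("bias must be positive")
    rng = random.Random(seed)
    types, refs, cited_by = [], [], []
    for new in range(n_nodes):
        t = 0 if rng.random() < share_a else 1
        types.append(t)
        refs.append([])
        cited_by.append([])
        if new == 0:
            continue
        older = range(new)
        weights = [bias if types[v] == t else 1.0 for v in older]
        met = set()
        while len(met) < min(m_random, new):
            met.add(rng.choices(older, weights=weights)[0])
        # Network-based search: unbiased choice among the met nodes' references.
        pool = sorted({r for v in met for r in refs[v]} - met)
        found = rng.sample(pool, min(m_search, len(pool)))
        for v in sorted(met) + found:
            refs[new].append(v)
            cited_by[v].append(new)
    return types, refs, cited_by


def same_type_citer_share(nodes, types, cited_by):
    """Share of citations received by `nodes` that come from same-type nodes."""
    same = total = 0
    for v in nodes:
        same += sum(types[c] == types[v] for c in cited_by[v])
        total += len(cited_by[v])
    return same / total if total else None


types, refs, cited_by = grow_with_search(2000, seed=7)
print(same_type_citer_share(range(0, 200), types, cited_by),
      same_type_citer_share(range(1600, 1800), types, cited_by))
```

Comparing the share for the oldest and for younger nodes is one way to look for the pattern described above; the sketch does not guarantee that pattern for every parameter choice.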

Weak integration states that well-established nodes have a higher tendency to form new connections than young nodes, regardless of node type, so bias in link probabilities is eliminated as nodes age. Long-run integration states that the types of any node's neighbours converge to the global distribution of types in the network as a whole, which eliminates bias among neighbouring nodes. Partial integration means that the distribution of types among a node's neighbours converges monotonically towards the global distribution over time, although some bias may remain in the limit.
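
Long-run and partial integration are statements about how far a node's neighbourhood is from the network's overall type distribution. One simple way to quantify that distance is the total variation distance; the choice of metric and the function names below are assumptions for illustration, not the definitions used by Bramoullé et al. (2012). Under long-run integration the gap for old nodes would shrink towards 0; under partial integration it would shrink but could level off above 0.

```python
from collections import Counter


def type_distribution(nodes, types):
    """Share of each type among `nodes`."""
    counts = Counter(types[v] for v in nodes)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def integration_gap(node, neighbours, types):
    """Total variation distance between the types of `node`'s neighbours and
    the types of all nodes; 0 means the neighbourhood mirrors the population.
    Returns None for a node without neighbours."""
    if not neighbours[node]:
        return None
    local = type_distribution(neighbours[node], types)
    overall = type_distribution(range(len(types)), types)
    return 0.5 * sum(abs(local.get(t, 0.0) - overall.get(t, 0.0))
                     for t in set(local) | set(overall))


# Hypothetical path graph 0-1-2-3 with types a, a, b, b.
types = ["a", "a", "b", "b"]
neighbours = [[1], [0, 2], [1, 3], [2]]
print([integration_gap(v, neighbours, types) for v in range(4)])  # [0.5, 0.0, 0.0, 0.5]
```

Averaging this gap over nodes of similar age in a growing network gives a rough picture of whether, and how fully, neighbourhoods integrate over time.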

Homophily leads new nodes to connect to similar nodes with a higher probability, but this bias is weaker among a node's second-degree neighbours (neighbours of neighbours) than among its first-degree neighbours. Over time, connections created through network-based search become more prevalent as the number of neighbours grows, and because second-degree connections include more and more randomly found nodes, the connections of older nodes become more diverse and less influenced by homophily. Thus the citations of an older article are likely to come from a wider variety of subjects and research fields.
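
Whether the bias is weaker at distance two than at distance one can be checked directly on a given graph by comparing the share of same-type nodes among first-degree and second-degree neighbours. The helper and toy graph below are hypothetical; they measure the pattern but do not model why it arises.

```python
def same_type_shares(node, adjacency, types):
    """Share of same-type nodes among first-degree and second-degree neighbours.

    Second-degree neighbours are neighbours of neighbours that are neither the
    node itself nor one of its direct neighbours. Returns None for an empty group.
    """
    first = set(adjacency[node])
    second = {w for v in first for w in adjacency[v]} - first - {node}

    def share(group):
        if not group:
            return None
        return sum(types[v] == types[node] for v in group) / len(group)

    return share(first), share(second)


# Hypothetical graph: node 0 links to two same-type nodes whose own links are mixed.
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 4, 5], 3: [1], 4: [2], 5: [2]}
types = {0: "a", 1: "a", 2: "a", 3: "b", 4: "a", 5: "b"}
print(same_type_shares(0, adjacency, types))
```

Here node 0's direct neighbours are all of its type, while only one of its three second-degree neighbours is.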

References

  1. McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). "Birds of a Feather: Homophily in Social Networks". Annual Review of Sociology. 27: 415–444.
  2. Himelboim, I., Sweetser, K. D., Tinkham, S. F., Cameron, K., Danelo, M., & West, K. (2014). "Valence-based homophily on Twitter: Network Analysis of Emotions and Political Talk in the 2012 Presidential Election". New Media & Society.
  3. Albert, R., & Barabási, A.-L. (2002). "Statistical mechanics of complex networks". Reviews of Modern Physics. 74: 47–97. arXiv:cond-mat/0106096. Bibcode:2002RvMP...74...47A. doi:10.1103/RevModPhys.74.47.
  4. Graham, B. S. (2014). "An econometric model of link formation with degree heterogeneity". NBER Working Paper No. 20341. doi:10.3386/w20341.
  5. Kim, K., & Altmann, J. (2017). "Effect of Homophily on Network Formation". Communications in Nonlinear Science and Numerical Simulation. 44: 482–494.
  6. Bramoullé, Y., Currarini, S., Jackson, M. O., Pin, P., & Rogers, B. W. (2012). "Homophily and long-run integration in social networks". Journal of Economic Theory. 147(5): 1754–1786. arXiv:1201.4564. doi:10.1016/j.jet.2012.05.007.