Bipartite network projection

Last updated

Bipartite network projection is an extensively used method for compressing information about bipartite networks. [1] Since the one-mode projection is always less informative than the original bipartite graph, an appropriate method for weighting network connections is often required. Optimal weighting methods reflect the nature of the specific network, conform to the designer's objectives and aim at minimizing information loss.

Contents

Background

Bipartite networks are a particular class of complex networks, whose nodes are divided into two sets X and Y, and only connections between two nodes in different sets are allowed. For the convenience of directly showing the relation structure among a particular set of nodes, bipartite networks are usually compressed by one-mode projection. This means that the ensuing network contains nodes of only either of the two sets, and two X (or, alternatively, Y) nodes are connected only when they have at least one common neighboring Y (or, alternatively, X) node.

"Possible projections of a simple bipartite network" Bipartite network projection.png
"Possible projections of a simple bipartite network"

The simplest method involves projecting the bipartite network onto an unweighted network, without taking into account the topology of the network or the frequency of sharing a connection to the elements of the opposing set. Since bipartite networks with largely different structures can have exactly the same one-mode representation in this case, a lucid illustration of the original network topology usually requires the use of some weighting method.

Possible weighting methods

According to the designer's needs and the topological properties of the given network, several different weighting methods have been proposed. Since the redistribution of weights is found to have a strong effect on the community structure (especially in dense networks), the methodological choice must be made with care. [2]

  1. Simple weighting. Simple weighting means that edges are weighted directly by the number of times the common association is repeated. (This is the method applied in the attached graph on the right.) This approach works fine for a wide range of settings such as molecular gastronomy or most social networks. However, it can be misleading if the marginal impact of one additional association is not fixed but dependent on some characteristics of the network (e.g. on the original weight between the respective nodes). This can be the case for example in scientific collaborations as pointed out by Fan et al.. [2]
  2. Hyperbolic weighting. In the common case of decreasing marginal contribution of additional links to a node, the use of simple weighting might not be very illuminating. For example, in scientific collaboration networks, two scientists whose names appear on a paper together with many other coauthors are expected to know one another less than two who were the sole authors of a paper. [3] In order to account for this so-called saturation effect, it has been proposed to weight edges inversely according to the number of common affiliations in the neighboring set. This is most easily achieved by introducing a scaling factor 1/(n - 1) onto the simple count, which weakens the link between nodes with more popular common matches.
  3. Weighting based on resource allocation. With simple and hyperbolic weighting, the projected adjacency matrix is always set to be symmetrical, which implies that a link between two projected nodes carries the same weight for both vertices. Moreover, information contained by edges whose "target" nodes are of degree 1 in the original network will be lost in the projection, which can have grievous consequences in some real networks with a lot of independent edge sets. To overcome these shortcomings, Zhou et al. has proposed a weighting method that is based on assuming that a certain amount of resource is associated with each node in the projection, and the directional weight w_ij represents the proportion of the resource node j would like to distribute to node i. Resource allocation is based on the bipartite graph, involves equal distribution across neighbors and consists of two steps: first from the projected set to the non-projected set, and then back. Numerical simulations indicate that this projecting method can perform remarkably than some widely used method (such as collaborative filtering) for personal recommendation purposes. [1]

Backbones of bipartite projections

Each weighting method yields a weighted unipartite or one-mode network, where the weights reflect the extent to which two nodes shared common neighbors in the original bipartite network. The values of these weights depend on the degrees of the two sets of nodes in the original bipartite network. For example, in a co-authorship network, [4] the number of observed co-authorships depends on (1) the number of papers each author wrote and (2) the number of authors on each paper. Backbone algorithms designed for bipartite projections (in contrast to other weighted network backbone algorithms such as the disparity filter) use this information from the original bipartite network to identify statistically significantly large (or small) weights in the projection. [5] When only the edges with statistically significant weights are retained, these algorithms yield an unweighted and typically sparse "backbone" network that can be more informative to analyze and visualize.

Some notable applications

Related Research Articles

<span class="mw-page-title-main">Graph theory</span> Area of discrete mathematics

In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of vertices which are connected by edges. A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmetrically. Graphs are one of the principal objects of study in discrete mathematics.

<span class="mw-page-title-main">Bipartite graph</span> Graph divided into two independent sets

In the mathematical field of graph theory, a bipartite graph is a graph whose vertices can be divided into two disjoint and independent sets and , that is every edge connects a vertex in to one in . Vertex sets and are usually called the parts of the graph. Equivalently, a bipartite graph is a graph that does not contain any odd-length cycles.

This is a glossary of graph theory. Graph theory is the study of graphs, systems of nodes or vertices connected in pairs by lines or edges.

<span class="mw-page-title-main">Graph (discrete mathematics)</span> Vertices connected in pairs by edges

In discrete mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects correspond to mathematical abstractions called vertices and each of the related pairs of vertices is called an edge. Typically, a graph is depicted in diagrammatic form as a set of dots or circles for the vertices, joined by lines or curves for the edges. Graphs are one of the objects of study in discrete mathematics.

In the mathematical discipline of graph theory, a matching or independent edge set in an undirected graph is a set of edges without common vertices. In other words, a subset of the edges is a matching if each vertex appears in at most one edge of that matching. Finding a matching in a bipartite graph can be treated as a network flow problem.

In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes.

In graph theory, a cut is a partition of the vertices of a graph into two disjoint subsets. Any cut determines a cut-set, the set of edges that have one endpoint in each subset of the partition. These edges are said to cross the cut. In a connected graph, each cut-set determines a unique cut, and in some cases cuts are identified with their cut-sets rather than with their vertex partitions.

<span class="mw-page-title-main">Minimum cut</span> Partition of a graph by removing fewest possible edges

In graph theory, a minimum cut or min-cut of a graph is a cut that is minimal in some metric.

In computer science and graph theory, the maximum weight matching problem is the problem of finding, in a weighted graph, a matching in which the sum of weights is maximized.

Biological network inference is the process of making inferences and predictions about biological networks. By using networks to analyze patterns in biological systems, such as food-webs, we can visualize the nature and strength of interactions between species, DNA, proteins, and more.

In mathematics and social science, a collaboration graph is a graph modeling some social network where the vertices represent participants of that network and where two distinct participants are joined by an edge whenever there is a collaborative relationship between them of a particular kind. Collaboration graphs are used to measure the closeness of collaborative relationships between the participants of the network.

In graph theory and theoretical computer science, the longest path problem is the problem of finding a simple path of maximum length in a given graph. A path is called simple if it does not have any repeated vertices; the length of a path may either be measured by its number of edges, or by the sum of the weights of its edges. In contrast to the shortest path problem, which can be solved in polynomial time in graphs without negative-weight cycles, the longest path problem is NP-hard and the decision version of the problem, which asks whether a path exists of at least some given length, is NP-complete. This means that the decision problem cannot be solved in polynomial time for arbitrary graphs unless P = NP. Stronger hardness results are also known showing that it is difficult to approximate. However, it has a linear time solution for directed acyclic graphs, which has important applications in finding the critical path in scheduling problems.

In combinatorial optimization, the matroid intersection problem is to find a largest common independent set in two matroids over the same ground set. If the elements of the matroid are assigned real weights, the weighted matroid intersection problem is to find a common independent set with the maximum possible weight. These problems generalize many problems in combinatorial optimization including finding maximum matchings and maximum weight matchings in bipartite graphs and finding arborescences in directed graphs.

<span class="mw-page-title-main">Betweenness centrality</span> Measure of a graphs centrality, based on shortest paths

In graph theory, betweenness centrality is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices such that either the number of edges that the path passes through or the sum of the weights of the edges is minimized. The betweenness centrality for each vertex is the number of these shortest paths that pass through the vertex.

In social network analysis, the co-stardom network represents the collaboration graph of film actors i.e. movie stars. The co-stardom network can be represented by an undirected graph of nodes and links. Nodes correspond to the movie star actors and two nodes are linked if they co-starred (performed) in the same movie. The links are un-directed, and can be weighted or not depending on the goals of study. If the number of times two actors appeared in a movie is needed, links are assigned weights. The co-stardom network can also be represented by a bipartite graph where nodes are of two types: actors and movies. And edges connect different types of nodes if they have a relationship. Initially the network was found to have a small-world property. Afterwards, it was discovered that it exhibits a scale-free (power-law) behavior.

Random walk closeness centrality is a measure of centrality in a network, which describes the average speed with which randomly walking processes reach a node from other nodes of the network. It is similar to the closeness centrality except that the farness is measured by the expected length of a random walk rather than by the shortest path.

<span class="mw-page-title-main">Citation graph</span> Directed graph describing citations in documents

A citation graph, in information science and bibliometrics, is a directed graph that describes the citations within a collection of documents.

<span class="mw-page-title-main">Ingredient-flavor network</span>

In network science, ingredient-flavor networks are networks describing the sharing of flavor compounds of culinary ingredients. In the bipartite form, an ingredient-flavor network consist of two different types of nodes: the ingredients used in the recipes and the flavor compounds that contributes to the flavor of each ingredients. The links connecting different types of nodes are undirected, represent certain compound occur in each ingredients. The ingredient-flavor network can also be projected in the ingredient or compound space where nodes are ingredients or compounds, links represents the sharing of the same compounds to different ingredients or the coexistence in the same ingredient of different compounds.

<span class="mw-page-title-main">Disparity filter algorithm of weighted network</span>

Disparity filter is a network reduction algorithm to extract the backbone structure of undirected weighted network. Many real world networks such as citation networks, food web, airport networks display heavy tailed statistical distribution of nodes' weight and strength. Disparity filter can sufficiently reduce the network without destroying the multi-scale nature of the network. The algorithm is developed by M. Angeles Serrano, Marian Boguna and Alessandro Vespignani.

Network medicine is the application of network science towards identifying, preventing, and treating diseases. This field focuses on using network topology and network dynamics towards identifying diseases and developing medical drugs. Biological networks, such as protein-protein interactions and metabolic pathways, are utilized by network medicine. Disease networks, which map relationships between diseases and biological factors, also play an important role in the field. Epidemiology is extensively studied using network science as well; social networks and transportation networks are used to model the spreading of disease across populations. Network medicine is a medically focused area of systems biology.

References

  1. 1 2 "Bipartite network projection and personal recommendation" by Tao Zhou, Jie Ren, Matúš Medo and Yi-Cheng Zhang in PHYSICAL REVIEW E 76(4): 046115 (2007)
  2. 1 2 "The effect of weight on community structure of networks" by Ying Fan, Menghui Li, Peng Zhang, Jinshan Wu, Zengru Di in PHYSICA A 378 (2007) 583–590 [ permanent dead link ]
  3. "Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality" by M. E. J. Newman in PHYSICAL REVIEW E, vol. 64, 016132 (2001)
  4. 1 2 "Why social networks are different from other types of networks" by Newman and Park in PHYSICAL REVIEW E 68, 036122 (2003)
  5. "Comparing alternatives to the fixed degree sequence model for extracting the backbone of bipartite projections" by Zachary P. Neal, Rachel Domagalski, and Bruce Sagan in Scientific Reports 11, 23929 (2021)
  6. Ahn, Y. Y.; Ahnert, S. E.; Bagrow, J. P.; Barabási, A. L. (2011). ""Flavor network and the principles of food pairing" by Yong-Yeol Ahn, Sebastian E. Ahnert, James P. Bagrow & Albert-Laszlo Barabasi in NATURE, SCIENTIFIC REPORTS 1 : 196, DOI: 10.1038/srep00196" (PDF). Scientific Reports. 1: 196. doi:10.1038/srep00196. PMC   3240947 . PMID   22355711. Archived from the original (PDF) on 2012-03-07. Retrieved 2012-05-17.
  7. "The Small World of the American Corporate Elite, 1982-2001" by Davis, Yoo and Baker in STRATEGIC ORGANIZATION August 2003 vol. 1 no. 3 301-326
  8. "The human disease network" by Kwang-Il Goh, Michael E. Cusick, David Valle, Barton Childs, Marc Vidal, and Albert-Laszlo Barabasi in PNAS vol. 104, no. 21 (2007)
  9. "The backbone of bipartite projections: Inferring relationships from co-authorship, co-sponsorship, co-attendance and other co-behaviors" by Zachary P. Neal in Social Networks 39, 84-97 (2014)