Power graph analysis

Last updated

In computational biology, power graph analysis is a method for the analysis and representation of complex networks. Power graph analysis is the computation, analysis and visual representation of a power graph from a graph (networks).

Contents

Power graph analysis can be thought of as a lossless compression algorithm for graphs. [1] It extends graph syntax with representations of cliques, bicliques and stars. Compression levels of up to 95% have been obtained for complex biological networks.

Hypergraphs are a generalization of graphs in which edges are not just couples of nodes but arbitrary n-tuples. Power graphs are not another generalization of graphs, but instead a novel representation of graphs that proposes a shift from the "node and edge" language to one using cliques, bicliques and stars as primitives.

Power graphs

The primitive motifs used for power graph analysis and their corresponding diagrammatic representation: biclique, clique and star. PowerGraphs.png
The primitive motifs used for power graph analysis and their corresponding diagrammatic representation: biclique, clique and star.

Graphical representation

Graphs are drawn with circles or points that represent nodes and lines connecting pairs of nodes that represent edges. Power graphs extend the syntax of graphs with power nodes, which are drawn as a circle enclosing nodes or other power nodes, and power edges, which are lines between power nodes.

Bicliques are two sets of nodes with an edge between every member of one set and every member of the other set. In a power graph, a biclique is represented as an edge between two power nodes.

Cliques are a set of nodes with an edge between every pair of nodes. In a power graph, a clique is represented by a power node with a loop.

Stars are a set of nodes with an edge between every member of that set and a single node outside the set. In a power graph, a star is represented by a power edge between a regular node and a power node.

Formal definition

Given a graph where is the set of nodes and is the set of edges, a power graph is a graph defined on the power set of power nodes connected to each other by power edges: . Hence power graphs are defined on the power set of nodes as well as on the power set of edges of the graph .

The semantics of power graphs are as follows: if two power nodes are connected by a power edge, this means that all nodes of the first power node are connected to all nodes of the second power node. Similarly, if a power node is connected to itself by a power edge, this signifies that all nodes in the power node are connected to each other by edges.

The following two conditions are required:

Analogy to Fourier analysis

The Fourier analysis of a function can be seen as a rewriting of the function in terms of harmonic functions instead of pairs. This transformation changes the point of view from time domain to frequency domain and enables many interesting applications in signal analysis, data compression, and filtering. Similarly, Power graph analysis is a rewriting or decomposition of a network using bicliques, cliques and stars as primitive elements (just as harmonic functions for Fourier analysis). It can be used to analyze, compress and filter networks. There are, however, several key differences. First, in Fourier analysis the two spaces (time and frequency domains) are the same function space - but stricto sensu, power graphs are not graphs. Second, there is not a unique power graph representing a given graph. Yet a very interesting class of power graphs are minimal power graphs which have the fewest power edges and power nodes necessary to represent a given graph.

Minimal power graphs

Two different power graphs that represent the same graph. PowerGraphNonUnicity.png
Two different power graphs that represent the same graph.

In general, there is no unique minimal power graph for a given graph. In this example (right) a graph of four nodes and five edges admits two minimal power graphs of two power edges each. The main difference between these two minimal power graphs is the higher nesting level of the second power graph as well as a loss of symmetry with respect to the underlying graph. Loss of symmetry is only a problem in small toy examples since complex networks rarely exhibit such symmetries in the first place. Additionally, one can minimize the nesting level but even then, there is in general not a unique minimal power graph of minimal nesting level.

Power graph greedy algorithm

The power graph greedy algorithm relies on two simple steps to perform the decomposition:

The first step identifies candidate power nodes through a hierarchical clustering of the nodes in the network based on the similarity of their neighboring nodes. The similarity of two sets of neighbors is taken as the Jaccard index of the two sets.

The second step performs a greedy search for possible power edges between candidate power nodes. Power edges abstracting the most edges in the original network are added first to the power graph. Thus bicliques, cliques and stars are incrementally replaced with power edges, until all remaining single edges are also added. Candidate power nodes that are not the end point of any power edge are ignored.

Modular decomposition

Modular decomposition can be used to compute a power graph by using the strong modules of the modular decomposition. Modules in modular decomposition are groups of nodes in a graph that have identical neighbors. A Strong Module is a module that does not overlap with another module. However, in complex networks strong modules are more the exception than the rule. Therefore, the power graphs obtained through modular decomposition are far from minimality. The main difference between modular decomposition and power graph analysis is the emphasis of power graph analysis in decomposing graphs not only using modules of nodes but also modules of edges (cliques, bicliques). Indeed, power graph analysis can be seen as a loss-less simultaneous clustering of both nodes and edges.

Applications

Biological networks

Power graph analysis has been shown to be useful for the analysis of several types of biological networks such as protein-protein interaction networks, [2] domain-peptide binding motifs, gene regulatory networks [3] and homology/paralogy networks. Also a network of significant disease-trait pairs [4] have been recently visualized and analyzed with power graphs.

Network compression, a new measure derived from power graphs, has been proposed as a quality measure for protein interaction networks. [5]

Drug repositioning

Power graphs have been also applied to the analysis of drug-target-disease networks [6] for drug repositioning.

Social networks

Power graphs have been applied to large-scale data in social networks, for community mining [7] or for modeling author types. [8]

See also

Related Research Articles

<span class="mw-page-title-main">Graph theory</span> Area of discrete mathematics

In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of vertices which are connected by edges. A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmetrically. Graphs are one of the principal objects of study in discrete mathematics.

<span class="mw-page-title-main">Tree decomposition</span> Mapping of a graph into a tree

In graph theory, a tree decomposition is a mapping of a graph into a tree that can be used to define the treewidth of the graph and speed up solving certain computational problems on the graph.

In mathematics, the transitive closureR+ of a homogeneous binary relation R on a set X is the smallest relation on X that contains R and is transitive. For finite sets, "smallest" can be taken in its usual sense, of having the fewest related pairs; for infinite sets R+ is the unique minimal transitive superset of R.

This is a glossary of graph theory. Graph theory is the study of graphs, systems of nodes or vertices connected in pairs by lines or edges.

<span class="mw-page-title-main">Clique (graph theory)</span> Subset of the vertices of a node-link graph that are all adjacent to each other

In the mathematical area of graph theory, a clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. That is, a clique of a graph is an induced subgraph of that is complete. Cliques are one of the basic concepts of graph theory and are used in many other mathematical problems and constructions on graphs. Cliques have also been studied in computer science: the task of finding whether there is a clique of a given size in a graph is NP-complete, but despite this hardness result, many algorithms for finding cliques have been studied.

<span class="mw-page-title-main">Cograph</span> Graph formed by complementation and disjoint union

In graph theory, a cograph, or complement-reducible graph, or P4-free graph, is a graph that can be generated from the single-vertex graph K1 by complementation and disjoint union. That is, the family of cographs is the smallest class of graphs that includes K1 and is closed under complementation and disjoint union.

Johnson's algorithm is a way to find the shortest paths between all pairs of vertices in an edge-weighted directed graph. It allows some of the edge weights to be negative numbers, but no negative-weight cycles may exist. It works by using the Bellman–Ford algorithm to compute a transformation of the input graph that removes all negative weights, allowing Dijkstra's algorithm to be used on the transformed graph. It is named after Donald B. Johnson, who first published the technique in 1977.

<span class="mw-page-title-main">Markov random field</span>

In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to be a Markov random field if it satisfies Markov properties. The concept originates from the Sherrington–Kirkpatrick model.

<span class="mw-page-title-main">Maximal independent set</span> Independent set which is not a subset of any other independent set

In graph theory, a maximal independent set (MIS) or maximal stable set is an independent set that is not a subset of any other independent set. In other words, there is no vertex outside the independent set that may join it because it is maximal with respect to the independent set property.

In graph theory, the treewidth of an undirected graph is an integer number which specifies, informally, how far the graph is from being a tree. The smallest treewidth is 1; the graphs with treewidth 1 are exactly the trees and the forests. The graphs with treewidth at most 2 are the series–parallel graphs. The maximal graphs with treewidth exactly k are called k-trees, and the graphs with treewidth at most k are called partial k-trees. Many other well-studied graph families also have bounded treewidth.

In constraint satisfaction, a decomposition method translates a constraint satisfaction problem into another constraint satisfaction problem that is binary and acyclic. Decomposition methods work by grouping variables into sets, and solving a subproblem for each set. These translations are done because solving binary acyclic problems is a tractable problem.

<span class="mw-page-title-main">Community structure</span> Concept in graph theory

In the study of complex networks, a network is said to have community structure if the nodes of the network can be easily grouped into sets of nodes such that each set of nodes is densely connected internally. In the particular case of non-overlapping community finding, this implies that the network divides naturally into groups of nodes with dense connections internally and sparser connections between groups. But overlapping communities are also allowed. The more general definition is based on the principle that pairs of nodes are more likely to be connected if they are both members of the same community(ies), and less likely to be connected if they do not share communities. A related but different problem is community search, where the goal is to find a community that a certain vertex belongs to.

<span class="mw-page-title-main">Modularity (networks)</span> Measure of network community structure

Modularity is a measure of the structure of networks or graphs which measures the strength of division of a network into modules. Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules. Modularity is often used in optimization methods for detecting community structure in networks. Biological networks, including animal brains, exhibit a high degree of modularity. However, modularity maximization is not statistically consistent, and finds communities in its own null model, i.e. fully random graphs, and therefore it cannot be used to find statistically significant community structures in empirical networks. Furthermore, it has been shown that modularity suffers a resolution limit and, therefore, it is unable to detect small communities.

In the mathematical fields of graph theory and combinatorial optimization, the bipartite dimension or biclique cover number of a graph G = (VE) is the minimum number of bicliques (that is complete bipartite subgraphs), needed to cover all edges in E. A collection of bicliques covering all edges in G is called a biclique edge cover, or sometimes biclique cover. The bipartite dimension of G is often denoted by the symbol d(G).

In graph theory, the modular decomposition is a decomposition of a graph into subsets of vertices called modules. A module is a generalization of a connected component of a graph. Unlike connected components, however, one module can be a proper subset of another. Modules therefore lead to a recursive (hierarchical) decomposition of the graph, instead of just a partition.

In the mathematical field of graph theory, the intersection number of a graph is the smallest number of elements in a representation of as an intersection graph of finite sets. In such a representation, each vertex is represented as a set, and two vertices are connected by an edge whenever their sets have a common element. Equivalently, the intersection number is the smallest number of cliques needed to cover all of the edges of .

In the study of graph algorithms, Courcelle's theorem is the statement that every graph property definable in the monadic second-order logic of graphs can be decided in linear time on graphs of bounded treewidth. The result was first proved by Bruno Courcelle in 1990 and independently rediscovered by Borie, Parker & Tovey (1992). It is considered the archetype of algorithmic meta-theorems.

<span class="mw-page-title-main">Skew partition</span>

In graph theory, a skew partition of a graph is a partition of its vertices into two subsets, such that the induced subgraph formed by one of the two subsets is disconnected and the induced subgraph formed by the other subset is the complement of a disconnected graph. Skew partitions play an important role in the theory of perfect graphs.

In the mathematical fields of graph theory and finite model theory, the logic of graphs deals with formal specifications of graph properties using sentences of mathematical logic. There are several variations in the types of logical operation that can be used in these sentences. The first-order logic of graphs concerns sentences in which the variables and predicates concern individual vertices and edges of a graph, while monadic second-order graph logic allows quantification over sets of vertices or edges. Logics based on least fixed point operators allow more general predicates over tuples of vertices, but these predicates can only be constructed through fixed-point operators, restricting their power.

<span class="mw-page-title-main">Graham–Pollak theorem</span>

In graph theory, the Graham–Pollak theorem states that the edges of an -vertex complete graph cannot be partitioned into fewer than complete bipartite graphs. It was first published by Ronald Graham and Henry O. Pollak in two papers in 1971 and 1972, in connection with an application to telephone switching circuitry.

References

  1. Matthias Reimann; Loïc Royer; Simone Daminelli; Michael Schroeder (2015). Matthias Dehmer; Frank Emmert-Streib; Stefan Pickl (eds.). Computational Network Theory: Theoretical Foundations and Applications. Quantitative and Network Biology Series. Vol. 5. Wiley-Blackwell. ISBN   978-3-527-33724-8.
  2. Royer, Loïc; Reimann, Matthias; Andreopoulos, Bill; Schroeder, Michael (11 Jul 2008). Berg, Johannes (ed.). "Unraveling Protein Networks with Power Graph Analysis". PLOS Computational Biology. 4 (7): e1000108. Bibcode:2008PLSCB...4E0108R. doi:10.1371/journal.pcbi.1000108. PMC   2424176 . PMID   18617988.
  3. Martina Maisel; Hans-Jörg Habisch; Loïc Royer; Alexander Herr; Javorina Milosevic; Andreas Hermann; Stefan Liebau; Rolf Brenner; Johannes Schwarz; Michael Schroeder; Alexander Storch (15 Oct 2010). "Genome-wide expression profiling and functional network analysis upon neuroectodermal conversion of human mesenchymal stem cells suggest HIF-1 and miR-124a as important regulators". Experimental Cell Research. 316 (17): 2760–78. doi:10.1016/j.yexcr.2010.06.012. PMID   20599952.
  4. Li, Li; Ruau, David J.; Patel, Chirag J.; Weber, Susan C.; Chen, Rong; Tatonetti, Nicholas P.; Dudley, Joel T.; Butte, Atul J. (30 April 2014). "Disease Risk Factors Identified Through Shared Genetic Architecture and Electronic Medical Records". Sci. Transl. Med. 6 (234): 234ra57. doi:10.1126/scitranslmed.3007191. PMC   4323098 . PMID   24786325.
  5. Royer, Loïc; Reimann, Matthias; Stewart, Francis A.; Schroeder, Michael (18 Jun 2012). "Network compression as a quality measure for protein interaction networks". PLOS ONE. 7 (6): e35729. Bibcode:2012PLoSO...735729R. doi: 10.1371/journal.pone.0035729 . PMC   3377704 . PMID   22719828.
  6. Daminelli, Simone; Haupt, Joachim V.; Reimann, Matthias; Schroeder, Michael (26 Apr 2012). "Drug repositioning through incomplete bi-cliques in an integrated drug–target–disease network". Integrative Biology. 4 (7): 778–88. doi:10.1039/C2IB00154C. PMID   22538435.
  7. George Tsatsaronis; Matthias Reimann; Iraklis Varlamis; Orestis Gkorgkas; Kjetil Nørvåg (2011). "Efficient community detection using power graph analysis". Proceedings of the 9th workshop on Large-scale and distributed informational retrieval. Lsds-Ir '11. pp. 21–26. doi:10.1145/2064730.2064738. ISBN   9781450309592. S2CID   10224386.{{cite book}}: CS1 maint: date and year (link)
  8. George Tsatsaronis; Iraklis Varlamis; Sunna Torge; Matthias Reimann; Kjetil Nørvåg; Michael Schroeder; Matthias Zschunke (2011). "How to Become a Group Leader? or Modeling Author Types Based on Graph Mining". Research and Advanced Technology for Digital Libraries: International Conference on Theory and Practice of Digital Libraries, TPDL. Lecture Notes in Computer Science. Vol. 6966. SpringerLink. pp. 15–26. CiteSeerX   10.1.1.299.714 . doi:10.1007/978-3-642-24469-8_4. ISBN   978-3-642-24468-1.