Force-directed graph drawing

Last updated
Social network visualization using a force-directed graph drawing algorithm SocialNetworkAnalysis.png
Social network visualization using a force-directed graph drawing algorithm
Visualization of links between pages on a wiki using a force-directed layout Visualization of wiki structure using prefuse visualization package.png
Visualization of links between pages on a wiki using a force-directed layout

Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically-pleasing way. Their purpose is to position the nodes of a graph in two-dimensional or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible, by assigning forces among the set of edges and the set of nodes, based on their relative positions, and then using these forces either to simulate the motion of the edges and nodes or to minimize their energy. [2]

Contents

While graph drawing can be a difficult problem, force-directed algorithms, being physical simulations, usually require no special knowledge about graph theory such as planarity.

Forces

Force-directed graph drawing algorithms assign forces among the set of edges and the set of nodes of a graph drawing. Typically, spring-like attractive forces based on Hooke's law are used to attract pairs of endpoints of the graph's edges towards each other, while simultaneously repulsive forces like those of electrically charged particles based on Coulomb's law are used to separate all pairs of nodes. In equilibrium states for this system of forces, the edges tend to have uniform length (because of the spring forces), and nodes that are not connected by an edge tend to be drawn further apart (because of the electrical repulsion). Edge attraction and vertex repulsion forces may be defined using functions that are not based on the physical behavior of springs and particles; for instance, some force-directed systems use springs whose attractive force is logarithmic rather than linear.

An alternative model considers a spring-like force for every pair of nodes where the ideal length of each spring is proportional to the graph-theoretic distance between nodes i and j, without using a separate repulsive force. Minimizing the difference (usually the squared difference) between Euclidean and ideal distances between nodes is then equivalent to a metric multidimensional scaling problem.

A force-directed graph can involve forces other than mechanical springs and electrical repulsion. A force analogous to gravity may be used to pull vertices towards a fixed point of the drawing space; this may be used to pull together different connected components of a disconnected graph, which would otherwise tend to fly apart from each other because of the repulsive forces, and to draw nodes with greater centrality to more central positions in the drawing; [3] it may also affect the vertex spacing within a single component. Analogues of magnetic fields may be used for directed graphs. Repulsive forces may be placed on edges as well as on nodes in order to avoid overlap or near-overlap in the final drawing. In drawings with curved edges such as circular arcs or spline curves, forces may also be placed on the control points of these curves, for instance to improve their angular resolution. [4]

Methods

Once the forces on the nodes and edges of a graph have been defined, the behavior of the entire graph under these sources may then be simulated as if it were a physical system. In such a simulation, the forces are applied to the nodes, pulling them closer together or pushing them further apart. This is repeated iteratively until the system comes to a mechanical equilibrium state; i.e., their relative positions do not change anymore from one iteration to the next. The positions of the nodes in this equilibrium are used to generate a drawing of the graph.

For forces defined from springs whose ideal length is proportional to the graph-theoretic distance, stress majorization gives a very well-behaved (i.e., monotonically convergent) [5] and mathematically elegant way to minimize these differences and, hence, find a good layout for the graph.

It is also possible to employ mechanisms that search more directly for energy minima, either instead of or in conjunction with physical simulation. Such mechanisms, which are examples of general global optimization methods, include simulated annealing and genetic algorithms.

Advantages

The following are among the most important advantages of force-directed algorithms:

Good-quality results
At least for graphs of medium size (up to 50–500 vertices), the results obtained have usually very good quality based on the following criteria: uniform edge length, uniform vertex distribution and showing symmetry. This last criterion is among the most important ones and is hard to achieve with any other type of algorithm.
Flexibility
Force-directed algorithms can be easily adapted and extended to fulfill additional aesthetic criteria. This makes them the most versatile class of graph drawing algorithms. Examples of existing extensions include the ones for directed graphs, 3D graph drawing, [6] cluster graph drawing, constrained graph drawing, and dynamic graph drawing.
Intuitive
Since they are based on physical analogies of common objects, like springs, the behavior of the algorithms is relatively easy to predict and understand. This is not the case with other types of graph-drawing algorithms.
Simplicity
Typical force-directed algorithms are simple and can be implemented in a few lines of code. Other classes of graph-drawing algorithms, like the ones for orthogonal layouts, are usually much more involved.
Interactivity
Another advantage of this class of algorithm is the interactive aspect. By drawing the intermediate stages of the graph, the user can follow how the graph evolves, seeing it unfold from a tangled mess into a good-looking configuration. In some interactive graph drawing tools, the user can pull one or more nodes out of their equilibrium state and watch them migrate back into position. This makes them a preferred choice for dynamic and online graph-drawing systems.
Strong theoretical foundations
While simple ad-hoc force-directed algorithms often appear in the literature and in practice (because they are relatively easy to understand), more reasoned approaches are starting to gain traction. Statisticians have been solving similar problems in multidimensional scaling (MDS) since the 1930s, and physicists also have a long history of working with related n-body problems - so extremely mature approaches exist. As an example, the stress majorization approach to metric MDS can be applied to graph drawing as described above. This has been proven to converge monotonically. [5] Monotonic convergence, the property that the algorithm will at each iteration decrease the stress or cost of the layout, is important because it guarantees that the layout will eventually reach a local minimum and stop. Damping schedules cause the algorithm to stop, but cannot guarantee that a true local minimum is reached.

Disadvantages

The main disadvantages of force-directed algorithms include the following:

High running time
The typical force-directed algorithms are in general considered to run in cubic time (), where is the number of nodes of the input graph. This is because the number of iterations is estimated to be linear (), and in every iteration, all pairs of nodes need to be visited and their mutual repulsive forces computed. This is related to the N-body problem in physics. However, since repulsive forces are local in nature the graph can be partitioned such that only neighboring vertices are considered. Common techniques used by algorithms for determining the layout of large graphs include high-dimensional embedding, [7] multi-layer drawing and other methods related to N-body simulation. For example, the Barnes–Hut simulation-based method FADE [8] can improve the running time to be linearithmic, or per iteration. As a rough guide, in a few seconds one can expect to draw at most 1,000 nodes with a standard per iteration technique, and 100,000 with a per iteration technique. [8] Force-directed algorithms, when combined with a graph clustering approach, can draw graphs of millions of nodes. [9]
Poor local minima
It is easy to see that force-directed algorithms produce a graph with minimal energy, in particular one whose total energy is only a local minimum. The local minimum found can be, in many cases, considerably worse than a global minimum, which translates into a low-quality drawing. For many algorithms, especially the ones that allow only down-hill moves of the vertices, the final result can be strongly influenced by the initial layout, that in most cases is randomly generated. The problem of poor local minima becomes more important as the number of vertices of the graph increases. A combined application of different algorithms is helpful to solve this problem. [10] For example, using the Kamada–Kawai algorithm [11] to quickly generate a reasonable initial layout and then the Fruchterman–Reingold algorithm [12] to improve the placement of neighbouring nodes. Another technique to achieve a global minimum is to use a multilevel approach. [13]

History

Force-directed methods in graph drawing date back to the work of Tutte (1963), who showed that polyhedral graphs may be drawn in the plane with all faces convex by fixing the vertices of the outer face of a planar embedding of the graph into convex position, placing a spring-like attractive force on each edge, and letting the system settle into an equilibrium. [14] Because of the simple nature of the forces in this case, the system cannot get stuck in local minima, but rather converges to a unique global optimum configuration. Because of this work, embeddings of planar graphs with convex faces are sometimes called Tutte embeddings.

The combination of attractive forces on adjacent vertices, and repulsive forces on all vertices, was first used by Eades (1984); [15] additional pioneering work on this type of force-directed layout was done by Fruchterman & Reingold (1991). [12] The idea of using only spring forces between all pairs of vertices, with ideal spring lengths equal to the vertices' graph-theoretic distance, is from Kamada & Kawai (1989). [11]

See also

Related Research Articles

<span class="mw-page-title-main">Graph theory</span> Area of discrete mathematics

In mathematics and computer science, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of vertices which are connected by edges. A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmetrically. Graphs are one of the principal objects of study in discrete mathematics.

In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. In other words, it can be drawn in such a way that no edges cross each other. Such a drawing is called a plane graph, or a planar embedding of the graph. A plane graph can be defined as a planar graph with a mapping from every node to a point on a plane, and from every edge to a plane curve on that plane, such that the extreme points of each curve are the points mapped from its end nodes, and all curves are disjoint except on their extreme points.

<span class="mw-page-title-main">Breadth-first search</span> Algorithm to search the nodes of a graph

Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next depth level. Extra memory, usually a queue, is needed to keep track of the child nodes that were encountered but not yet explored.

<span class="mw-page-title-main">Depth-first search</span> Search algorithm

Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node and explores as far as possible along each branch before backtracking. Extra memory, usually a stack, is needed to keep track of the nodes discovered so far along a specified branch which helps in backtracking of the graph.

<span class="mw-page-title-main">Bellman–Ford algorithm</span> Algorithm for finding the shortest paths in graphs

The Bellman–Ford algorithm is an algorithm that computes shortest paths from a single source vertex to all of the other vertices in a weighted digraph. It is slower than Dijkstra's algorithm for the same problem, but more versatile, as it is capable of handling graphs in which some of the edge weights are negative numbers. The algorithm was first proposed by Alfonso Shimbel, but is instead named after Richard Bellman and Lester Ford Jr., who published it in 1958 and 1956, respectively. Edward F. Moore also published a variation of the algorithm in 1959, and for this reason it is also sometimes called the Bellman–Ford–Moore algorithm.

<span class="mw-page-title-main">Hypergraph</span> Generalization of graph theory

In mathematics, a hypergraph is a generalization of a graph in which an edge can join any number of vertices. In contrast, in an ordinary graph, an edge connects exactly two vertices.

<span class="mw-page-title-main">Graph drawing</span> Visualization of node-link graphs

Graph drawing is an area of mathematics and computer science combining methods from geometric graph theory and information visualization to derive two-dimensional depictions of graphs arising from applications such as social network analysis, cartography, linguistics, and bioinformatics.

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge (u,v) from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topological sorting has many applications, especially in ranking problems such as feedback arc set. Topological sorting is possible even when the DAG has disconnected components.

<span class="mw-page-title-main">Graph embedding</span> Embedding a graph in a topological space, often Euclidean

In topological graph theory, an embedding of a graph on a surface is a representation of on in which points of are associated with vertices and simple arcs are associated with edges in such a way that:

In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of vertices from an n-vertex graph can partition the graph into disjoint subgraphs each of which has at most vertices.

In polyhedral combinatorics, a branch of mathematics, Steinitz's theorem is a characterization of the undirected graphs formed by the edges and vertices of three-dimensional convex polyhedra: they are exactly the 3-vertex-connected planar graphs. That is, every convex polyhedron forms a 3-connected planar graph, and every 3-connected planar graph can be represented as the graph of a convex polyhedron. For this reason, the 3-connected planar graphs are also known as polyhedral graphs.

<span class="mw-page-title-main">Matchstick graph</span> Graph with edges of length one, able to be drawn without crossings

In geometric graph theory, a branch of mathematics, a matchstick graph is a graph that can be drawn in the plane in such a way that its edges are line segments with length one that do not cross each other. That is, it is a graph that has an embedding which is simultaneously a unit distance graph and a plane graph. For this reason, matchstick graphs have also been called planar unit-distance graphs. Informally, matchstick graphs can be made by placing noncrossing matchsticks on a flat surface, hence the name.

<span class="mw-page-title-main">NetworkX</span> Python library for graphs and networks

NetworkX is a Python library for studying graphs and networks. NetworkX is free software released under the BSD-new license.

<span class="mw-page-title-main">Layered graph drawing</span> Graph drawing with vertices in horizontal layers

Layered graph drawing or hierarchical graph drawing is a type of graph drawing in which the vertices of a directed graph are drawn in horizontal rows or layers with the edges generally directed downwards. It is also known as Sugiyama-style graph drawing after Kozo Sugiyama, who first developed this drawing style.

<span class="mw-page-title-main">Circular layout</span> Graph drawing with vertices on a circle

In graph drawing, a circular layout is a style of drawing that places the vertices of a graph on a circle, often evenly spaced so that they form the vertices of a regular polygon.

<span class="mw-page-title-main">Arc diagram</span> Graph drawing with vertices on a line

An arc diagram is a style of graph drawing, in which the vertices of a graph are placed along a line in the Euclidean plane, with edges being drawn as semicircles in one or both of the two halfplanes bounded by the line, or as smooth curves formed by sequences of semicircles. In some cases, line segments of the line itself are also allowed as edges, as long as they connect only vertices that are consecutive along the line. Variations of this drawing style in which the semicircles are replaced by convex curves of some other type are also commonly called arc diagrams.

<span class="mw-page-title-main">NodeXL</span> Network analysis and visualization package for Microsoft Excel

NodeXL is a network analysis and visualization software package for Microsoft Excel 2007/2010/2013/2016. The package is similar to other network visualization tools such as Pajek, UCINet, and Gephi. It is widely applied in ring, mapping of vertex and edge, and customizable visual attributes and tags. NodeXL enables researchers to undertake social network analysis work metrics such as centrality, degree, and clustering, as well as monitor relational data and describe the overall relational network structure. When applied to Twitter data analysis, it showed the total network of all users participating in public discussion and its internal structure through data mining. It allows social Network analysis (SNA) to emphasize the relationships rather than the isolated individuals or organizations, allowing interested parties to investigate the two-way dialogue between organizations and the public. SNA also provides a flexible measurement system and parameter selection to confirm the influential nodes in the network, such as in-degree and out-degree centrality. The software contains network visualization, social network analysis features, access to social media network data importers, advanced network metrics, and automation.

In graph drawing and geometric graph theory, a Tutte embedding or barycentric embedding of a simple, 3-vertex-connected, planar graph is a crossing-free straight-line embedding with the properties that the outer face is a convex polygon and that each interior vertex is at the average of its neighbors' positions. If the outer polygon is fixed, this condition on the interior vertices determines their position uniquely as the solution to a system of linear equations. Solving the equations geometrically produces a planar embedding. Tutte's spring theorem, proven by W. T. Tutte, states that this unique solution is always crossing-free, and more strongly that every face of the resulting planar embedding is convex. It is called the spring theorem because such an embedding can be found as the equilibrium position for a system of springs representing the edges of the graph.

Simultaneous embedding is a technique in graph drawing and information visualization for visualizing two or more different graphs on the same or overlapping sets of labeled vertices, while avoiding crossings within both graphs. Crossings between an edge of one graph and an edge of the other graph are allowed.

<span class="mw-page-title-main">Cutwidth</span> Property in graph theory

In graph theory, the cutwidth of an undirected graph is the smallest integer with the following property: there is an ordering of the vertices of the graph, such that every cut obtained by partitioning the vertices into earlier and later subsets of the ordering is crossed by at most edges. That is, if the vertices are numbered , then for every , the number of edges with and is at most .

References

  1. Grandjean, Martin (2015), "Introduction à la visualisation de données, l'analyse de réseau en histoire", Geschichte und Informatik 18/19 (PDF), pp. 109–128
  2. Kobourov, Stephen G. (2012), Spring Embedders and Force-Directed Graph Drawing Algorithms, arXiv: 1201.3011 , Bibcode:2012arXiv1201.3011K .
  3. Bannister, M. J.; Eppstein, D.; Goodrich, M. T.; Trott, L. (2012), "Force-directed graph drawing using social gravity and scaling", Proc. 20th Int. Symp. Graph Drawing, arXiv: 1209.0748 , Bibcode:2012arXiv1209.0748B .
  4. Chernobelskiy, R.; Cunningham, K.; Goodrich, M. T.; Kobourov, S. G.; Trott, L. (2011), "Force-directed Lombardi-style graph drawing", Proc. 19th Symposium on Graph Drawing (PDF), pp. 78–90.
  5. 1 2 de Leeuw, Jan (1988), "Convergence of the majorization method for multidimensional scaling", Journal of Classification, 5 (2), Springer: 163–180, doi:10.1007/BF01897162, S2CID   122413124 .
  6. Vose, Aaron. "3D Phylogenetic Tree Viewer" . Retrieved 3 June 2012.
  7. Harel, David; Koren, Yehuda (2002), "Graph drawing by high-dimensional embedding", Proceedings of the 9th International Symposium on Graph Drawing, Springer, pp. 207–219, CiteSeerX   10.1.1.20.5390 , ISBN   3-540-00158-1
  8. 1 2 Quigley, Aaron; Eades, Peter (2001), "FADE: Graph Drawing, Clustering, and Visual Abstraction", Proceedings of the 8th International Symposium on Graph Drawing (PDF), pp. 197–210, ISBN   3-540-41554-8 .
  9. "A Gallery of Large Graphs" . Retrieved 22 Oct 2017.
  10. Collberg, Christian; Kobourov, Stephen; Nagra, Jasvir; Pitts, Jacob; Wampler, Kevin (2003), "A System for Graph-based Visualization of the Evolution of Software", Proceedings of the 2003 ACM Symposium on Software Visualization (SoftVis '03), New York, NY, USA: ACM, pp. 77–86, figures on p. 212, doi:10.1145/774833.774844, ISBN   1-58113-642-0, S2CID   824991, To achieve an aesthetically pleasing layout of the graph it is also necessary to employ modified Fruchterman–Reingold forces, as the Kamada–Kawai method does not achieve satisfactory methods by itself but rather creates a good approximate layout so that the Fruchterman-Reingold calculations can quickly "tidy up" the layout.
  11. 1 2 Kamada, Tomihisa; Kawai, Satoru (1989), "An algorithm for drawing general undirected graphs", Information Processing Letters, 31 (1), Elsevier: 7–15, doi:10.1016/0020-0190(89)90102-6 .
  12. 1 2 Fruchterman, Thomas M. J.; Reingold, Edward M. (1991), "Graph Drawing by Force-Directed Placement", Software: Practice and Experience, 21 (11), Wiley: 1129–1164, doi:10.1002/spe.4380211102, S2CID   31468174 .
  13. http://jgaa.info/accepted/2003/Walshaw2003.7.3.pdf A Multilevel Algorithm for Force-Directed Graph-Drawing
  14. Tutte, W. T. (1963), "How to draw a graph", Proceedings of the London Mathematical Society, 13 (52): 743–768, doi:10.1112/plms/s3-13.1.743 .
  15. Eades, Peter (1984), "A Heuristic for Graph Drawing", Congressus Numerantium, 42 (11): 149–160.

Further reading