A method for pruning dense networks to highlight key links
Relationships among a set of elements are often represented as a square matrix whose entries give the relation between each pair of elements. Relations such as distances, dissimilarities, similarities, relatedness, correlations, co-occurrences, and conditional probabilities can be represented by such matrices. The same data can also be represented as a network with weighted links between the elements. These matrices and networks are extremely dense and are not easily apprehended without some form of data reduction or pruning.
A pathfinder network results from applying a pruning method that removes weaker links from a (usually dense) network according to the lengths of alternative paths (see below).[1][2][3] It is a psychometric scaling method based on graph theory, used in the study of expertise, education,[4] knowledge acquisition, mental models,[5] and knowledge engineering. It is also employed in generating communication networks,[6] software debugging,[7] visualizing scientific citation patterns,[8] information retrieval, and other forms of data visualization.[9] Pathfinder networks are potentially applicable to any problem addressed by network theory.
Network pruning aims to highlight the more important links between the elements represented in a network. It simplifies the collection of connections involved, which is valuable both for data visualization and for comprehending the essential relations among the elements represented in the network.
Several psychometric scaling methods start from pairwise data and yield structures revealing the underlying organization of the data. Data clustering and multidimensional scaling are two such methods. Network scaling represents another method based on graph theory. Pathfinder networks are derived from matrices of data for pairs of entities. Because the algorithm uses distances, similarity data are inverted to yield dissimilarities for the computations.
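For instance, similarity ratings can be converted to dissimilarities with any decreasing transformation. A minimal Python sketch (the function name and the particular max + min inversion are illustrative assumptions, not part of the pathfinder definition):

```python
import numpy as np

def similarities_to_distances(S):
    """Invert a symmetric similarity matrix so that high similarity
    becomes short distance, as the pathfinder computations expect."""
    S = np.asarray(S, dtype=float)
    off_diagonal = S[~np.eye(len(S), dtype=bool)]  # ignore self-similarities
    # Any decreasing transform works; max + min - s keeps the value range.
    D = off_diagonal.max() + off_diagonal.min() - S
    np.fill_diagonal(D, 0.0)  # an element is at distance 0 from itself
    return D
```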
In the pathfinder network, the entities correspond to the nodes of the generated network, and the links in the network are determined by the patterns of proximities. For example, if the proximities are similarities, links will generally connect nodes of high similarity. When proximities are distances or dissimilarities, links will generally connect nodes separated by shorter distances. The links in the network will be undirected if the proximities are symmetrical for every pair of entities. Symmetrical proximities mean that the order of the entities is not important, so the proximity of $i$ and $j$ is the same as the proximity of $j$ and $i$ for all pairs $i, j$. If the proximities are not symmetrical for every pair, the links will be directed.
The pathfinder algorithm uses two parameters, $q$ and $r$:
1. The $q$ parameter constrains the number of links in the alternative paths considered when pruning. It is an integer between $2$ and $n - 1$, inclusive, where $n$ is the number of nodes.
2. The $r$ parameter defines the metric used for computing the distance of paths (cf. the Minkowski distance). It is a real number between $1$ and $\infty$, inclusive.
Path distance is computed as $d(P) = \left( \sum_{i=1}^{k} w_i^{\,r} \right)^{1/r}$, where $w_i$ is the distance of the $i$th link in the path and $r \geq 1$. For $r = 1$, $d(P)$ is simply the sum of the distances of the links in the path. For $r = \infty$, $d(P)$ is the maximum of the distances of the links in the path because $\lim_{r \to \infty} \left( \sum_{i=1}^{k} w_i^{\,r} \right)^{1/r} = \max_{1 \leq i \leq k} w_i$. A link is pruned if its distance is greater than the minimum distance of paths of at most $q$ links between the nodes connected by the link. Efficient methods for finding minimum distances include the Floyd–Warshall algorithm (for $q = n - 1$) and Dijkstra's algorithm (for any value of $q$).
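A direct, if inefficient, implementation follows from this definition. The Python sketch below (the function name `pathfinder_network` and the matrix conventions are illustrative assumptions, not a standard API) takes a symmetric matrix of link distances with `np.inf` marking absent links, relaxes paths one link at a time up to $q$ links, and then prunes every link that is beaten by a shorter alternative path:

```python
import numpy as np

def pathfinder_network(W, q=None, r=np.inf):
    """Prune a network of pairwise distances, keeping a link only if no
    alternative path of at most q links is shorter under the Minkowski
    r-metric. W is a symmetric (n, n) array; np.inf marks absent links."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    if q is None:
        q = n - 1  # least restrictive setting: paths of any length

    def combine(a, b):
        # Distance of a path formed by joining segments of distances a, b.
        if np.isinf(r):
            return np.maximum(a, b)            # r = inf: maximum link weight
        return (a**r + b**r) ** (1.0 / r)      # general Minkowski r-metric

    # D[i, j] = minimum path distance using a bounded number of links;
    # each pass allows one more link per path.
    D = W.copy()
    for _ in range(q - 1):
        # candidate[i, k, j] = combine(D[i, k], W[k, j]); minimize over k.
        candidate = combine(D[:, :, None], W[None, :, :]).min(axis=1)
        D = np.minimum(D, candidate)

    # A link survives only if it is itself a minimum-distance path.
    pruned = np.where(W <= D, W, np.inf)
    np.fill_diagonal(pruned, np.inf)  # no self-links in the result
    return pruned
```

This dynamic programming costs $O(n^3)$ per pass and so $O(n^4)$ overall in the worst case; the Floyd–Warshall approach and the fast implementations cited at the end of this article are considerably more efficient.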
A network generated with particular values of $q$ and $r$ is called a $\mathrm{PFnet}(q, r)$. Both of the parameters have the effect of decreasing the number of links in the network as their values are increased. The network with the minimum number of links is obtained when $q = n - 1$ and $r = \infty$, i.e., $\mathrm{PFnet}(n - 1, \infty)$.
With ordinal-scale data (see level of measurement), the $r$ parameter should be $\infty$ because the same $\mathrm{PFnet}$ would result from any positive monotonic transformation of the proximity data. Other values of $r$ require data measured on a ratio scale. The $q$ parameter can be varied to yield the desired number of links in the network or to focus on more local relations with smaller values of $q$.
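As a usage illustration, reusing the hypothetical `pathfinder_network` sketch above on made-up ratings:

```python
import numpy as np

# Hypothetical dissimilarity ratings among four concepts (0 = identical).
ratings = np.array([[0., 3., 6., 5.],
                    [3., 0., 3., 6.],
                    [6., 3., 0., 3.],
                    [5., 6., 3., 0.]])

sparsest = pathfinder_network(ratings, q=3, r=np.inf)  # PFnet(n-1, inf)
local    = pathfinder_network(ratings, q=2, r=np.inf)  # PFnet(2, inf)

# The link 0-3 (distance 5) survives in PFnet(2, inf): both 2-link
# detours have distance 6 under r = inf. In PFnet(3, inf) the 3-link
# path 0-1-2-3 (distance 3) removes it, leaving a 3-link chain.
# With r = inf, rank-preserving rescalings of the ratings leave both
# networks unchanged.
```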
Essentially, pathfinder networks preserve the shortest possible paths given the data. Therefore, links are eliminated when they are not on shortest paths. The $\mathrm{PFnet}(n - 1, \infty)$ will be the minimum spanning tree for the links defined by the proximity data if a unique minimum spanning tree exists. In general, the $\mathrm{PFnet}(n - 1, \infty)$ includes all of the links in any minimum spanning tree.
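This relationship is easy to check numerically, for example with SciPy's minimum spanning tree (a sketch continuing the illustrative `ratings` example above):

```python
from scipy.sparse.csgraph import minimum_spanning_tree

pf = pathfinder_network(ratings, q=len(ratings) - 1, r=np.inf)
mst = minimum_spanning_tree(ratings).toarray()  # SciPy treats 0 as "no edge"

# Every edge of the minimum spanning tree also survives pathfinder pruning.
assert all(np.isfinite(pf[i, j]) for i, j in zip(*np.nonzero(mst)))
```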
Here is an example of an undirected pathfinder network derived from average ratings of a group of biology graduate students. The students rated the relatedness of all pairs of the terms shown, and the mean rating for each pair was computed. The solid blue links belong to the $\mathrm{PFnet}(n - 1, \infty)$ (labeled "both" in the figure). The dotted red links are added in the $\mathrm{PFnet}(2, \infty)$. For the added links, there are no 2-link paths shorter than the link distance, but there is at least one shorter path with more than two links in the data. A minimum spanning tree would have 24 links, so the 26 links in the $\mathrm{PFnet}(n - 1, \infty)$ imply that there is more than one minimum spanning tree. There are two cycles present, so there are tied distances among the links in each cycle. Breaking each cycle would require removing one of the tied links in that cycle.
See also: Supervised learning, Shortest path problem, Bayesian network, Nonlinear dimensionality reduction, Dynamic time warping, Multidimensional scaling, Maximum flow problem, Expectation–maximization algorithm, Cluster analysis, Stochastic gradient descent, Mean shift, Random geometric graph, Single-linkage clustering, Smallest-circle problem, Non-linear least squares, Topological data analysis, Universal approximation theorem, Persistent homology, Ward's method, Parallel single-source shortest path algorithms.
Further information on pathfinder networks and several examples of the application of PFnets to a variety of problems can be found in the references.
Three papers describing fast implementations of pathfinder networks:
(The two variants by Quirin et al. are significantly faster. While the former can be applied with $q = 2$ or $q = n - 1$ and any value for $r$, the latter can only be applied in cases where $q = n - 1$ and $r = \infty$.)