KHOPCA clustering algorithm

(Figure: KHOPCA running in a 3-D environment.)

KHOPCA is an adaptive clustering algorithm originally developed for dynamic networks. KHOPCA (k-hop clustering algorithm) provides a fully distributed and localized approach to grouping elements, such as nodes in a network, according to their distance from each other.[1][2] KHOPCA operates proactively through a simple set of rules that defines clusters which are optimal with respect to the applied distance function.


KHOPCA's clustering process explicitly supports the joining and leaving of nodes, which makes KHOPCA suitable for highly dynamic networks. However, it has been demonstrated that KHOPCA also performs well in static networks.[2]

Besides applications in ad hoc and wireless sensor networks, KHOPCA can be used in localization and navigation problems, networked swarming, and real-time data clustering and analysis.

Algorithm description

KHOPCA (k-hop clustering algorithm) operates proactively through a simple set of rules that defines clusters with variable k-hops. A set of local rules describes the state transitions between nodes. A node's weight is determined solely by the current state of its neighbors in communication range. Each node of the network is continuously involved in this process. As a result, k-hop clusters are formed and maintained in static as well as dynamic networks.

KHOPCA does not require any predetermined initial configuration. A node can therefore potentially choose any weight between MIN and MAX. However, the choice of the initial configuration does influence the convergence time.

Initialization

The prerequisites in the start configuration for the application of the rules are the following:

- n is an arbitrary node, and N(n) is the set of its direct (1-hop) neighbors.
- w_n is the weight of node n, and W(N(n)) is the set of weights of n's neighbors.
- MIN and MAX are the minimum and maximum admissible weights; MAX marks a cluster center, and the difference MAX - MIN corresponds to the hop parameter k.

The following rules describe the state transition for a node n with weight w_n. These rules have to be executed on each node in the order given here.
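
The rule sketches below use a minimal Python model of this setup. The model is purely illustrative and not taken from the original papers: the network is an adjacency dictionary, weights maps each node to its w_n, and every node is assumed to have at least one neighbor. MIN and MAX (and thus k) are arbitrary example values.

    # Hypothetical model used by the rule sketches below (illustrative only).
    MIN, MAX = 0, 3  # example weight bounds; MAX - MIN plays the role of k

    def neighbor_weights(graph, weights, n):
        """Return W(N(n)), the weights of the direct neighbors of node n."""
        return [weights[m] for m in graph[n]]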

Rule 1


The first rule constructs an order within the cluster: a node n looks for the direct neighbor with the highest weight max(W(N(n))); if that weight is higher than the node's own weight w_n, the node sets its own weight to the highest weight in the neighborhood minus 1. Applied iteratively, this process creates a top-down hierarchical cluster structure.

if max(W(N(n))) > w_n then
    w_n = max(W(N(n))) - 1
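
Using the hypothetical model above, rule 1 might be sketched in Python as follows (a sketch, not a reference implementation):

    def rule1(graph, weights, n):
        # Adopt the highest neighbor weight minus 1 if it exceeds w_n.
        w_max = max(neighbor_weights(graph, weights, n))
        if w_max > weights[n]:
            weights[n] = w_max - 1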

Rule 2


The second rule deals with the situation where the nodes in a neighborhood are all on the minimum weight level. This situation can occur if, for instance, the initial configuration assigns the minimum weight to all nodes. If all nodes in a neighborhood, including n itself, have the minimum weight, node n declares itself a cluster center by taking the maximum weight. Even if, coincidentally, all nodes declare themselves cluster centers, this conflict is resolved by one of the other rules.

if max(W(N(n))) == MIN and w_n == MIN then
    w_n = MAX
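
A corresponding sketch for rule 2 in the same illustrative model:

    def rule2(graph, weights, n):
        # An all-MIN neighborhood: n declares itself a cluster center.
        if max(neighbor_weights(graph, weights, n)) == MIN and weights[n] == MIN:
            weights[n] = MAX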

Rule 3


The third rule describes situations where a node with an elevated weight that is not a cluster center attracts surrounding nodes with lower weights. This behavior can lead to fragmented clusters without a cluster center. To avoid fragmented clusters, such a node successively decreases its own weight, with the objective of correcting the fragmentation by allowing the other nodes to reconfigure according to the rules.

if max(W(N(n))) <= w_n and w_n != MAX then
    w_n = w_n - 1
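
A sketch of rule 3 in the same model. Note that the prescribed rule order matters: rule 2 has already promoted an all-MIN neighborhood on this node, so the decrement here never pushes a weight below MIN.

    def rule3(graph, weights, n):
        # No higher-weight neighbor and not a cluster center: decay by 1,
        # dissolving fragmented clusters that lack a center.
        if max(neighbor_weights(graph, weights, n)) <= weights[n] and weights[n] != MAX:
            weights[n] = weights[n] - 1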

Rule 4


The fourth rule resolves the situation where two cluster centers become direct (1-hop) neighbors and it must be decided which of the two continues in that role. Given some criterion (e.g., device ID or battery power), one cluster center keeps its role while the other is subordinated within the 1-hop neighborhood of the remaining center. Which criterion is used to resolve this decision depends on the application scenario and on the information available.

if max(W(N(n))) == MAX and w_n == MAX then
    apply the criterion to select one cluster center from the set {max(W(N(n))), w_n}
    if n is not the selected center: w_n = w_n - 1
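
Rule 4 might be sketched as follows. The criterion used here, the smallest node identifier, is only a stand-in for an application-specific choice such as device ID or remaining battery power; the losing center is demoted by one level and then reorganizes via the other rules.

    def rule4(graph, weights, n, criterion=min):
        # Adjacent cluster centers: keep the one preferred by the criterion.
        if max(neighbor_weights(graph, weights, n)) == MAX and weights[n] == MAX:
            centers = [m for m in graph[n] if weights[m] == MAX] + [n]
            if criterion(centers) != n:
                weights[n] = weights[n] - 1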

Examples

1D

An exemplary sequence of state transitions applying the described four rules is illustrated below.

(Figure: KHOPCA 1D example showing a sequence of state transitions.)
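
Such a sequence can be reproduced with a small driver over the rule sketches above (illustrative; the 5-node line topology and sequential sweeps are assumptions of this sketch, not prescribed by KHOPCA):

    def step(graph, weights):
        """One sequential sweep over all nodes; True if any weight changed."""
        changed = False
        for n in graph:
            before = weights[n]
            rule1(graph, weights, n)  # rules run in the prescribed order
            rule2(graph, weights, n)
            rule3(graph, weights, n)
            rule4(graph, weights, n)
            changed = changed or weights[n] != before
        return changed

    # A 5-node line a - b - c - d - e, all weights starting at MIN.
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
             "d": ["c", "e"], "e": ["d"]}
    weights = {n: MIN for n in graph}
    while step(graph, weights):
        pass
    print(weights)  # cluster centers end at MAX; weights fall with hop distance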

2D

KHOPCA acting in a dynamic 2D simulation. The geometry is based on a geometric random graph; all existing links are drawn in this network.


3D

KHOPCA also works in a dynamic 3D environment. The cluster connections are illustrated with bold lines.


Guarantees

It has been demonstrated that KHOPCA terminates after a finite number of state transitions in static networks. [2]


References

  1. Brust, Matthias R.; Frey, Hannes; Rothkugel, Steffen (2007). "Adaptive multi-hop clustering in mobile networks". Proceedings of the 4th International Conference on Mobile Technology, Applications, and Systems (Mobility '07). New York, NY, USA: ACM. pp. 132–138. doi:10.1145/1378063.1378086. ISBN 9781595938190. S2CID 33469900.
  2. Brust, Matthias R.; Frey, Hannes; Rothkugel, Steffen (2008). "Dynamic multi-hop clustering for mobile hybrid wireless networks". Proceedings of the 2nd International Conference on Ubiquitous Information Management and Communication (ICUIMC '08). New York, NY, USA: ACM. pp. 130–135. doi:10.1145/1352793.1352820. ISBN 9781595939937. S2CID 15200455.