KHOPCA clustering algorithm

(Figure: KHOPCA running in a 3-D environment.)

KHOPCA is an adaptive clustering algorithm originally developed for dynamic networks. KHOPCA (k-hop clustering algorithm) provides a fully distributed and localized approach to grouping elements, such as nodes in a network, according to their distance from each other.[1][2] KHOPCA operates proactively through a simple set of rules that defines clusters which are optimal with respect to the applied distance function.


KHOPCA's clustering process explicitly supports the joining and leaving of nodes, which makes KHOPCA suitable for highly dynamic networks. However, it has been demonstrated that KHOPCA also performs well in static networks.[2]

Besides applications in ad hoc and wireless sensor networks, KHOPCA can be used in localization and navigation problems, networked swarming, and real-time data clustering and analysis.

Algorithm description

KHOPCA (k-hop clustering algorithm) operates proactively through a simple set of rules that defines clusters with variable k-hops. A set of local rules describes the state transitions between nodes. A node's weight is determined solely by the current state of its neighbors in communication range. Each node of the network is continuously involved in this process. As a result, k-hop clusters are formed and maintained in static as well as dynamic networks.

KHOPCA does not require any predetermined initial configuration. A node can therefore potentially choose any weight between MIN and MAX. However, the choice of the initial configuration does influence the convergence time.

Initialization

The prerequisites in the start configuration for the application of the rules are the following:

- n is an arbitrary node, and N(n) is the set of its direct (1-hop) neighbors.
- w_n is the weight of node n, and W(N(n)) is the set of weights of n's neighbors.
- MIN and MAX are the minimum and maximum admissible weights; MAX marks a cluster center, and the difference MAX - MIN corresponds to the hop parameter k.

The following rules describe the state transition for a node n with weight w_n. These rules have to be executed on each node in the order given here.
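
The rule sketches below use a minimal Python model of this setup. The model is purely illustrative and not taken from the original papers: the network is an adjacency dictionary, weights maps each node to its w_n, and every node is assumed to have at least one neighbor. MIN and MAX (and thus k) are arbitrary example values.

    # Hypothetical model used by the rule sketches below (illustrative only).
    MIN, MAX = 0, 3  # example weight bounds; MAX - MIN plays the role of k

    def neighbor_weights(graph, weights, n):
        """Return W(N(n)), the weights of the direct neighbors of node n."""
        return [weights[m] for m in graph[n]]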

Rule 1


The first rule constructs an order within the cluster: a node n looks for the direct neighbor with the highest weight max(W(N(n))); if that weight is higher than the node's own weight w_n, the node sets its own weight to the highest weight in the neighborhood minus 1. Applied iteratively, this process creates a top-down hierarchical cluster structure.

if max(W(N(n))) > w_n then
    w_n = max(W(N(n))) - 1
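
Using the hypothetical model above, rule 1 might be sketched in Python as follows (a sketch, not a reference implementation):

    def rule1(graph, weights, n):
        # Adopt the highest neighbor weight minus 1 if it exceeds w_n.
        w_max = max(neighbor_weights(graph, weights, n))
        if w_max > weights[n]:
            weights[n] = w_max - 1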

Rule 2


The second rule deals with the situation where the nodes in a neighborhood are all on the minimum weight level. This situation can occur if, for instance, the initial configuration assigns the minimum weight to all nodes. If all nodes in a neighborhood, including n itself, have the minimum weight, node n declares itself a cluster center by taking the maximum weight. Even if, coincidentally, all nodes declare themselves cluster centers, this conflict is resolved by one of the other rules.

if max(W(N(n))) == MIN and w_n == MIN then
    w_n = MAX
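
A corresponding sketch for rule 2 in the same illustrative model:

    def rule2(graph, weights, n):
        # An all-MIN neighborhood: n declares itself a cluster center.
        if max(neighbor_weights(graph, weights, n)) == MIN and weights[n] == MIN:
            weights[n] = MAX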

Rule 3


The third rule describes situations where a node with an elevated weight that is not a cluster center attracts surrounding nodes with lower weights. This behavior can lead to fragmented clusters without a cluster center. To avoid fragmented clusters, such a node successively decreases its own weight, with the objective of correcting the fragmentation by allowing the other nodes to reconfigure according to the rules.

if max(W(N(n))) <= w_n and w_n != MAX then
    w_n = w_n - 1
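
A sketch of rule 3 in the same model. Note that the prescribed rule order matters: rule 2 has already promoted an all-MIN neighborhood on this node, so the decrement here never pushes a weight below MIN.

    def rule3(graph, weights, n):
        # No higher-weight neighbor and not a cluster center: decay by 1,
        # dissolving fragmented clusters that lack a center.
        if max(neighbor_weights(graph, weights, n)) <= weights[n] and weights[n] != MAX:
            weights[n] = weights[n] - 1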

Rule 4


The fourth rule resolves the situation where two cluster centers become direct (1-hop) neighbors and it must be decided which of the two continues in that role. Given some criterion (e.g., device ID or battery power), one cluster center keeps its role while the other is subordinated within the 1-hop neighborhood of the remaining center. Which criterion is used to resolve this decision depends on the application scenario and on the information available.

if max(W(N(n))) == MAX and w_n == MAX then
    apply the criterion to select one cluster center from the set {max(W(N(n))), w_n}
    if n is not the selected center: w_n = w_n - 1
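
Rule 4 might be sketched as follows. The criterion used here, the smallest node identifier, is only a stand-in for an application-specific choice such as device ID or remaining battery power; the losing center is demoted by one level and then reorganizes via the other rules.

    def rule4(graph, weights, n, criterion=min):
        # Adjacent cluster centers: keep the one preferred by the criterion.
        if max(neighbor_weights(graph, weights, n)) == MAX and weights[n] == MAX:
            centers = [m for m in graph[n] if weights[m] == MAX] + [n]
            if criterion(centers) != n:
                weights[n] = weights[n] - 1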

Examples

1D

An exemplary sequence of state transitions applying the described four rules is illustrated below.

(Figure: KHOPCA 1D example showing a sequence of state transitions.)
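
Such a sequence can be reproduced with a small driver over the rule sketches above (illustrative; the 5-node line topology and sequential sweeps are assumptions of this sketch, not prescribed by KHOPCA):

    def step(graph, weights):
        """One sequential sweep over all nodes; True if any weight changed."""
        changed = False
        for n in graph:
            before = weights[n]
            rule1(graph, weights, n)  # rules run in the prescribed order
            rule2(graph, weights, n)
            rule3(graph, weights, n)
            rule4(graph, weights, n)
            changed = changed or weights[n] != before
        return changed

    # A 5-node line a - b - c - d - e, all weights starting at MIN.
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
             "d": ["c", "e"], "e": ["d"]}
    weights = {n: MIN for n in graph}
    while step(graph, weights):
        pass
    print(weights)  # cluster centers end at MAX; weights fall with hop distance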

2D

KHOPCA acting in a dynamic 2D simulation. The geometry is based on a geometric random graph; all existing links are drawn in this network.


3D

KHOPCA also works in a dynamic 3D environment. The cluster connections are illustrated with bold lines.


Guarantees

It has been demonstrated that KHOPCA terminates after a finite number of state transitions in static networks. [2]


References

  1. Brust, Matthias R.; Frey, Hannes; Rothkugel, Steffen (2007). "Adaptive multi-hop clustering in mobile networks". Proceedings of the 4th International Conference on Mobile Technology, Applications, and Systems (Mobility '07). New York, NY, USA: ACM. pp. 132–138. doi:10.1145/1378063.1378086. ISBN 9781595938190. S2CID 33469900.
  2. Brust, Matthias R.; Frey, Hannes; Rothkugel, Steffen (2008). "Dynamic multi-hop clustering for mobile hybrid wireless networks". Proceedings of the 2nd International Conference on Ubiquitous Information Management and Communication (ICUIMC '08). New York, NY, USA: ACM. pp. 130–135. doi:10.1145/1352793.1352820. ISBN 9781595939937. S2CID 15200455.