Metric k-center

Last updated January 26, 2024

In graph theory, the metric $k$-center problem is a combinatorial optimization problem studied in theoretical computer science. Given $n$ cities with specified pairwise distances, one wants to build $k$ warehouses in different cities so as to minimize the maximum distance from any city to its nearest warehouse. In graph-theoretic terms, this means finding a set of $k$ vertices for which the largest distance from any vertex to its closest vertex in the $k$-set is minimum. The vertices must lie in a metric space, which yields a complete graph whose edge distances satisfy the triangle inequality.

Formal definition

Let $({\mathcal {X}},d)$ be a metric space, where ${\mathcal {X}}$ is a set and $d$ is a metric.
A set $\mathbf {V} \subseteq {\mathcal {X}}$ is provided together with a parameter $k$. The goal is to find a subset ${\mathcal {C}}\subseteq \mathbf {V}$ with $|{\mathcal {C}}|=k$ such that the maximum distance of a point in $\mathbf {V}$ to the closest point in ${\mathcal {C}}$ is minimized. The problem can be formally defined as follows:
For a metric space $({\mathcal {X}},d)$:

Input: a set $\mathbf {V} \subseteq {\mathcal {X}}$ and a parameter $k$.
Output: a set ${\mathcal {C}}\subseteq \mathbf {V}$ of $k$ points.
Goal: Minimize the cost $r^{\mathcal {C}}(\mathbf {V} )=\max _{v\in \mathbf {V} }d(v,{\mathcal {C}})$, where $d(v,{\mathcal {C}})=\min _{c\in {\mathcal {C}}}d(v,c)$ is the distance from $v$ to its closest center.

That is, every point in a cluster is at distance at most $r^{\mathcal {C}}(\mathbf {V} )$ from its respective center.^[1]

The k-Center Clustering problem can also be defined on a complete undirected graph G = (V, E) as follows:
Given a complete undirected graph G = (V, E) with distances d(v_i, v_j) ∈ N satisfying the triangle inequality, find a subset C ⊆ V with |C| = k while minimizing:

$\max _{v\in V}\min _{c\in C}d(v,c)$
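The objective above can be evaluated directly for any candidate set of centers. The following is a minimal sketch, not part of the cited sources; the function name `kcenter_cost`, the sample points on a line, and the distance function `line_dist` are assumptions made purely for illustration.

```python
def kcenter_cost(points, centers, dist):
    """Return max over points v of the distance from v to its closest center."""
    return max(min(dist(v, c) for c in centers) for v in points)


def line_dist(a, b):
    """Illustrative metric: absolute difference of points on the real line."""
    return abs(a - b)


# Hypothetical instance: six "cities" on a line.
points = [0, 1, 2, 10, 11, 20]

print(kcenter_cost(points, [1, 10, 20], line_dist))  # three centers
print(kcenter_cost(points, [1, 11], line_dist))      # two centers
```

With three centers at 1, 10 and 20 every point is within distance 1 of a center; with only the two centers 1 and 11, the point 20 is at distance 9 from its closest center, which determines the cost.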

Computational complexity

In a complete undirected graph G = (V, E), sort the edges in non-decreasing order of distance, d(e₁) ≤ d(e₂) ≤ ... ≤ d(e_m), and let G_i = (V, E_i), where E_i = {e₁, e₂, ..., e_i}. The k-center problem is then equivalent to finding the smallest index i such that G_i has a dominating set of size at most k.^[2]
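This equivalence can be illustrated with a brute-force sketch for very small instances. It is exponential in k and is meant only to make the reduction concrete; the function names and the distance-matrix input format are assumptions chosen for this example. Instead of edge indices it iterates over the distinct edge lengths, which yields the same threshold graphs.

```python
from itertools import combinations


def has_dominating_set(n, adj, k):
    """Brute force: does some set of at most k vertices dominate all n vertices?"""
    for size in range(1, k + 1):
        for cand in combinations(range(n), size):
            covered = set(cand)
            for c in cand:
                covered |= adj[c]
            if len(covered) == n:
                return True
    return False


def exact_kcenter_radius(dmat, k):
    """Smallest threshold r whose threshold graph has a dominating set of size <= k."""
    n = len(dmat)
    if k >= n:
        return 0  # every vertex can be its own center
    # Candidate radii: the distinct edge lengths in non-decreasing order.
    radii = sorted({dmat[u][v] for u in range(n) for v in range(u + 1, n)})
    for r in radii:
        # Threshold graph: keep exactly the edges of length at most r.
        adj = [{v for v in range(n) if v != u and dmat[u][v] <= r}
               for u in range(n)]
        if has_dominating_set(n, adj, k):
            return r
    raise ValueError("k must be at least 1")


# Hypothetical instance: six points on a line with absolute-difference distances.
pts = [0, 1, 2, 10, 11, 20]
dmat = [[abs(a - b) for b in pts] for a in pts]
print(exact_kcenter_radius(dmat, 3))
print(exact_kcenter_radius(dmat, 2))
```

On this instance the optimal radius is 1 for k = 3 (for example centers 1, 10 and 20) and 9 for k = 2 (for example centers 2 and 20).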

The decision version of Dominating Set is NP-complete, whereas the k-center problem, being an optimization problem, is NP-hard. Verifying that a given feasible solution of the k-center problem is optimal through the Dominating Set reduction requires knowing the optimal value in the first place (i.e. the smallest index i such that G_i has a dominating set of size at most k), which is precisely the difficult core of NP-hard problems. A Turing reduction can get around this issue by trying all values of i.

Approximations

A simple greedy algorithm

A simple greedy approximation algorithm that achieves an approximation factor of 2 builds ${\mathcal {C}}$ using a farthest-first traversal in k iterations. This algorithm simply chooses the point farthest away from the current set of centers in each iteration as the new center. It can be described as follows:

Pick an arbitrary point ${\bar {c}}_{1}$ and set $C_{1}=\{{\bar {c}}_{1}\}$.
For every point $v\in \mathbf {V}$ compute its distance $d_{1}[v]$ from ${\bar {c}}_{1}$.
Pick the point ${\bar {c}}_{2}$ with the highest distance from ${\bar {c}}_{1}$.
Add it to the set of centers and denote this expanded set of centers as $C_{2}$. Continue in this way, each time adding the point farthest from the current set of centers, until k centers are found.
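The steps above can be sketched in Python as follows. This is an illustrative implementation under stated assumptions rather than the code of any cited source: the function name `greedy_kcenter`, the choice of the first point as the arbitrary initial center, and the sample instance are hypothetical. The list `nearest` plays the role of the distances $d_{i}[v]$ and is updated after each new center, so each iteration costs a linear number of distance evaluations.

```python
def greedy_kcenter(points, k, dist, first=0):
    """Farthest-first traversal.

    Returns the indices of the chosen centers and the resulting clustering radius.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [first]
    # nearest[j] = distance from points[j] to its closest chosen center so far.
    nearest = [dist(p, points[first]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from the current center set.
        far = max(range(len(points)), key=nearest.__getitem__)
        centers.append(far)
        for j, p in enumerate(points):
            nearest[j] = min(nearest[j], dist(p, points[far]))
    return centers, max(nearest)


def line_dist(a, b):
    """Illustrative metric: absolute difference of points on the real line."""
    return abs(a - b)


# Hypothetical instance: six points on a line.
points = [0, 1, 2, 10, 11, 20]
centers, radius = greedy_kcenter(points, 3, line_dist)
print([points[i] for i in centers], radius)
```

Starting from the point 0, the traversal picks 20 and then 10, giving radius 2. The optimal radius for this instance is 1 (centers 1, 10 and 20), so the greedy result is within the factor of 2.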

Running time

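The $i$-th iteration of choosing the $i$-th center takes ${\mathcal {O}}(n)$ time, provided that the distance from each point to its nearest already chosen center is kept from the previous iteration.
There are k such iterations.
Thus, overall the algorithm takes ${\mathcal {O}}(nk)$ time.^[3]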

Proving the approximation factor

The solution obtained using the simple greedy algorithm is a 2-approximation to the optimal solution. This section focuses on proving this approximation factor.

Given a set of n points $\mathbf {V} \subseteq {\mathcal {X}}$ belonging to a metric space $({\mathcal {X}},d)$, the greedy k-center algorithm computes a set $\mathbf {K}$ of k centers such that $\mathbf {K}$ is a 2-approximation to the optimal k-center clustering of $\mathbf {V}$.

That is, $r^{\mathbf {K} }(\mathbf {V} )\leq 2r^{opt}(\mathbf {V} ,k)$.^[1]

This theorem can be proven by considering two cases.

Case 1: Every cluster of ${\mathcal {C}}_{opt}$ contains exactly one point of $\mathbf {K}$. Here $\Pi ({\mathcal {C}}_{opt},{\bar {c}})$ denotes the cluster of ${\bar {c}}$, that is, the set of points of $\mathbf {V}$ whose closest center in ${\mathcal {C}}_{opt}$ is ${\bar {c}}$.

Consider a point $v\in \mathbf {V}$.
Let ${\bar {c}}$ be the center it belongs to in ${\mathcal {C}}_{opt}$.
Let ${\bar {k}}$ be the center of $\mathbf {K}$ that is in $\Pi ({\mathcal {C}}_{opt},{\bar {c}})$.
Then $d(v,{\bar {c}})=d(v,{\mathcal {C}}_{opt})\leq r^{opt}(\mathbf {V} ,k)$.
Similarly, $d({\bar {k}},{\bar {c}})=d({\bar {k}},{\mathcal {C}}_{opt})\leq r^{opt}$.
By the triangle inequality: $d(v,{\bar {k}})\leq d(v,{\bar {c}})+d({\bar {c}},{\bar {k}})\leq 2r^{opt}$.

Case 2: There are two centers ${\bar {k}}$ and ${\bar {u}}$ of $\mathbf {K}$ that are both in $\Pi ({\mathcal {C}}_{opt},{\bar {c}})$ for some ${\bar {c}}\in {\mathcal {C}}_{opt}$. (By the pigeonhole principle, this is the only other possibility.)

Assume, without loss of generality, that ${\bar {u}}$ was added later to the center set $\mathbf {K}$ by the greedy algorithm, say in the $i$-th iteration.
Since the greedy algorithm always chooses the point farthest away from the current set of centers, we have that ${\bar {k}}\in {\mathcal {C}}_{i-1}$ and

${\begin{aligned}r^{\mathbf {K} }(\mathbf {V} )\leq r^{{\mathcal {C}}_{i-1}}(\mathbf {V} )&=d({\bar {u}},{\mathcal {C}}_{i-1})\\&\leq d({\bar {u}},{\bar {k}})\\&\leq d({\bar {u}},{\bar {c}})+d({\bar {c}},{\bar {k}})\\&\leq 2r^{opt}\end{aligned}}$ ^[1]
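The bound can also be checked empirically on small instances by comparing the greedy radius with a brute-force optimum. The sketch below is only a sanity check under assumptions chosen for illustration (random integer points in the plane, the Manhattan metric so that all distances are exact integers, and hypothetical function names); it does not replace the proof.

```python
import random
from itertools import combinations


def cost(points, centers, dist):
    """Clustering radius: max over points of the distance to the closest center."""
    return max(min(dist(v, c) for c in centers) for v in points)


def greedy_centers(points, k, dist):
    """Farthest-first traversal starting from the first point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda v: min(dist(v, c) for c in centers)))
    return centers


def optimal_cost(points, k, dist):
    """Exact optimum by trying every k-subset of the points (tiny inputs only)."""
    return min(cost(points, cs, dist) for cs in combinations(points, k))


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def check_bound(trials=200, n=8, k=3, seed=0):
    """Return True if greedy <= 2 * optimum held on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        pts = [(rng.randint(0, 20), rng.randint(0, 20)) for _ in range(n)]
        greedy = cost(pts, greedy_centers(pts, k, manhattan), manhattan)
        if greedy > 2 * optimal_cost(pts, k, manhattan):
            return False
    return True


print(check_bound())
```

Because the theorem holds for every metric instance, the check is expected to succeed regardless of the seed or the instance sizes chosen.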

Another 2-factor approximation algorithm

Another algorithm with the same approximation factor takes advantage of the fact that the k-center problem is equivalent to finding the smallest index i such that G_i has a dominating set of size at most k. It computes a maximal independent set in the square of each G_i and looks for the smallest index i for which this maximal independent set has size at most k.^[4] It is not possible to find an approximation algorithm with an approximation factor of 2 − ε for any ε > 0, unless P = NP.^[5] Furthermore, unless P = NP, the k-center problem cannot be approximated within any constant factor if the distances in G are not required to satisfy the triangle inequality.^[6]
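A sketch of this threshold approach is given below, following the textbook formulation in which a maximal independent set is computed in the square of each threshold graph (two vertices are adjacent in the square if they are joined by a path of at most two edges) and the first threshold whose independent set has at most k vertices is returned. The function names, the distance-matrix input, and the greedy way of building the independent set are assumptions made for illustration; the cubic-time construction of the squared graph is chosen for clarity, not efficiency.

```python
def square_threshold_mis(dmat, r):
    """Greedy maximal independent set in the square of the threshold graph G_r."""
    n = len(dmat)
    # G_r keeps exactly the edges of length at most r.
    adj = [{v for v in range(n) if v != u and dmat[u][v] <= r} for u in range(n)]
    # In the square, u and v are adjacent if a path of at most two edges joins them.
    adj2 = []
    for u in range(n):
        reach = set(adj[u])
        for w in adj[u]:
            reach |= adj[w]
        reach.discard(u)
        adj2.append(reach)
    mis, blocked = [], set()
    for u in range(n):
        if u not in blocked:
            mis.append(u)
            blocked.add(u)
            blocked |= adj2[u]
    return mis


def threshold_kcenter(dmat, k):
    """Return center indices from the smallest threshold whose MIS has size <= k."""
    n = len(dmat)
    if k >= n:
        return list(range(n))
    radii = sorted({dmat[u][v] for u in range(n) for v in range(u + 1, n)})
    for r in radii:
        mis = square_threshold_mis(dmat, r)
        if len(mis) <= k:
            return mis
    raise ValueError("k must be at least 1")


# Hypothetical instance: six points on a line with absolute-difference distances.
pts = [0, 1, 2, 10, 11, 20]
dmat = [[abs(a - b) for b in pts] for a in pts]
chosen = threshold_kcenter(dmat, 3)
radius = max(min(dmat[v][c] for c in chosen) for v in range(len(pts)))
print([pts[i] for i in chosen], radius)
```

On this instance the first threshold, 1, already yields an independent set of three vertices (the points 0, 10 and 20). Every point is within distance 2 of one of them, twice the optimal radius of 1.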

Parameterized approximations

It can be shown that the k-Center problem is W[2]-hard to approximate within a factor of 2 − ε for any ε > 0, when using k as the parameter.^[7] This is also true when parameterizing by the doubling dimension (in fact the dimension of a Manhattan metric), unless P=NP.^[8] When considering the combined parameter given by k and the doubling dimension, k-Center is still W[1]-hard but it is possible to obtain a parameterized approximation scheme.^[9] This is even possible for the variant with vertex capacities, which bound how many vertices can be assigned to an opened center of the solution.^[10]

References

1. Har-Peled, Sariel (2011). Geometric Approximation Algorithms. Boston, MA, USA: American Mathematical Society. ISBN 978-0821849118.
2. Vazirani, Vijay V. (2003). Approximation Algorithms. Berlin: Springer. pp. 47–48. ISBN 3-540-65367-8.
3. Gonzalez, Teofilo F. (1985). "Clustering to minimize the maximum intercluster distance". Theoretical Computer Science. Vol. 38. Elsevier Science B.V. pp. 293–306. doi:10.1016/0304-3975(85)90224-5.
4. Hochbaum, Dorit S.; Shmoys, David B. (1986). "A unified approach to approximation algorithms for bottleneck problems". Journal of the ACM. Vol. 33. pp. 533–550. doi:10.1145/5925.5933. ISSN 0004-5411. S2CID 17975253.
5. Hochbaum, Dorit S. (1997). Approximation Algorithms for NP-Hard problems. Boston: PWS Publishing Company. pp. 346–398. ISBN 0-534-94968-1.
6. Crescenzi, Pierluigi; Kann, Viggo; Halldórsson, Magnús; Karpinski, Marek; Woeginger, Gerhard (2000). "Minimum k-center". A Compendium of NP Optimization Problems.
7. Feldmann, Andreas Emil (2019-03-01). "Fixed-Parameter Approximations for k-Center Problems in Low Highway Dimension Graphs" (PDF). Algorithmica. 81 (3): 1031–1052. doi:10.1007/s00453-018-0455-0. ISSN 1432-0541. S2CID 46886829.
8. Feder, Tomás; Greene, Daniel (1988-01-01). "Optimal algorithms for approximate clustering". Proceedings of the twentieth annual ACM symposium on Theory of computing - STOC '88. New York, NY, USA: Association for Computing Machinery. pp. 434–444. doi:10.1145/62212.62255. ISBN 978-0-89791-264-8. S2CID 658151.
9. Feldmann, Andreas Emil; Marx, Dániel (2020-07-01). "The Parameterized Hardness of the k-Center Problem in Transportation Networks" (PDF). Algorithmica. 82 (7): 1989–2005. doi:10.1007/s00453-020-00683-w. ISSN 1432-0541. S2CID 3532236.
10. Feldmann, Andreas Emil; Vu, Tung Anh (2022). "Generalized k-Center: Distinguishing Doubling and Highway Dimension". In Bekos, Michael A.; Kaufmann, Michael (eds.). Graph-Theoretic Concepts in Computer Science. Lecture Notes in Computer Science. Vol. 13453. Cham: Springer International Publishing. pp. 215–229. arXiv:2209.00675. doi:10.1007/978-3-031-15914-5_16. ISBN 978-3-031-15914-5.