Metric k-center

Last updated

In graph theory, the metric k-center problem or vertex k-center problem is a classical combinatorial optimization problem studied in theoretical computer science that is NP-hard. Given n cities with specified distances, one wants to build k warehouses in different cities and minimize the maximum distance of a city to a warehouse. In graph theory, this means finding a set of k vertices for which the largest distance of any point to its closest vertex in the k-set is minimum. The vertices must be in a metric space, providing a complete graph that satisfies the triangle inequality. It has application in facility location and clustering. [1] [2]

Contents

Formal definition

The problem was first proposed by Hakimi in 1964. [3]

Let be a metric space where is a set and is a metric
A set , is provided together with a parameter . The goal is to find a subset with such that the maximum distance of a point in to the closest point in is minimized. The problem can be formally defined as follows:
For a metric space (,d),

That is, every point in a cluster is in distance at most from its respective center. [4]

The k-Center Clustering problem can also be defined on a complete undirected graph G = (V, E) as follows:
Given a complete undirected graph G = (V, E) with distances d(vi, vj)  N satisfying the triangle inequality, find a subset C  V with |C| = k while minimizing:

Computational complexity

In a complete undirected graph G = (V, E), if we sort the edges in non-decreasing order of the distances: d(e1)  d(e2)  ...  d(em) and let Gi = (V, Ei), where Ei = {e1, e2, ..., ei}. The k-center problem is equivalent to finding the smallest index i such that Gi has a dominating set of size at most k. [5]

Although Dominating Set is NP-complete, the k-center problem remains NP-hard. This is clear, since the optimality of a given feasible solution for the k-center problem can be determined through the Dominating Set reduction only if we know in first place the size of the optimal solution (i.e. the smallest index i such that Gi has a dominating set of size at most k), which is precisely the difficult core of the NP-Hard problems. Although a Turing reduction can get around this issue by trying all values of k.

Approximations

A simple greedy algorithm

A simple greedy approximation algorithm that achieves an approximation factor of 2 builds using a farthest-first traversal in k iterations. This algorithm simply chooses the point farthest away from the current set of centers in each iteration as the new center. It can be described as follows:

Running time

  • The ith iteration of choosing the ith center takes time.
  • There are k such iterations.
  • Thus, overall the algorithm takes time. [6]

Proving the approximation factor

The solution obtained using the simple greedy algorithm is a 2-approximation to the optimal solution. This section focuses on proving this approximation factor.

Given a set of n points , belonging to a metric space (,d), the greedy K-center algorithm computes a set K of k centers, such that K is a 2-approximation to the optimal k-center clustering of V.

i.e. [4]

This theorem can be proven using two cases as follows,

Case 1: Every cluster of contains exactly one point of

  • Consider a point
  • Let be the center it belongs to in
  • Let be the center of that is in
  • Similarly,
  • By the triangle inequality:


Case 2: There are two centers and of that are both in , for some (By pigeon hole principle, this is the only other possibility)

  • Assume, without loss of generality, that was added later to the center set by the greedy algorithm, say in ith iteration.
  • But since the greedy algorithm always chooses the point furthest away from the current set of centers, we have that and,

[4]

Another 2-factor approximation algorithm

Another algorithm with the same approximation factor takes advantage of the fact that the k-Center problem is equivalent to finding the smallest index i such that Gi has a dominating set of size at most k and computes a maximal independent set of Gi, looking for the smallest index i that has a maximal independent set with a size of at least k. [7] It is not possible to find an approximation algorithm with an approximation factor of 2  ε for any ε > 0, unless P = NP. [8] Furthermore, the distances of all edges in G must satisfy the triangle inequality if the k-center problem is to be approximated within any constant factor, unless P = NP. [9]

Parameterized approximations

It can be shown that the k-Center problem is W[2]-hard to approximate within a factor of 2  ε for any ε > 0, when using k as the parameter. [10] This is also true when parameterizing by the doubling dimension (in fact the dimension of a Manhattan metric), unless P=NP. [11] When considering the combined parameter given by k and the doubling dimension, k-Center is still W[1]-hard but it is possible to obtain a parameterized approximation scheme. [12] This is even possible for the variant with vertex capacities, which bound how many vertices can be assigned to an opened center of the solution. [13]

Approximation algorithms

If , the vertex k-center problem can not be (optimally) solved in polynomial time. However, there are some polynomial time approximation algorithms that get near-optimal solutions. Specifically, 2-approximated solutions. Actually, if the best possible solution that can be achieved by a polynomial time algorithm is a 2-approximated one. [14] [15] [16] [17] In the context of a minimization problem, such as the vertex k-center problem, a 2-approximated solution is any solution such that , where is the size of an optimal solution. An algorithm that guarantees to generate 2-approximated solutions is known as a 2-approximation algorithm. The main 2-approximated algorithms for the vertex k-center problem reported in the literature are the Sh algorithm, [18] the HS algorithm, [17] and the Gon algorithm. [15] [16] Even though these algorithms are the (polynomial) best possible ones, their performance on most benchmark datasets is very deficient. Because of this, many heuristics and metaheuristics have been developed through the time. Contrary to common sense, one of the most practical (polynomial) heuristics for the vertex k-center problem is based on the CDS algorithm, which is a 3-approximation algorithm [19]

The Sh algorithm

Formally characterized by David Shmoys in 1995, [18] the Sh algorithm takes as input a complete undirected graph , a positive integer , and an assumption on what the optimal solution size is. The Sh algorithm works as follows: selects the first center at random. So far, the solution consists of only one vertex, . Next, selects center at random from the set containing all the vertices whose distance from is greater than . At this point, . Finally, selects the remaining centers the same way was selected. The complexity of the Sh algorithm is , where is the number of vertices.

The HS algorithm

Proposed by Dorit Hochbaum and David Shmoys in 1985, the HS algorithm takes the Sh algorithm as basis. [17] By noticing that the value of must equals the cost of some edge in , and since there are edges in , the HS algorithm basically repeats the Sh algorithm with every edge cost. The complexity of the HS algorithm is . However, by running a binary search over the ordered set of edge costs, its complexity is reduced to .

The Gon algorithm

Proposed independently by Teofilo Gonzalez, [15] and by Martin Dyer and Alan Frieze [16] in 1985, the Gon algorithm is basically a more powerful version of the Sh algorithm. While the Sh algorithm requires a guess on , the Gon algorithm prescinds from such guess by noticing that if any set of vertices at distance greater than exists, then the farthest vertex must be inside such set. Therefore, instead of computing at each iteration the set of vertices at distance greater than and then selecting a random vertex, the Gon algorithm simply selects the farthest vertex from every partial solution . The complexity of the Gon algorithm is , where is the number of vertices.

The CDS algorithm

Proposed by García Díaz et al. in 2017, [19] the CDS algorithm is a 3-approximation algorithm that takes ideas from the Gon algorithm (farthest point heuristic), the HS algorithm (parametric pruning), and the relationship between the vertex k-center problem and the Dominating Set problem. The CDS algorithm has a complexity of . However, by performing a binary search over the ordered set of edge costs, a more efficiente heuristic named CDSh is proposed. The CDSh algorithm complexity is . Despite the suboptimal performance of the CDS algorithm, and the heuristic performance of CDSh, both present a much better performance than the Sh, HS, and Gon algorithms.

Parameterized approximations

It can be shown that the k-Center problem is W[2]-hard to approximate within a factor of 2  ε for any ε > 0, when using k as the parameter. [20] This is also true when parameterizing by the doubling dimension (in fact the dimension of a Manhattan metric), unless P=NP. [21] When considering the combined parameter given by k and the doubling dimension, k-Center is still W[1]-hard but it is possible to obtain a parameterized approximation scheme. [22] This is even possible for the variant with vertex capacities, which bound how many vertices can be assigned to an opened center of the solution. [23]

Experimental comparison

Some of the most widely used benchmark datasets for the vertex k-center problem are the pmed instances from OR-Lib., [24] and some instances from TSP-Lib. [25] Table 1 shows the mean and standard deviation of the experimental approximation factors of the solutions generated by each algorithm over the 40 pmed instances from OR-Lib [19]

Table 1. Experimental approximation factor over pmed instances from OR-Lib
AlgorithmComplexity
HS1.5320.175
Gon1.5030.122
CDSh1.0350.031
CDS1.0200.027
Table 2. Experimental approximation factor over instances from TSP-Lib
AlgorithmAlgorithm
Gon1.3960.091
HS1.3180.108
CDSh1.1240.065
CDS1.0420.038

Polynomial heuristics

Greedy pure algorithm

The greedy pure algorithm (or Gr) follows the core idea of greedy algorithms: to take optimal local decisions. In the case of the vertex k-center problem, the optimal local decision consists in selecting each center in such a way that the size of the solution (covering radius) is minimum at each iteration. In other words, the first center selected is the one that solves the 1-center problem. The second center selected is the one that, along with the previous center, generates a solution with minimum covering radius. The remaining centers are selected the same way. The complexity of the Gr algorithm is . [26] The empirical performance of the Gr algorithm is poor on most benchmark instances.

Scoring algorithm

The Scoring algorithm (or Scr) was introduced by Jurij Mihelič and Borut Robič in 2005. [27] This algorithm takes advantage of the reduction from the vertex k-center problem to the minimum dominating set problem. The problem is solved by pruning the input graph with every possible value of the optimal solution size and then solving the minimum dominating set problem heuristically. This heuristic follows the lazy principle, which takes every decision as slow as possible (opossed to the greedy strategy). The complexity of the Scr algorithm is . The empirical performance of the Scr algorithm is very good on most benchmark instances. However, its running time rapidly becomes impractical as the input grows. So, it seems to be a good algorithm only for small instances.

See also

Related Research Articles

<span class="mw-page-title-main">Travelling salesman problem</span> NP-hard problem in combinatorial optimization

In the theory of computational complexity, the travelling salesman problem (TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" It is an NP-hard problem in combinatorial optimization, important in theoretical computer science and operations research.

The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has many applications, such as filling up containers, loading trucks with weight capacity constraints, creating file backups in media, splitting a network prefix into multiple subnets, and technology mapping in FPGA semiconductor chip design.

An integer programming problem is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. In many settings the term refers to integer linear programming (ILP), in which the objective function and the constraints are linear.

<span class="mw-page-title-main">Graph coloring</span> Methodic assignment of colors to elements of a graph

In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices are of the same color; this is called a vertex coloring. Similarly, an edge coloring assigns a color to each edge so that no two adjacent edges are of the same color, and a face coloring of a planar graph assigns a color to each face or region so that no two faces that share a boundary have the same color.

<span class="mw-page-title-main">Vertex cover</span> Subset of a graphs vertices, including at least one endpoint of every edge

In graph theory, a vertex cover of a graph is a set of vertices that includes at least one endpoint of every edge of the graph.

In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems with provable guarantees on the distance of the returned solution to the optimal one. Approximation algorithms naturally arise in the field of theoretical computer science as a consequence of the widely believed P ≠ NP conjecture. Under this conjecture, a wide class of optimization problems cannot be solved exactly in polynomial time. The field of approximation algorithms, therefore, tries to understand how closely it is possible to approximate optimal solutions to such problems in polynomial time. In an overwhelming majority of the cases, the guarantee of such algorithms is a multiplicative one expressed as an approximation ratio or approximation factor i.e., the optimal solution is always guaranteed to be within a (predetermined) multiplicative factor of the returned solution. However, there are also many approximation algorithms that provide an additive guarantee on the quality of the returned solution. A notable example of an approximation algorithm that provides both is the classic approximation algorithm of Lenstra, Shmoys and Tardos for scheduling on unrelated parallel machines.

<span class="mw-page-title-main">Set cover problem</span> Classical problem in combinatorics

The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory.

Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets. Many classes of convex optimization problems admit polynomial-time algorithms, whereas mathematical optimization is in general NP-hard.

<span class="mw-page-title-main">Dominating set</span> Subset of a graphs nodes such that all other nodes link to at least one

In graph theory, a dominating set for a graph G is a subset D of its vertices, such that any vertex of G is in D, or has a neighbor in D. The domination numberγ(G) is the number of vertices in a smallest dominating set for G.

Set packing is a classical NP-complete problem in computational complexity theory and combinatorics, and was one of Karp's 21 NP-complete problems. Suppose one has a finite set S and a list of subsets of S. Then, the set packing problem asks if some k subsets in the list are pairwise disjoint.

The Frank–Wolfe algorithm is an iterative first-order optimization algorithm for constrained convex optimization. Also known as the conditional gradient method, reduced gradient algorithm and the convex combination algorithm, the method was originally proposed by Marguerite Frank and Philip Wolfe in 1956. In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves towards a minimizer of this linear function.

<span class="mw-page-title-main">Smoothed analysis</span>

In theoretical computer science, smoothed analysis is a way of measuring the complexity of an algorithm. Since its introduction in 2001, smoothed analysis has been used as a basis for considerable research, for problems ranging from mathematical programming, numerical analysis, machine learning, and data mining. It can give a more realistic analysis of the practical performance of the algorithm compared to analysis that uses worst-case or average-case scenarios.

<span class="mw-page-title-main">Maximum cut</span> Problem of finding a maximum cut in a graph

In a graph, a maximum cut is a cut whose size is at least the size of any other cut. That is, it is a partition of the graph's vertices into two complementary sets S and T, such that the number of edges between S and T is as large as possible. Finding such a cut is known as the max-cut problem.

In the mathematical fields of graph theory and combinatorial optimization, the bipartite dimension or biclique cover number of a graph G = (VE) is the minimum number of bicliques (that is complete bipartite subgraphs), needed to cover all edges in E. A collection of bicliques covering all edges in G is called a biclique edge cover, or sometimes biclique cover. The bipartite dimension of G is often denoted by the symbol d(G).

<span class="mw-page-title-main">NP-completeness</span> Complexity class

In computational complexity theory, a problem is NP-complete when:

  1. It is a decision problem, meaning that for any input to the problem, the output is either "yes" or "no".
  2. When the answer is "yes", this can be demonstrated through the existence of a short solution.
  3. The correctness of each solution can be verified quickly and a brute-force search algorithm can find a solution by trying all possible solutions.
  4. The problem can be used to simulate every other problem for which we can verify quickly that a solution is correct. In this sense, NP-complete problems are the hardest of the problems to which solutions can be verified quickly. If we could find solutions of some NP-complete problem quickly, we could quickly find the solutions of every other problem to which a given solution can be easily verified.

The geometric set cover problem is the special case of the set cover problem in geometric settings. The input is a range space where is a universe of points in and is a family of subsets of called ranges, defined by the intersection of and geometric shapes such as disks and axis-parallel rectangles. The goal is to select a minimum-size subset of ranges such that every point in the universe is covered by some range in .

In statistics and machine learning, Gaussian process approximation is a computational method that accelerates inference tasks in the context of a Gaussian process model, most commonly likelihood evaluation and prediction. Like approximations of other models, they can often be expressed as additional assumptions imposed on the model, which do not correspond to any actual feature, but which retain its key properties while simplifying calculations. Many of these approximation methods can be expressed in purely linear algebraic or functional analytic terms as matrix or function approximations. Others are purely algorithmic and cannot easily be rephrased as a modification of a statistical model.

In computer science, multiway number partitioning is the problem of partitioning a multiset of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by Ronald Graham in 1969 in the context of the identical-machines scheduling problem. The problem is parametrized by a positive integer k, and called k-way number partitioning. The input to the problem is a multiset S of numbers, whose sum is k*T.

The Recursive Largest First (RLF) algorithm is a heuristic for the NP-hard graph coloring problem. It was originally proposed by Frank Leighton in 1979.

The Karmarkar–Karp (KK) bin packing algorithms are several related approximation algorithm for the bin packing problem. The bin packing problem is a problem of packing items of different sizes into bins of identical capacity, such that the total number of bins is as small as possible. Finding the optimal solution is computationally hard. Karmarkar and Karp devised an algorithm that runs in polynomial time and finds a solution with at most bins, where OPT is the number of bins in the optimal solution. They also devised several other algorithms with slightly different approximation guarantees and run-time bounds.

References

  1. Pacheco, Joaquín A.; Casado, Silvia (December 2005). "Solving two location models with few facilities by using a hybrid heuristic: a real health resources case". Computers & Operations Research. 32 (12): 3075–3091. doi:10.1016/j.cor.2004.04.009. ISSN   0305-0548.
  2. Kaveh, A.; Nasr, H. (August 2011). "Solving the conditional and unconditional -center problem with modified harmony search: A real case study". Scientia Iranica. 18 (4): 867–877. doi: 10.1016/j.scient.2011.07.010 . ISSN   1026-3098.
  3. Hakimi, S. L. (1964). "Optimum Locations of Switching Centers and the Absolute Centers and Medians of a Graph". Operations Research. 12 (3): 450–459. doi:10.1287/opre.12.3.450. JSTOR   168125.
  4. 1 2 3 Har-peled, Sariel (2011). Geometric Approximation Algorithms. Boston, MA, USA: American Mathematical Society. ISBN   978-0821849118.
  5. Vazirani, Vijay V. (2003), Approximation Algorithms, Berlin: Springer, pp. 47–48, ISBN   3-540-65367-8
  6. Gonzalez, Teofilo F. (1985), "Clustering to minimize the maximum intercluster distance", Theoretical Computer Science, vol. 38, Elsevier Science B.V., pp. 293–306, doi: 10.1016/0304-3975(85)90224-5
  7. Hochbaum, Dorit S.; Shmoys, David B. (1986), "A unified approach to approximation algorithms for bottleneck problems", Journal of the ACM, vol. 33, pp. 533–550, doi:10.1145/5925.5933, ISSN   0004-5411, S2CID   17975253
  8. Hochbaum, Dorit S. (1997), Approximation Algorithms for NP-Hard problems, Boston: PWS Publishing Company, pp. 346–398, ISBN   0-534-94968-1
  9. Crescenzi, Pierluigi; Kann, Viggo; Halldórsson, Magnús; Karpinski, Marek; Woeginger, Gerhard (2000), "Minimum k-center", A Compendium of NP Optimization Problems
  10. Feldmann, Andreas Emil (2019-03-01). "Fixed-Parameter Approximations for k-Center Problems in Low Highway Dimension Graphs" (PDF). Algorithmica. 81 (3): 1031–1052. doi:10.1007/s00453-018-0455-0. ISSN   1432-0541. S2CID   46886829.
  11. Feder, Tomás; Greene, Daniel (1988-01-01). "Optimal algorithms for approximate clustering". Proceedings of the twentieth annual ACM symposium on Theory of computing - STOC '88. New York, NY, USA: Association for Computing Machinery. pp. 434–444. doi:10.1145/62212.62255. ISBN   978-0-89791-264-8. S2CID   658151.
  12. Feldmann, Andreas Emil; Marx, Dániel (2020-07-01). "The Parameterized Hardness of the k-Center Problem in Transportation Networks" (PDF). Algorithmica. 82 (7): 1989–2005. doi:10.1007/s00453-020-00683-w. ISSN   1432-0541. S2CID   3532236.
  13. Feldmann, Andreas Emil; Vu, Tung Anh (2022). "Generalized $$k$$-Center: Distinguishing Doubling and Highway Dimension". In Bekos, Michael A.; Kaufmann, Michael (eds.). Graph-Theoretic Concepts in Computer Science. Lecture Notes in Computer Science. Vol. 13453. Cham: Springer International Publishing. pp. 215–229. arXiv: 2209.00675 . doi:10.1007/978-3-031-15914-5_16. ISBN   978-3-031-15914-5.
  14. Kariv, O.; Hakimi, S. L. (December 1979). "An Algorithmic Approach to Network Location Problems. I: The p-Centers". SIAM Journal on Applied Mathematics. 37 (3): 513–538. doi:10.1137/0137040. ISSN   0036-1399.
  15. 1 2 3 Gonzalez, Teofilo F. (1985). "Clustering to minimize the maximum intercluster distance". Theoretical Computer Science. 38: 293–306. doi: 10.1016/0304-3975(85)90224-5 . ISSN   0304-3975.
  16. 1 2 3 Dyer, M.E; Frieze, A.M (February 1985). "A simple heuristic for the p-centre problem". Operations Research Letters. 3 (6): 285–288. doi:10.1016/0167-6377(85)90002-1. ISSN   0167-6377.
  17. 1 2 3 Hochbaum, Dorit S.; Shmoys, David B. (May 1985). "A Best Possible Heuristic for the k-Center Problem". Mathematics of Operations Research . 10 (2): 180–184. doi:10.1287/moor.10.2.180. ISSN   0364-765X.
  18. 1 2 Shmoys, David B. (1995). "Computing near-optimal solutions to combinatorial optimization problems". Combinatorial Optimization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. Vol. 20. pp. 355––397. CiteSeerX   10.1.1.33.1719 . doi:10.1090/dimacs/020/07. ISBN   9780821802397.
  19. 1 2 3 Garcia-Diaz, Jesus; Sanchez-Hernandez, Jairo; Menchaca-Mendez, Ricardo; Menchaca-Mendez, Rolando (2017-07-01). "When a worse approximation factor gives better performance: a 3-approximation algorithm for the vertex k-center problem". Journal of Heuristics. 23 (5): 349–366. doi:10.1007/s10732-017-9345-x. ISSN   1381-1231. S2CID   254500532.
  20. Feldmann, Andreas Emil (2019-03-01). "Fixed-Parameter Approximations for k-Center Problems in Low Highway Dimension Graphs". Algorithmica. 81 (3): 1031–1052. arXiv: 1605.02530 . doi:10.1007/s00453-018-0455-0. ISSN   1432-0541. S2CID   46886829.
  21. Feder, Tomás; Greene, Daniel (1988-01-01). "Optimal algorithms for approximate clustering". Proceedings of the twentieth annual ACM symposium on Theory of computing - STOC '88. New York, NY, USA: Association for Computing Machinery. pp. 434–444. doi:10.1145/62212.62255. ISBN   978-0-89791-264-8. S2CID   658151.
  22. Feldmann, Andreas Emil; Marx, Dániel (2020-07-01). "The Parameterized Hardness of the k-Center Problem in Transportation Networks". Algorithmica. 82 (7): 1989–2005. arXiv: 1802.08563 . doi:10.1007/s00453-020-00683-w. ISSN   1432-0541. S2CID   3532236.
  23. Feldmann, Andreas Emil; Vu, Tung Anh (2022). "Generalized $$k$$-Center: Distinguishing Doubling and Highway Dimension". In Bekos, Michael A.; Kaufmann, Michael (eds.). Graph-Theoretic Concepts in Computer Science. Lecture Notes in Computer Science. Vol. 13453. Cham: Springer International Publishing. pp. 215–229. arXiv: 2209.00675 . doi:10.1007/978-3-031-15914-5_16. ISBN   978-3-031-15914-5.
  24. Beasley, J. E. (1990). "OR-Library: Distributing Test Problems by Electronic Mail". The Journal of the Operational Research Society. 41 (11): 1069–1072. doi:10.2307/2582903. JSTOR   2582903.
  25. Reinelt, Gerhard (November 1991). "TSPLIB—A Traveling Salesman Problem Library". ORSA Journal on Computing. 3 (4): 376–384. doi:10.1287/ijoc.3.4.376. ISSN   0899-1499.
  26. Rana, Rattan; Garg, Deepak (March 2009). "Heuristic Approaches for K-Center Problem". 2009 IEEE International Advance Computing Conference. IEEE. pp. 332–335. doi:10.1109/iadcc.2009.4809031. ISBN   9781424429271. S2CID   12453616.
  27. Mihelič, Jurij; Robič, Borut (2005). "Solving the k-center Problem Efficiently with a Dominating Set Algorithm". Journal of Computing and Information Technology. 13 (3): 225. CiteSeerX   10.1.1.205.3118 . doi: 10.2498/cit.2005.03.05 . ISSN   1330-1136.

Further reading