Set cover problem

Last updated
Example of an instance of set cover problem. SetCover.svg
Example of an instance of set cover problem.

The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory.

Contents

Given a set of elements {1, 2, …, n} (called the universe) and a collection S of m subsets whose union equals the universe, the set cover problem is to identify the smallest sub-collection of S whose union equals the universe. For example, consider the universe U = {1, 2, 3, 4, 5} and the collection of sets S = { {1, 2, 3}, {2, 4}, {3, 4}, {4, 5} }. Clearly the union of S is U. However, we can cover all elements with only two sets: { {1, 2, 3}, {4, 5} }, see picture. Therefore, the solution to the set cover problem has size 2.

More formally, given a universe and a family of subsets of , a set cover is a subfamily of sets whose union is .

The decision version of set covering is NP-complete. It is one of Karp's 21 NP-complete problems shown to be NP-complete in 1972. The optimization/search version of set cover is NP-hard. [1] It is a problem "whose study has led to the development of fundamental techniques for the entire field" of approximation algorithms. [2]

Variants

In the weighted set cover problem, each set is assigned a positive weight (representing its cost), and the goal is to find a set cover with a smallest weight. The usual (unweighted) set cover corresponds to all sets having a weight of 1.

In the fractional set cover problem, it is allowed to select fractions of sets, rather than entire sets. A fractional set cover is an assignment of a fraction (a number in [0,1]) to each set in , such that for each element x in the universe, the sum of fractions of sets that contain x is at least 1. The goal is to find a fractional set cover in which the sum of fractions is as small as possible. Note that a (usual) set cover is equivalent to a fractional set cover in which all fractions are either 0 or 1; therefore, the size of the smallest fractional cover is at most the size of the smallest cover, but may be smaller. For example, consider the universe U = {1, 2, 3} and the collection of sets S = { {1, 2}, {2, 3}, {3, 1} }. The smallest set cover has a size of 2, e.g. { {1, 2}, {2, 3} }. But there is a fractional set cover of size 1.5, in which a 0.5 fraction of each set is taken.

Linear program formulation

The set cover problem can be formulated as the following integer linear program (ILP). [3]

minimize(minimize the number of sets)
subject tofor all (cover every element of the universe)
for all .(every set is either in the set cover or not)

For a more compact representation of the covering constraint, one can define an incidence matrix , where each row corresponds to an element and each column corresponds to a set, and if element e is in set s, and otherwise. Then, the covering constraint can be written as .

Weighted set cover is described by a program identical to the one given above, except that the objective function to minimize is , where is the weight of set .

Fractional set cover is described by a program identical to the one given above, except that can be non-integer, so the last constraint is replaced by .

This linear program belongs to the more general class of LPs for covering problems, as all the coefficients in the objective function and both sides of the constraints are non-negative. The integrality gap of the ILP is at most (where is the size of the universe). It has been shown that its relaxation indeed gives a factor- approximation algorithm for the minimum set cover problem. [4] See randomized rounding#setcover for a detailed explanation.

Hitting set formulation

Set covering is equivalent to the hitting set problem. That is seen by observing that an instance of set covering can be viewed as an arbitrary bipartite graph, with the universe represented by vertices on the left, the sets represented by vertices on the right, and edges representing the membership of elements to sets. The task is then to find a minimum cardinality subset of left-vertices that has a non-trivial intersection with each of the right-vertices, which is precisely the hitting set problem.

In the field of computational geometry, a hitting set for a collection of geometrical objects is also called a stabbing set or piercing set. [5]

Greedy algorithm

There is a greedy algorithm for polynomial time approximation of set covering that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements. This method can be implemented in time linear in the sum of sizes of the input sets, using a bucket queue to prioritize the sets. [6] It achieves an approximation ratio of , where is the size of the set to be covered. [7] In other words, it finds a covering that may be times as large as the minimum one, where is the -th harmonic number:

This greedy algorithm actually achieves an approximation ratio of where is the maximum cardinality set of . For dense instances, however, there exists a -approximation algorithm for every . [8]

Tight example for the greedy algorithm with k=3 SetCoverGreedy.gif
Tight example for the greedy algorithm with k=3

There is a standard example on which the greedy algorithm achieves an approximation ratio of . The universe consists of elements. The set system consists of pairwise disjoint sets with sizes respectively, as well as two additional disjoint sets , each of which contains half of the elements from each . On this input, the greedy algorithm takes the sets , in that order, while the optimal solution consists only of and . An example of such an input for is pictured on the right.

Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover up to lower order terms (see Inapproximability results below), under plausible complexity assumptions. A tighter analysis for the greedy algorithm shows that the approximation ratio is exactly . [9]

Low-frequency systems

If each element occurs in at most f sets, then a solution can be found in polynomial time that approximates the optimum to within a factor of f using LP relaxation.

If the constraint is replaced by for all S in in the integer linear program shown above, then it becomes a (non-integer) linear program L. The algorithm can be described as follows:

  1. Find an optimal solution O for the program L using some polynomial-time method of solving linear programs.
  2. Pick all sets S for which the corresponding variable xS has value at least 1/f in the solution O. [10]

Inapproximability results

When refers to the size of the universe, Lund & Yannakakis (1994) showed that set covering cannot be approximated in polynomial time to within a factor of , unless NP has quasi-polynomial time algorithms. Feige (1998) improved this lower bound to under the same assumptions, which essentially matches the approximation ratio achieved by the greedy algorithm. Raz & Safra (1997) established a lower bound of , where is a certain constant, under the weaker assumption that PNP. A similar result with a higher value of was recently proved by Alon, Moshkovitz & Safra (2006). Dinur & Steurer (2013) showed optimal inapproximability by proving that it cannot be approximated to unless PNP.

Weighted set cover

Relaxing the integer linear program for weighted set cover stated above, one may use randomized rounding to get an -factor approximation. Non weighted set cover can be adapted to the weighted case. [11]

Fractional set cover

Notes

  1. Korte & Vygen 2012, p. 414.
  2. Vazirani (2001 , p. 15)
  3. Vazirani (2001 , p. 108)
  4. Vazirani (2001 , pp. 110–112)
  5. Nielsen, Frank (2000-09-06). "Fast stabbing of boxes in high dimensions" (PDF). Theoretical Computer Science. 246 (1): 53–72. doi: 10.1016/S0304-3975(98)00336-3 . ISSN   0304-3975.
  6. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990], "Exercise 35.3-3", Introduction to Algorithms (3rd ed.), MIT Press and McGraw-Hill, p. 1122, ISBN   0-262-03384-4
  7. Chvatal, V. (August 1979), "A Greedy Heuristic for the Set-Covering Problem", Mathematics of Operations Research , 4 (3): 233–235, doi:10.1287/moor.4.3.233, JSTOR   3689577
  8. Karpinski & Zelikovsky 1998
  9. Slavík Petr A tight analysis of the greedy algorithm for set cover. STOC'96, Pages 435-441, doi : 10.1145/237814.237991
  10. Vazirani (2001 , pp. 118–119)
  11. Vazirani (2001 , Chapter 14)
  12. Information., Sandia National Laboratories. United States. Department of Energy. United States. Department of Energy. Office of Scientific and Technical (1999). On the Red-Blue Set Cover Problem. United States. Dept. of Energy. OCLC   68396743.
  13. Gainer-Dewar, Andrew; Vera-Licona, Paola (2017), "The minimal hitting set generation problem: algorithms and computation", SIAM Journal on Discrete Mathematics , 31 (1): 63–100, arXiv: 1601.02939 , doi:10.1137/15M1055024, MR   3590650, S2CID   9240467

Related Research Articles

<span class="mw-page-title-main">Assignment problem</span> Combinatorial optimization problem

The assignment problem is a fundamental combinatorial optimization problem. In its most general form, the problem is as follows:

<span class="mw-page-title-main">Graph coloring</span> Methodic assignment of colors to elements of a graph

In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices are of the same color; this is called a vertex coloring. Similarly, an edge coloring assigns a color to each edge so that no two adjacent edges are of the same color, and a face coloring of a planar graph assigns a color to each face or region so that no two faces that share a boundary have the same color.

<span class="mw-page-title-main">Steiner tree problem</span> On short connecting networks with added vertices

In combinatorial mathematics, the Steiner tree problem, or minimum Steiner tree problem, named after Jakob Steiner, is an umbrella term for a class of problems in combinatorial optimization. While Steiner tree problems may be formulated in a number of settings, they all require an optimal interconnect for a given set of objects and a predefined objective function. One well-known variant, which is often used synonymously with the term Steiner tree problem, is the Steiner tree problem in graphs. Given an undirected graph with non-negative edge weights and a subset of vertices, usually referred to as terminals, the Steiner tree problem in graphs requires a tree of minimum weight that contains all terminals and minimizes the total weight of its edges. Further well-known variants are the Euclidean Steiner tree problem and the rectilinear minimum Steiner tree problem.

<span class="mw-page-title-main">Vertex cover</span> Subset of a graphs vertices, including at least one endpoint of every edge

In graph theory, a vertex cover of a graph is a set of vertices that includes at least one endpoint of every edge of the graph.

In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems with provable guarantees on the distance of the returned solution to the optimal one. Approximation algorithms naturally arise in the field of theoretical computer science as a consequence of the widely believed P ≠ NP conjecture. Under this conjecture, a wide class of optimization problems cannot be solved exactly in polynomial time. The field of approximation algorithms, therefore, tries to understand how closely it is possible to approximate optimal solutions to such problems in polynomial time. In an overwhelming majority of the cases, the guarantee of such algorithms is a multiplicative one expressed as an approximation ratio or approximation factor i.e., the optimal solution is always guaranteed to be within a (predetermined) multiplicative factor of the returned solution. However, there are also many approximation algorithms that provide an additive guarantee on the quality of the returned solution. A notable example of an approximation algorithm that provides both is the classic approximation algorithm of Lenstra, Shmoys and Tardos for scheduling on unrelated parallel machines.

In the mathematical discipline of graph theory, a matching or independent edge set in an undirected graph is a set of edges without common vertices. In other words, a subset of the edges is a matching if each vertex appears in at most one edge of that matching. Finding a matching in a bipartite graph can be treated as a network flow problem.

<span class="mw-page-title-main">Dominating set</span> Subset of a graphs nodes such that all other nodes link to at least one

In graph theory, a dominating set for a graph G is a subset D of its vertices, such that any vertex of G is in D, or has a neighbor in D. The domination numberγ(G) is the number of vertices in a smallest dominating set for G.

Set packing is a classical NP-complete problem in computational complexity theory and combinatorics, and was one of Karp's 21 NP-complete problems. Suppose one has a finite set S and a list of subsets of S. Then, the set packing problem asks if some k subsets in the list are pairwise disjoint.

In graph theory, a cut is a partition of the vertices of a graph into two disjoint subsets. Any cut determines a cut-set, the set of edges that have one endpoint in each subset of the partition. These edges are said to cross the cut. In a connected graph, each cut-set determines a unique cut, and in some cases cuts are identified with their cut-sets rather than with their vertex partitions.

In computer science, the shortest common supersequence of two sequences X and Y is the shortest sequence which has X and Y as subsequences. This is a problem closely related to the longest common subsequence problem. Given two sequences X = < x1,...,xm > and Y = < y1,...,yn >, a sequence U = < u1,...,uk > is a common supersequence of X and Y if items can be removed from U to produce X and Y.

In the mathematical fields of graph theory and combinatorial optimization, the bipartite dimension or biclique cover number of a graph G = (VE) is the minimum number of bicliques (that is complete bipartite subgraphs), needed to cover all edges in E. A collection of bicliques covering all edges in G is called a biclique edge cover, or sometimes biclique cover. The bipartite dimension of G is often denoted by the symbol d(G).

The maximum coverage problem is a classical question in computer science, computational complexity theory, and operations research. It is a problem that is widely taught in approximation algorithms.

In mathematics, the minimum k-cut is a combinatorial optimization problem that requires finding a set of edges whose removal would partition the graph to at least k connected components. These edges are referred to as k-cut. The goal is to find the minimum-weight k-cut. This partitioning can have applications in VLSI design, data-mining, finite elements and communication in parallel computing.

In graph theory, the metric k-center problem is a combinatorial optimization problem studied in theoretical computer science. Given n cities with specified distances, one wants to build k warehouses in different cities and minimize the maximum distance of a city to a warehouse. In graph theory, this means finding a set of k vertices for which the largest distance of any point to its closest vertex in the k-set is minimum. The vertices must be in a metric space, providing a complete graph that satisfies the triangle inequality.

In computer science and operations research, randomized rounding is a widely used approach for designing and analyzing approximation algorithms.

In theoretical computer science, the term isolation lemma refers to randomized algorithms that reduce the number of solutions to a problem to one, should a solution exist. This is achieved by constructing random constraints such that, with non-negligible probability, exactly one solution satisfies these additional constraints if the solution space is not empty. Isolation lemmas have important applications in computer science, such as the Valiant–Vazirani theorem and Toda's theorem in computational complexity theory.

In the mathematical field of graph theory, the intersection number of a graph is the smallest number of elements in a representation of as an intersection graph of finite sets. In such a representation, each vertex is represented as a set, and two vertices are connected by an edge whenever their sets have a common element. Equivalently, the intersection number is the smallest number of cliques needed to cover all of the edges of .

<span class="mw-page-title-main">Dense subgraph</span> Highly connected subgraph

In graph theory and computer science, a dense subgraph is a subgraph with many edges per vertex. This is formalized as follows: let G = (V, E) be an undirected graph and let S = (VS, ES) be a subgraph of G. Then the density of S is defined to be:

The geometric set cover problem is the special case of the set cover problem in geometric settings. The input is a range space where is a universe of points in and is a family of subsets of called ranges, defined by the intersection of and geometric shapes such as disks and axis-parallel rectangles. The goal is to select a minimum-size subset of ranges such that every point in the universe is covered by some range in .

The welfare maximization problem is an optimization problem studied in economics and computer science. Its goal is to partition a set of items among agents with different utility functions, such that the welfare – defined as the sum of the agents' utilities – is as high as possible. In other words, the goal is to find an item allocation satisfying the utilitarian rule.

References