In computational geometry, a maximum disjoint set (MDS) is a largest set of non-overlapping geometric shapes selected from a given set of candidate shapes.
Every set of non-overlapping shapes is an independent set in the intersection graph of the shapes. Therefore, the MDS problem is a special case of the maximum independent set (MIS) problem. Both problems are NP complete, but finding a MDS may be easier than finding a MIS in two respects:
Finding an MDS is important in applications such as automatic label placement, VLSI circuit design, and cellular frequency division multiplexing.
The MDS problem can be generalized by assigning a different weight to each shape and searching for a disjoint set with a maximum total weight.
In the following text, MDS(C) denotes the maximum disjoint set in a set C.
Given a set C of shapes, an approximation to MDS(C) can be found by the following greedy algorithm:
For every shape x that we add to S, we lose the shapes in N(x), because they are intersected by x and thus cannot be added to S later on. However, some of these shapes themselves intersect each other, and thus in any case it is not possible that they all be in the optimal solution MDS(S). The largest subset of shapes that can all be in the optimal solution is MDS(N(x)). Therefore, selecting an x that minimizes |MDS(N(x))| minimizes the loss from adding x to S.
In particular, if we can guarantee that there is an x for which |MDS(N(x))| is bounded by a constant (say, M), then this greedy algorithm yields a constant M-factor approximation, as we can guarantee that:
Such an upper bound M exists for several interesting cases:
When C is a set of intervals on a line, M=1, and thus the greedy algorithm finds the exact MDS. To see this, assume w.l.o.g. that the intervals are vertical, and let x be the interval with the highest bottom endpoint. All other intervals intersected by x must cross its bottom endpoint. Therefore, all intervals in N(x) intersect each other, and MDS(N(x)) has a size of at most 1 (see figure).
Therefore, in the 1-dimensional case, the MDS can be found exactly in time O(n log n): [2]
This algorithm is analogous to the earliest deadline first scheduling solution to the interval scheduling problem.
In contrast to the 1-dimensional case, in 2 or more dimensions the MDS problem becomes NP-complete, and thus has either exact super-polynomial algorithms or approximate polynomial algorithms.
When C is a set of unit disks, M=3, [3] because the leftmost disk (the disk whose center has the smallest x coordinate) intersects at most 3 other disjoint disks (see figure). Therefore the greedy algorithm yields a 3-approximation, i.e., it finds a disjoint set with a size of at least MDS(C)/3.
Similarly, when C is a set of axis-parallel unit squares, M=2.
When C is a set of arbitrary-size disks, M=5, because the disk with the smallest radius intersects at most 5 other disjoint disks (see figure).
Similarly, when C is a set of arbitrary-size axis-parallel squares, M=4.
Other constants can be calculated for other regular polygons. [3]
The most common approach to finding a MDS is divide-and-conquer. A typical algorithm in this approach looks like the following:
The main challenge with this approach is to find a geometric way to divide the set into subsets. This may require to discard a small number of shapes that do not fit into any one of the subsets, as explained in the following subsections.
Let C be a set of n axis-parallel rectangles in the plane, all with the same height H but with varying lengths. The following algorithm finds a disjoint set with a size of at least |MDS(C)|/2 in time O(n log n): [2]
Let C be a set of n axis-parallel rectangles in the plane, all with the same height but with varying lengths. There is an algorithm that finds a disjoint set with a size of at least |MDS(C)|/(1 + 1/k) in time O(n2k−1), for every constant k > 1. [2]
The algorithm is an improvement of the above-mentioned 2-approximation, by combining dynamic programming with the shifting technique of Hochbaum and Maass. [4]
This algorithm can be generalized to d dimensions. If the labels have the same size in all dimensions except one, it is possible to find a similar approximation by applying dynamic programming along one of the dimensions. This also reduces the time to n^O(1/e). [5]
Let C be a set of n axis-parallel rectangles in the plane. The following algorithm finds a disjoint set with a size of at least in time : [2]
It is provable by induction that, at the last step, either or have a cardinality of at least .
Chalermsookk and Chuzoy [6] have improved the factor to .
Chalermsook and Walczak [7] have presented an -approximation algorithm to the more general setting, in which each rectangle has a weight, and the goal is to find an independent set of maximum total weight.
For a long time, it was not known whether a constant-factor approximation exists for axis-parallel rectangles of different lengths and heights. It was conjectured that such an approximation could be found using guillotine cuts. Particularly, if there exists a guillotine separation of axes-parallel rectangles in which rectangles are separated, then it can be used in a dynamic programming approach to find a constant-factor approximation to the MDS. [8] : sub.1.2
To date, it is not known whether such a guillotine separation exists. However, there are constant-factor approximation algorithms using non-guillotine cuts:
Let C be a set of n squares or circles of identical size. Hochbaum and Maass [4] presented a polynomial-time approximation scheme for finding an MDS using a simple shifted-grid strategy. It finds a solution within (1 − ε) of the maximum in time nO(1/ε2) time and linear space. The strategy generalizes to any collection of fat objects of roughly the same size (i.e., when the maximum-to-minimum size ratio is bounded by a constant).
Let C be a set of n fat objects, such as squares or circles, of arbitrary sizes. There is a PTAS for finding an MDS based on multi-level grid alignment. It has been discovered by two groups in approximately the same time, and described in two different ways.
An algorithm of Erlebach, Jansen and Seidel [12] finds a disjoint set with a size of at least (1 − 1/k)2 · |MDS(C)| in time nO(k2), for every constant k > 1. It works in the following way.
Scale the disks so that the smallest disk has diameter 1. Partition the disks to levels, based on the logarithm of their size. I.e., the j-th level contains all disks with diameter between (k + 1)j and (k + 1)j+1, for j ≤ 0 (the smallest disk is in level 0).
For each level j, impose a grid on the plane that consists of lines that are (k + 1)j+1 apart from each other. By construction, every disk can intersect at most one horizontal line and one vertical line from its level.
For every r, s between 0 and k, define D(r,s) as the subset of disks that are not intersected by any horizontal line whose index modulo k is r, nor by any vertical line whose index modulu k is s. By the pigeonhole principle, there is at least one pair (r,s) such that , i.e., we can find the MDS only in D(r,s) and miss only a small fraction of the disks in the optimal solution:
An algorithm of Chan [5] finds a disjoint set with a size of at least (1 − 2/k)·|MDS(C)| in time nO(k), for every constant k > 1.
The algorithm uses shifted quadtrees. The key concept of the algorithm is alignment to the quadtree grid. An object of size r is called k-aligned (where k ≥ 1 is a constant) if it is inside a quadtree cell of size at most kr (R ≤ kr).
By definition, a k-aligned object that intersects the boundary of a quatree cell of size R must have a size of at least R/k (r > R/k). The boundary of a cell of size R can be covered by 4k squares of size R/k; hence the number of disjoint fat objects intersecting the boundary of that cell is at most 4kc, where c is a constant measuring the fatness of the objects.
Therefore, if all objects are fat and k-aligned, it is possible to find the exact maximum disjoint set in time nO(kc) using a divide-and-conquer algorithm. Start with a quadtree cell that contains all objects. Then recursively divide it to smaller quadtree cells, find the maximum in each smaller cell, and combine the results to get the maximum in the larger cell. Since the number of disjoint fat objects intersecting the boundary of every quadtree cell is bounded by 4kc, we can simply "guess" which objects intersect the boundary in the optimal solution, and then apply divide-and-conquer to the objects inside.
If almost all objects are k-aligned, we can just discard the objects that are not k-aligned, and find a maximum disjoint set of the remaining objects in time nO(k). This results in a (1 − e) approximation, where e is the fraction of objects that are not k-aligned.
If most objects are not k-aligned, we can try to make them k-aligned by shifting the grid in multiples of (1/k,1/k). First, scale the objects such that they are all contained in the unit square. Then, consider k shifts of the grid: (0,0), (1/k,1/k), (2/k,2/k), ..., ((k − 1)/k,(k − 1)/k). I.e., for each j in {0,...,k − 1}, consider a shift of the grid in (j/k,j/k). It is possible to prove that every label will be 2k-aligned for at least k − 2 values of j. Now, for every j, discard the objects that are not k-aligned in the (j/k,j/k) shift, and find a maximum disjoint set of the remaining objects. Call that set A(j). Call the real maximum disjoint set is A*. Then:
Therefore, the largest A(j) has a size of at least: (1 − 2/k)|A*|. The return value of the algorithm is the largest A(j); the approximation factor is (1 − 2/k), and the run time is nO(k). We can make the approximation factor as small as we want, so this is a PTAS.
Both versions can be generalized to d dimensions (with different approximation ratios) and to the weighted case.
Several divide-and-conquer algorithms are based on a certain geometric separator theorem. A geometric separator is a line or shape that separates a given set of shapes to two smaller subsets, such that the number of shapes lost during the division is relatively small. This allows both PTASs and sub-exponential exact algorithms, as explained below.
Let C be a set of n fat objects, such as squares or circles, of arbitrary sizes. Chan [5] described an algorithm finds a disjoint set with a size of at least (1 − O(√b))·|MDS(C)| in time nO(b), for every constant b > 1.
The algorithm is based on the following geometric separator theorem, which can be proved similarly to the proof of the existence of geometric separator for disjoint squares:
where a and c are constants. If we could calculate MDS(C) exactly, we could make the constant a as low as 2/3 by a proper selection of the separator rectangle. But since we can only approximate MDS(C) by a constant factor, the constant a must be larger. Fortunately, a remains a constant independent of |C|.
This separator theorem allows to build the following PTAS:
Select a constant b. Check all possible combinations of up to b + 1 labels.
Let E(m) be the error of the above algorithm when the optimal MDS size is MDS(C) = m. When m ≤ b, the error is 0 because the maximum disjoint set is calculated exactly; when m > b, the error increases by at most c√m the number of labels intersected by the separator. The worst case for the algorithm is when the split in each step is in the maximum possible ratio which is a:(1 − a). Therefore the error function satisfies the following recurrence relation:
The solution to this recurrence is:
i.e., . We can make the approximation factor as small as we want by a proper selection of b.
This PTAS is more space-efficient than the PTAS based on quadtrees, and can handle a generalization where the objects may slide, but it cannot handle the weighted case.
Let C be a set of n disks, such that the ratio between the largest radius and the smallest radius is at most r. The following algorithm finds MDS(C) exactly in time . [13]
The algorithm is based on a width-bounded geometric separator on the set Q of the centers of all disks in C. This separator theorem allows to build the following exact algorithm:
The run time of this algorithm satisfies the following recurrence relation:
The solution to this recurrence is:
A pseudo-disks-set is a set of objects in which the boundaries of every pair of objects intersect at most twice (Note that this definition relates to a whole collection, and does not say anything about the shapes of the specific objects in the collection). A pseudo-disks-set has a bounded union complexity, i.e., the number of intersection points on the boundary of the union of all objects is linear in the number of objects. For example, a set of squares or circles of arbitrary sizes is a pseudo-disks-set.
Let C be a pseudo-disks-set with n objects. A local search algorithm by Chan and Har-Peled [14] finds a disjoint set of size at least in time , for every integer constant :
Every exchange in the search step increases the size of S by at least 1, and thus can happen at most n times.
The algorithm is very simple; the difficult part is to prove the approximation ratio. [14]
See also. [15]
Let C be a pseudo-disks-set with n objects and union complexity u. Using linear programming relaxation, it is possible to find a disjoint set of size at least . This is possible either with a randomized algorithm that has a high probability of success and run time , or a deterministic algorithm with a slower run time (but still polynomial). This algorithm can be generalized to the weighted case. [14]
In complex analysis, the Riemann mapping theorem states that if is a non-empty simply connected open subset of the complex number plane which is not all of , then there exists a biholomorphic mapping from onto the open unit disk
The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has many applications, such as filling up containers, loading trucks with weight capacity constraints, creating file backups in media, and technology mapping in FPGA semiconductor chip design.
In graph theory, an independent set, stable set, coclique or anticlique is a set of vertices in a graph, no two of which are adjacent. That is, it is a set of vertices such that for every two vertices in , there is no edge connecting the two. Equivalently, each edge in the graph has at most one endpoint in . A set is independent if and only if it is a clique in the graph's complement. The size of an independent set is the number of vertices it contains. Independent sets have also been called "internally stable sets", of which "stable set" is a shortening.
In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems with provable guarantees on the distance of the returned solution to the optimal one. Approximation algorithms naturally arise in the field of theoretical computer science as a consequence of the widely believed P ≠ NP conjecture. Under this conjecture, a wide class of optimization problems cannot be solved exactly in polynomial time. The field of approximation algorithms, therefore, tries to understand how closely it is possible to approximate optimal solutions to such problems in polynomial time. In an overwhelming majority of the cases, the guarantee of such algorithms is a multiplicative one expressed as an approximation ratio or approximation factor i.e., the optimal solution is always guaranteed to be within a (predetermined) multiplicative factor of the returned solution. However, there are also many approximation algorithms that provide an additive guarantee on the quality of the returned solution. A notable example of an approximation algorithm that provides both is the classic approximation algorithm of Lenstra, Shmoys and Tardos for scheduling on unrelated parallel machines.
The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory. It is one of Karp's 21 NP-complete problems shown to be NP-complete in 1972.
In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.
In graph theory, a dominating set for a graph G is a subset D of its vertices, such that any vertex of G is either in D, or has a neighbor in D. The domination numberγ(G) is the number of vertices in a smallest dominating set for G.
In graph theory and graph algorithms, a feedback arc set or feedback edge set in a directed graph is a subset of the edges of the graph that contains at least one edge out of every cycle in the graph. Removing these edges from the graph breaks all of the cycles, producing a directed acyclic graph, an acyclic subgraph of the given graph. The feedback arc set with the fewest possible edges is the minimum feedback arc set and its removal leaves the maximum acyclic subgraph; weighted versions of these optimization problems are also used. If a feedback arc set is minimal, meaning that removing any edge from it produces a subset that is not a feedback arc set, then it has an additional property: reversing all of its edges, rather than removing them, produces a directed acyclic graph.
Set packing is a classical NP-complete problem in computational complexity theory and combinatorics, and was one of Karp's 21 NP-complete problems. Suppose one has a finite set S and a list of subsets of S. Then, the set packing problem asks if some k subsets in the list are pairwise disjoint.
In graph theory, a cut is a partition of the vertices of a graph into two disjoint subsets. Any cut determines a cut-set, the set of edges that have one endpoint in each subset of the partition. These edges are said to cross the cut. In a connected graph, each cut-set determines a unique cut, and in some cases cuts are identified with their cut-sets rather than with their vertex partitions.
Guillotine cutting is the process of producing small rectangular items of fixed dimensions from a given large rectangular sheet, using only guillotine-cuts. A guillotine-cut is a straight bisecting line going from one edge of an existing rectangle to the opposite edge, similarly to a paper guillotine.
In geometry, the hyperplane separation theorem is a theorem about disjoint convex sets in n-dimensional Euclidean space. There are several rather similar versions. In one version of the theorem, if both these sets are closed and at least one of them is compact, then there is a hyperplane in between them and even two parallel hyperplanes in between them separated by a gap. In another version, if both disjoint convex sets are open, then there is a hyperplane in between them, but not necessarily any gap. An axis which is orthogonal to a separating hyperplane is a separating axis, because the orthogonal projections of the convex bodies onto the axis are disjoint.
Interval scheduling is a class of problems in computer science, particularly in the area of algorithm design. The problems consider a set of tasks. Each task is represented by an interval describing the time in which it needs to be processed by some machine. For instance, task A might run from 2:00 to 5:00, task B might run from 4:00 to 10:00 and task C might run from 9:00 to 11:00. A subset of intervals is compatible if no two intervals overlap on the machine/resource. For example, the subset {A,C} is compatible, as is the subset {B}; but neither {A,B} nor {B,C} are compatible subsets, because the corresponding intervals within each subset overlap.
In mathematics, the Vitali covering lemma is a combinatorial and geometric result commonly used in measure theory of Euclidean spaces. This lemma is an intermediate step, of independent interest, in the proof of the Vitali covering theorem. The covering theorem is credited to the Italian mathematician Giuseppe Vitali. The theorem states that it is possible to cover, up to a Lebesgue-negligible set, a given subset E of Rd by a disjoint family extracted from a Vitali covering of E.
In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of vertices from an n-vertex graph can partition the graph into disjoint subgraphs each of which has at most vertices.
In computer science, data stream clustering is defined as the clustering of data that arrive continuously such as telephone records, multimedia data, financial transactions etc. Data stream clustering is usually studied as a streaming algorithm and the objective is, given a sequence of points, to construct a good clustering of the stream, using a small amount of memory and time.
In computational geometry, an ε-net is the approximation of a general set by a collection of simpler subsets. In probability theory it is the approximation of one probability distribution by another.
A geometric separator is a line that partitions a collection of geometric shapes into two subsets, such that proportion of shapes in each subset is bounded, and the number of shapes that do not belong to any subset is small.
The geometric set cover problem is the special case of the set cover problem in geometric settings. The input is a range space where is a universe of points in and is a family of subsets of called ranges, defined by the intersection of and geometric shapes such as disks and axis-parallel rectangles. The goal is to select a minimum-size subset of ranges such that every point in the universe is covered by some range in .
Maximin share (MMS) is a criterion of fair item allocation. Given a set of items with different values, the 1-out-of-n maximin-share is the maximum value that can be gained by partitioning the items into parts and taking the part with the minimum value. An allocation of items among agents with different valuations is called MMS-fair if each agent gets a bundle that is at least as good as his/her 1-out-of-n maximin-share. MMS fairness is a relaxation of the criterion of proportionality - each agent gets a bundle that is at least as good as the equal split ( of every resource). Proportionality can be guaranteed when the items are divisible, but not when they are indivisible, even if all agents have identical valuations. In contrast, MMS fairness can always be guaranteed to identical agents, so it is a natural alternative to proportionality even when the agents are different.