Activity selection problem

Last updated June 13, 2021

The activity selection problem is a combinatorial optimization problem concerning the selection of non-conflicting activities to perform within a given time frame, given a set of activities each marked by a start time (s_i) and finish time (f_i). The problem is to select the maximum number of activities that can be performed by a single person or machine, assuming that a person can only work on a single activity at a time. The activity selection problem is also known as the Interval scheduling maximization problem (ISMP), which is a special type of the more general Interval Scheduling problem.

Formal definition

Assume there exist n activities, each represented by a start time s_i and a finish time f_i. Two activities i and j are said to be non-conflicting if s_i ≥ f_j or s_j ≥ f_i. The activity selection problem consists in finding a maximum-size solution set S of pairwise non-conflicting activities; more precisely, there must exist no solution set S' such that |S'| > |S|. Several distinct solution sets of this maximum size may exist.
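The non-conflict condition above can be written as a small predicate. The following is a minimal Python sketch; the `Activity` type and the function names are illustrative assumptions and do not come from the article.

```python
from typing import NamedTuple


class Activity(NamedTuple):
    # Hypothetical record type for this sketch: one start and one finish time.
    start: float
    finish: float


def non_conflicting(a: Activity, b: Activity) -> bool:
    """True when s_a >= f_b or s_b >= f_a, as in the formal definition."""
    return a.start >= b.finish or b.start >= a.finish


def is_feasible(selection: list[Activity]) -> bool:
    """True when every pair of activities in the selection is non-conflicting."""
    return all(
        non_conflicting(a, b)
        for i, a in enumerate(selection)
        for b in selection[i + 1:]
    )
```

Under this definition, an activity that starts exactly when another finishes does not conflict with it.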

Optimal solution

The activity selection problem is notable in that using a greedy algorithm to find a solution will always result in an optimal solution. A pseudocode sketch of the iterative version of the algorithm and a proof of the optimality of its result are included below.

Algorithm

    Greedy-Iterative-Activity-Selector(A, s, f):

        Sort A by finish times stored in f
        S = {A[1]}
        k = 1

        n = A.length

        for i = 2 to n:
            if s[i] ≥ f[k]:
                S = S ∪ {A[i]}
                k = i

        return S

Explanation

Line 1: The algorithm is called Greedy-Iterative-Activity-Selector because it is a greedy algorithm implemented iteratively. A recursive version of the same greedy algorithm also exists.

$A$ is an array containing the activities.
$s$ is an array containing the start times of the activities in $A$ .
$f$ is an array containing the finish times of the activities in $A$ .

Note that these arrays are indexed starting from 1 up to the length of the corresponding array.

Line 3: Sorts the array of activities $A$ in increasing order of finish time, using the finish times stored in the array $f$ . This operation can be done in $O(n\cdot \log n)$ time, using for example merge sort or heap sort (quicksort achieves this bound on average).

Line 4: Creates a set $S$ to store the selected activities, and initialises it with the activity $A[1]$ that has the earliest finish time.

Line 5: Creates a variable $k$ that keeps track of the index of the last selected activity.

Line 9: Starts iterating from the second element of the array $A$ up to its last element.

Lines 10,11: If the start time $s[i]$ of the $i$-th activity ( $A[i]$ ) is greater than or equal to the finish time $f[k]$ of the last selected activity ( $A[k]$ ), then $A[i]$ is compatible with the selected activities in the set $S$ , and thus it can be added to $S$ .

Line 12: The index $k$ of the last selected activity is updated to $i$ , the index of the just-added activity $A[i]$ .
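The steps explained above can be sketched in Python. This is one possible rendering, not a reference implementation: it assumes each activity is a `(start, finish)` tuple with `start < finish`, uses 0-based lists instead of the 1-based arrays of the pseudocode, and the function name is illustrative.

```python
def greedy_activity_selector(activities):
    """Return a maximum-size list of mutually compatible (start, finish) pairs."""
    # Sort by finish time (the sort step of the pseudocode).
    ordered = sorted(activities, key=lambda a: a[1])
    if not ordered:
        return []

    # Start with the activity that finishes first.
    selected = [ordered[0]]
    last_finish = ordered[0][1]

    # Scan the remaining activities in order of finish time.
    for start, finish in ordered[1:]:
        if start >= last_finish:  # compatible with the last selected activity
            selected.append((start, finish))
            last_finish = finish
    return selected


example = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
           (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]
print(greedy_activity_selector(example))
# prints [(1, 4), (5, 7), (8, 11), (12, 16)]
```

Tracking only the finish time of the last selected activity plays the role of the index $k$ in the pseudocode.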

Proof of optimality

Let $S=\{1,2,\ldots ,n\}$ be the set of activities ordered by finish time. Assume that $A\subseteq S$ is an optimal solution, also ordered by finish time, and that the index of the first activity in $A$ is $k\neq 1$ , i.e., this optimal solution does not start with the greedy choice. We will show that $B=(A\setminus \{k\})\cup \{1\}$ , which begins with the greedy choice (activity 1), is another optimal solution. The activities in $A$ are disjoint by definition, so every activity in $A$ other than $k$ starts no earlier than $f_{k}$ ; since $f_{1}\leq f_{k}$ , each of them also starts no earlier than $f_{1}$ , and the activities in $B$ are therefore disjoint as well. Since $B$ has the same number of activities as $A$ , that is, $|A|=|B|$ , $B$ is also optimal.

Once the greedy choice is made, the problem reduces to finding an optimal solution for the subproblem. If A is an optimal solution to the original problem S containing the greedy choice, then $A^{\prime }=A\setminus \{1\}$ is an optimal solution to the activity-selection problem $S'=\{i\in S:s_{i}\geq f_{1}\}$ .

Why? If this were not the case, pick a solution B′ to S′ with more activities than A′ containing the greedy choice for S′. Every activity in S′ starts no earlier than $f_{1}$ , so adding activity 1 to B′ would yield a feasible solution B to S with more activities than A, contradicting the optimality of A.
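The proof can be complemented by an empirical sanity check on small inputs. The sketch below compares the size of the greedy selection with an exhaustive search over all subsets. The function names, the random instance generator and its parameters are assumptions made for illustration, and a passing check is evidence rather than a proof.

```python
import random
from itertools import combinations


def greedy_count(activities):
    """Size of the selection made by the earliest-finish-time greedy rule."""
    count, last_finish = 0, float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:
            count += 1
            last_finish = finish
    return count


def brute_force_count(activities):
    """Size of the largest pairwise non-conflicting subset (exponential time)."""
    def compatible(subset):
        return all(a[0] >= b[1] or b[0] >= a[1]
                   for a, b in combinations(subset, 2))

    # Try the largest subset sizes first and stop at the first feasible one.
    for size in range(len(activities), 0, -1):
        if any(compatible(c) for c in combinations(activities, size)):
            return size
    return 0


def check_random_instances(trials=200, max_n=8, seed=0):
    """Compare both methods on random small instances with positive durations."""
    rng = random.Random(seed)
    for _ in range(trials):
        acts = []
        for _ in range(rng.randint(0, max_n)):
            start = rng.randint(0, 20)
            acts.append((start, start + rng.randint(1, 6)))
        if greedy_count(acts) != brute_force_count(acts):
            return False
    return True
```

Calling `check_random_instances()` should return `True` if the greedy rule matches the exhaustive optimum on every generated instance.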

Weighted activity selection problem

The generalized version of the activity selection problem involves selecting an optimal set of non-overlapping activities such that the total weight is maximized. Unlike the unweighted version, there is no greedy solution to the weighted activity selection problem. However, a dynamic programming solution can readily be formed using the following approach:^[1]

Consider an optimal solution containing activity $k$ . We now have non-overlapping activities on the left and right of $k$ . We can recursively find solutions for these two sets because of optimal sub-structure. As we don't know $k$ , we can try each of the activities. This approach leads to an $O(n^{3})$ solution. This can be optimized further considering that for each set of activities in $(i,j)$ , we can find the optimal solution if we had known the solution for $(i,t)$ , where $t$ is the last non-overlapping interval with $j$ in $(i,j)$ . This yields an $O(n^{2})$ solution. This can be further optimized considering the fact that we do not need to consider all ranges $(i,j)$ but instead just $(1,j)$ . The following algorithm thus yields an $O(n\log n)$ solution:

    Weighted-Activity-Selection(S):  // S = list of activities

        sort S by finish time
        opt[0] = 0  // opt[j] represents optimal solution (sum of weights of selected activities) for S[1,2..,j]

        for i = 1 to n:
            t = binary search to find activity with finish time <= start time for i
                // if there are more than one such activities, choose the one with last finish time
            opt[i] = MAX(opt[i-1], opt[t] + w(i))

        return opt[n]
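The dynamic program above can be sketched in Python with the standard-library `bisect` module for the binary search. This is a minimal sketch under stated assumptions: each activity is a `(start, finish, weight)` tuple with `start < finish`, and the function name is illustrative.

```python
from bisect import bisect_right


def weighted_activity_selection(activities):
    """Return the maximum total weight of pairwise non-overlapping activities."""
    ordered = sorted(activities, key=lambda a: a[1])  # sort by finish time
    finishes = [a[1] for a in ordered]

    # opt[j] is the best total weight using only the first j sorted activities.
    opt = [0] * (len(ordered) + 1)

    for i, (start, _finish, weight) in enumerate(ordered, start=1):
        # t = how many of the first i-1 activities finish no later than `start`;
        # in 1-based terms, t indexes the last activity compatible with i.
        t = bisect_right(finishes, start, 0, i - 1)
        # Either skip activity i, or take it together with the best for 1..t.
        opt[i] = max(opt[i - 1], opt[t] + weight)

    return opt[-1]


print(weighted_activity_selection([(1, 3, 5), (2, 5, 6), (4, 6, 5), (6, 7, 4)]))
# prints 14
```

Sorting and the $n$ binary searches each cost $O(n\log n)$ in total. On the input `[(0, 2, 1), (1, 3, 10)]` the function returns 10, whereas the earliest-finish greedy rule would pick only the activity of weight 1, which illustrates why that greedy rule does not carry over to the weighted problem.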

References

[1] Dynamic Programming with introduction to Weighted Activity Selection

External links

Activity Selection Problem

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
