List update problem

Last updated

The List Update or the List Access problem is a simple model used in the study of competitive analysis of online algorithms. Given a set of items in a list where the cost of accessing an item is proportional to its distance from the head of the list, e.g. a Linked List, and a request sequence of accesses, the problem is to come up with a strategy of reordering the list so that the total cost of accesses is minimized. The reordering can be done at any time but incurs a cost. The standard model includes two reordering actions:

Contents

An online algorithm for this problem has to reorder the elements and serve requests based only on the knowledge of previously requested items and hence its strategy may not have the optimum cost as compared to an offline algorithm that gets to see the entire request sequence and devise a complete strategy before serving the first request.

Along with its original uses, this problem has been suggested to have a strong similarity to problems of improving global context and compressibility following a Burrows–Wheeler transform. Following this transform, files tend to have large regions with locally high frequencies, and compression efficiency is greatly improved by techniques that tend to move frequently-occurring characters toward zero, or the front of the "list". Due to this, methods and variants of Move-to-Front and frequency counts often follow the BWT algorithm to improve compressibility.

Adversary models

An adversary is an entity that gets to choose the request sequence for an algorithm ALG. Depending on whether can be changed based on the strategy of ALG, adversaries are given various powers, and the performance of ALG is measured against these adversaries.

An oblivious adversary has to construct the entire request sequence before running ALG, and pays the optimal offline price, which is compared against

An adaptive online adversary gets to make the next request based on the previous results of the online algorithm, but pays for the request optimally and online.

An adaptive offline adversary gets to make the next request based on the previous results of the online algorithm, but pays the optimal offline cost.

Offline algorithms

Competitive analysis for many list update problems were carried out without any specific knowledge of the exact nature of the optimum offline algorithm (OPT). There exist algorithm runs in O(n2l(l-1)!) time and O(l!) space where n is the length of the request sequence and l is the length of the list. [1] The best known optimal offline algorithm dependent on request sequence length runs in O(l^2(l−1)!n) time published by Dr Srikrishnan Divakaran in 2014. [2]

Paid transpositions are in general necessary for optimum algorithms. Consider a list (a,b,c) where a is at the head of the list, and a request sequence c,b,c,b. An optimal offline algorithm using only free exchanges would cost 9 (3+3+2+1), whereas an optimal offline algorithm using only paid exchanges would cost 8. So, we cannot get away with just using free transpositions for the optimum offline algorithm.

The optimum list update problem was proven to be NP-hard by ( Ambühl 2000 ).

Online algorithm

An online algorithm ALG has a competitive ratio c if for any input it performs at least as good as c times worse than OPT. i.e. if there exists an such that for all finite length request sequences , . Online algorithms can either be deterministic or randomized and it turns out that randomization in this case can truly help against oblivious adversaries.

Deterministic

Most deterministic algorithms are variants of these three algorithms :

MTF (Move to front)
After accessing an item move it to the front of the list without changing the order of other items
TRANS (Transpose)
After accessing an item, transpose it with the immediately preceding item.
FC (Frequency Count)
For each item maintain a frequency count of the number of accesses to it - when an element is accessed increase its frequency count and reorder the list in the decreasing order of frequencies.

Observe that all these use just free transpositions. It turns out that both TRANS and FC are not competitive. In a classic result using Potential method analysis ( Sleator & Tarjan 1985 ) proved that MTF is 2-competitive. The proof does not require the explicit knowledge of OPT but instead counts the number of inversions i.e. elements occurring in opposite order in the lists of MTF and OPT.

Any deterministic algorithm has a lower bound of for a list of length l, and MTF is actually the optimum deterministic list update algorithm. The type of adversary doesn't matter in the case of deterministic algorithms, because the adversary can run a copy of the deterministic algorithm on their own to precompute the most disastrous sequence.

Randomized

Consider the following simple randomized algorithm :

BIT
For every item in the list, maintain a bit. Initialize all the bits uniformly and randomly to 0 or 1. When an item is accessed, flip the bit, and if it is 1 move it to the front, else don't.

This algorithm is barely random - it makes all its random choices in the beginning and not during the run. It turns out that BIT breaks the deterministic bound - it is better than MTF against oblivious adversaries. It is 7/4-competitive. There are other randomized algorithms, like COMB, that perform better than BIT. Boris Teia proved a lower bound of 1.5 for any randomized list update algorithm. [3]

The list update problem where elements maybe inserted and deleted is called the dynamic list update problem, as opposed to the static list update problem where only accessing list elements are allowed. The upper bound of holds for the dynamic model as well.

There are different cost models as well. In the usual full cost model, an access to an element located at a position i costs i, but the last comparison is inevitable for any algorithm, i.e. there are i-1 elements standing in the way of i. In the partial cost model, these final comparison costs totaling to the number of elements in the request sequence are ignored. For the costs of paid transpositions other than unity, Pd models are used.

See also

Notes

  1. N. Reingold and J. Westbrook. Optimum Offline algorithms for the list update and paging rules. Technical Report YALE/DcS/TR-805, Yale University, New Haven, Conn, August 1990
  2. Divakaran, Srikrishnan (2014-04-30). "An Optimal Offline Algorithm for List Update". arXiv: 1404.7638 [cs.DS].
  3. Teia, Boris, A lower bound for randomized list update algorithms, Inf. Process. Lett. (1993), pp. 5--9

Related Research Articles

In computer science, an online algorithm is one that can process its input piece-by-piece in a serial fashion, i.e., in the order that the input is fed to the algorithm, without having the entire input available from the start.

<span class="mw-page-title-main">Permutation</span> Mathematical version of an order change

In mathematics, a permutation of a set is, loosely speaking, an arrangement of its members into a sequence or linear order, or if the set is already ordered, a rearrangement of its elements. The word "permutation" also refers to the act or process of changing the linear order of an ordered set.

The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has many applications, such as filling up containers, loading trucks with weight capacity constraints, creating file backups in media, and technology mapping in FPGA semiconductor chip design.

<span class="mw-page-title-main">Deterministic finite automaton</span> Finite-state machine

In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automaton (DFSA)—is a finite-state machine that accepts or rejects a given string of symbols, by running through a state sequence uniquely determined by the string. Deterministic refers to the uniqueness of the computation run. In search of the simplest models to capture finite-state machines, Warren McCulloch and Walter Pitts were among the first researchers to introduce a concept similar to finite automata in 1943.

In a computer operating system that uses paging for virtual memory management, page replacement algorithms decide which memory pages to page out, sometimes called swap out, or write to disk, when a page of memory needs to be allocated. Page replacement happens when a requested page is not in memory and a free page cannot be used to satisfy the allocation, either because there are none, or because the number of free pages is lower than some threshold.

Probabilistic encryption is the use of randomness in an encryption algorithm, so that when encrypting the same message several times it will, in general, yield different ciphertexts. The term "probabilistic encryption" is typically used in reference to public key encryption algorithms; however various symmetric key encryption algorithms achieve a similar property, and stream ciphers such as Freestyle which are inherently random. To be semantically secure, that is, to hide even partial information about the plaintext, an encryption algorithm must be probabilistic.

In computing, a cache-oblivious algorithm is an algorithm designed to take advantage of a processor cache without having the size of the cache as an explicit parameter. An optimal cache-oblivious algorithm is a cache-oblivious algorithm that uses the cache optimally. Thus, a cache-oblivious algorithm is designed to perform well, without modification, on multiple machines with different cache sizes, or for a memory hierarchy with different levels of cache having different sizes. Cache-oblivious algorithms are contrasted with explicit loop tiling, which explicitly breaks a problem into blocks that are optimally sized for a given cache.

In information theory and computer science, the Damerau–Levenshtein distance is a string metric for measuring the edit distance between two sequences. Informally, the Damerau–Levenshtein distance between two words is the minimum number of operations required to change one word into the other.

In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers x0, x1, x2, ... is a second sequence of numbers y0, y1, y2, ..., the sums of prefixes of the input sequence:

A self-organizing list is a list that reorders its elements based on some self-organizing heuristic to improve average access time. The aim of a self-organizing list is to improve efficiency of linear search by moving more frequently accessed items towards the head of the list. A self-organizing list achieves near constant time for element access in the best case. A self-organizing list uses a reorganizing algorithm to adapt to various query distributions at runtime.

In mathematics, the theory of optimal stopping or early stopping is concerned with the problem of choosing a time to take a particular action, in order to maximise an expected reward or minimise an expected cost. Optimal stopping problems can be found in areas of statistics, economics, and mathematical finance. A key example of an optimal stopping problem is the secretary problem. Optimal stopping problems can often be written in the form of a Bellman equation, and are therefore often solved using dynamic programming.

The k-server problem is a problem of theoretical computer science in the category of online algorithms, one of two abstract problems on metric spaces that are central to the theory of competitive analysis. In this problem, an online algorithm must control the movement of a set of kservers, represented as points in a metric space, and handle requests that are also in the form of points in the space. As each request arrives, the algorithm must determine which server to move to the requested point. The goal of the algorithm is to keep the total distance all servers move small, relative to the total distance the servers could have moved by an optimal adversary who knows in advance the entire sequence of requests.

In computer science, an online algorithm measures its competitiveness against different adversary models. For deterministic algorithms, the adversary is the same as the adaptive offline adversary. For randomized online algorithms competitiveness can depend upon the adversary model used.

Competitive analysis is a method invented for analyzing online algorithms, in which the performance of an online algorithm is compared to the performance of an optimal offline algorithm that can view the sequence of requests in advance. An algorithm is competitive if its competitive ratio—the ratio between its performance and the offline algorithm's performance—is bounded. Unlike traditional worst-case analysis, where the performance of an algorithm is measured only for "hard" inputs, competitive analysis requires that an algorithm perform well both on hard and easy inputs, where "hard" and "easy" are defined by the performance of the optimal offline algorithm.

Task systems are mathematical objects used to model the set of possible configuration of online algorithms. They were introduced by Borodin, Linial and Saks (1992) to model a variety of online problems. A task system determines a set of states and costs to change states. Task systems obtain as input a sequence of requests such that each request assigns processing times to the states. The objective of an online algorithm for task systems is to create a schedule that minimizes the overall cost incurred due to processing the tasks with respect to the states and due to the cost to change states.

Job-shop scheduling, the job-shop problem (JSP) or job-shop scheduling problem (JSSP) is an optimization problem in computer science and operations research. It is a variant of optimal job scheduling. In a general job scheduling problem, we are given n jobs J1J2, ..., Jn of varying processing times, which need to be scheduled on m machines with varying processing power, while trying to minimize the makespan – the total length of the schedule. In the specific variant known as job-shop scheduling, each job consists of a set of operationsO1O2, ..., On which need to be processed in a specific order. Each operation has a specific machine that it needs to be processed on and only one operation in a job can be processed at a given time. A common relaxation is the flexible job shop, where each operation can be processed on any machine of a given set.

In computer science, the ski rental problem is a name given to a class of problems in which there is a choice between continuing to pay a repeating cost or paying a one-time cost which eliminates or reduces the repeating cost.

In computer science, one approach to the dynamic optimality problem on online algorithms for binary search trees involves reformulating the problem geometrically, in terms of augmenting a set of points in the plane with as few additional points as possible in order to avoid rectangles with only two points on their boundary.

The deterministic rendezvous problem is a problem in computer science and robotics that involves two or more robots or players that must find each other by following a predetermined set of instructions. The goal is for the robots to meet at a specific location, or rendezvous point, without knowing the location of the other robot or robots.

In computer science, the list-labeling problem involves maintaining a totally ordered set S supporting the following operations:

References