The Complexity of Songs

"The Complexity of Songs" is a scholarly article by computer scientist Donald Knuth in 1977, [1] as an in-joke about computational complexity theory. The article capitalizes on the tendency of popular songs to devolve from long and content-rich ballads to highly repetitive texts with little or no meaningful content. [2] The article notes that a song of length N words may be produced remembering, e.g., only O(log N) words ("space complexity" of the song) or even less.

Article summary

Knuth writes that "our ancient ancestors invented the concept of refrain" to reduce the space complexity of songs, which becomes crucial when a large number of songs is to be committed to one's memory. Knuth's Lemma 1 states that if N is the length of a song, then the refrain decreases the song complexity to cN, where the factor c < 1. [1]
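A refrain reduces memory because its text is learned once but sung after every verse. The following Python sketch illustrates Lemma 1 with hypothetical placeholder verses and refrain (none of the text or names below come from Knuth's article): the number of words that must be memorized is roughly cN with c < 1, where N is the number of words actually sung.

    # Illustration of Lemma 1: a refrain of r words sung after each of v verses
    # (each of w words) gives a song of N = v*(w + r) words while only
    # v*w + r words need to be memorized, i.e. about c*N words with
    # c = w/(w + r) < 1.  Verse and refrain text are placeholders.

    def song_with_refrain(verses, refrain):
        """Return the full song: each verse followed by the refrain."""
        lines = []
        for verse in verses:
            lines.append(verse)
            lines.append(refrain)
        return "\n".join(lines)

    verses = [f"verse number {i} with some unique words" for i in range(1, 6)]
    refrain = "and the refrain comes around again"

    song = song_with_refrain(verses, refrain)
    sung = len(song.split())                                   # words actually sung
    memorized = sum(len(v.split()) for v in verses) + len(refrain.split())
    print(f"sung: {sung}, memorized: {memorized}, c = {memorized / sung:.2f}")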

Knuth further demonstrates a way of producing songs with O(√N) complexity, an approach "further improved by a Scottish farmer named O. MacDonald". [1]
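The construction can be sketched in Python as follows (a simplified rendering with a placeholder animal list, not the lyrics Knuth analyses): verse k introduces one new animal but repeats the sounds of all animals introduced so far, so m memorized animal/sound pairs generate a song of roughly m² lines, i.e. O(√N) memory for a song of length N.

    # Cumulative "Old MacDonald"-style song: m memorized (animal, sound) pairs
    # yield a song of Theta(m^2) length, hence complexity O(sqrt(N)).
    # The animal list is a placeholder and the lyric pattern is simplified.

    ANIMALS = [("cow", "moo"), ("duck", "quack"), ("pig", "oink"), ("dog", "woof")]

    def old_macdonald(m):
        """Return the first m cumulative verses as one string."""
        verses = []
        for k in range(m):
            animal, _ = ANIMALS[k]
            lines = ["Old MacDonald had a farm, E-I-E-I-O,",
                     f"and on that farm he had a {animal}, E-I-E-I-O,"]
            # repeat the sound of every animal introduced so far, newest first
            for _, sound in reversed(ANIMALS[:k + 1]):
                lines.append(f"with a {sound} {sound} here and a {sound} {sound} there,")
            verses.append("\n".join(lines))
        return "\n\n".join(verses)

    print(old_macdonald(len(ANIMALS)))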

More ingenious approaches yield songs of complexity O(log N), a class known as "m bottles of beer on the wall".
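This class can be sketched as follows (a minimal Python rendering; the verse wording is approximate and pluralization is ignored): the whole song is generated from a fixed template plus the single number m, so the memory required is the template plus the O(log m) digits of m, which is O(log N) for a song of length N.

    # "m bottles of beer on the wall": the song is determined by a constant
    # template and the number m alone, so the memory needed is O(log N)
    # (the digits of m), while the song length N grows linearly in m.

    def bottles_of_beer(m):
        """Yield the verses of the song, counting down from m."""
        for n in range(m, 0, -1):
            yield (f"{n} bottles of beer on the wall, {n} bottles of beer,\n"
                   f"take one down and pass it around, "
                   f"{n - 1} bottles of beer on the wall.")

    for verse in bottles_of_beer(3):
        print(verse)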

Finally, the progress during the 20th century—stimulated by the fact that "the advent of modern drugs has led to demands for still less memory"—leads to the ultimate improvement: Arbitrarily long songs with space complexity O(1) exist, e.g. a song defined by the recurrence relation [1]

S_k = V_k S_{k-1},   V_k = 'That's the way,' U 'I like it,' U,   for all k ≥ 1,
U = 'uh huh,' 'uh huh'
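A minimal Python sketch of this recurrence (variable and function names are the editor's, not Knuth's): the song S_k is produced by prepending the same constant block V to S_{k-1}, so only a constant amount of text ever has to be remembered, however long the song becomes.

    # O(1)-memory song: S_0 is empty and S_k = V S_{k-1}, with the block V the
    # same for every k, so an arbitrarily long song is generated from a
    # constant amount of memorized text.

    U = "uh huh, uh huh,"
    V = f"That's the way, {U} I like it, {U}"

    def song(k):
        """Return S_k, i.e. the constant block V repeated k times."""
        s = ""                        # S_0 = empty song
        for _ in range(k):
            s = V + "\n" + s          # S_k = V_k S_{k-1}, with V_k = V
        return s

    print(song(3))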

Further developments

Prof. Kurt Eisemann of San Diego State University, in his letter to the Communications of the ACM, [3] further improves the latter seemingly unbeatable estimate. He begins with the observation that for practical applications the value of the "hidden constant" c in the Big Oh notation may be crucial in making the difference between feasibility and infeasibility: for example, a constant value of 10^80 would exceed the capacity of any known device. He further notices that a technique was already known in Mediaeval Europe whereby the textual content of an arbitrary tune can be recorded based on the recurrence relation S_k = C_2 S_{k-1}, where C_2 = 'la', yielding a value of the big-Oh constant c equal to 2. However, it turns out that another culture achieved the absolute lower bound of O(0). As Prof. Eisemann puts it:

"When the Mayflower voyagers first descended on these shores, the native Americans proud of their achievement in the theory of information storage and retrieval, at first welcomed the strangers with the complete silence. This was meant to convey their peak achievement in the complexity of songs, namely the demonstration that a limit as low as c = 0 is indeed obtainable."

However, the Europeans were unprepared to grasp this notion, and the chiefs, in order to establish a common ground for conveying their achievements, later proceeded to demonstrate an approach described by the recurrence relation S_k = C_1 S_{k-1}, where C_1 = 'i', with a suboptimal complexity given by c = 1. [2] [3]
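Both schemes use the same recurrence and differ only in the memorized constant; a brief Python sketch, assuming the reconstructed constants 'la' (two characters, c = 2) and 'i' (one character, c = 1):

    # S_k = C S_{k-1} with a fixed constant C: the hidden constant c (here,
    # characters memorized per repetition) depends only on the length of C,
    # while the memory needed stays O(1) however long the song grows.

    def constant_song(constant, k):
        """Return S_k for the recurrence S_k = C S_{k-1} with S_0 empty."""
        return (constant + " ") * k

    for constant in ("la", "i"):      # c = 2 and c = 1 characters, respectively
        print(f"C = {constant!r}: {constant_song(constant, 5).strip()}")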

The O(1) space complexity result was also implemented by Guy L. Steele, Jr., perhaps challenged by Knuth's article. [4] Dr. Steele's TELNET Song used a completely different algorithm based on exponential recursion, a parody of some implementations of TELNET. [5] [6] [7]

Darrah Chavey suggested that the complexity analysis of human songs can be a useful pedagogic device for teaching students complexity theory. [8]

In the article "On Superpolylogarithmic Subexponential Functions", [9] Prof. Alan Sherman writes that Knuth's article was seminal for the analysis of a special class of functions.

References

  1. Knuth, Donald (Summer 1977). "The Complexity of Songs". SIGACT News. 9 (2): 17–24. doi:10.1145/1008354.1008355. S2CID 17533775. Reprinted in: Knuth, Donald (1984). "The Complexity of Songs". Communications of the ACM. 27 (4): 344–346. doi:10.1145/358027.358042. MR 0784131.
  2. Steven Krantz (2005). Mathematical Apocrypha Redux. ISBN 0-88385-554-2. pp. 2–3.
  3. Kurt Eisemann, "Further Results on the Complexity of Songs", Communications of the ACM, vol. 28 (1985), no. 3, p. 235.
  4. Peter G. Neumann, "A further view of the first quarter century", Communications of the ACM, vol. 27, no. 4, April 1984, p. 343.
  5. Guy L. Steele, Jr., "The Telnet Song", Communications of the ACM, April 1984.
  6. Text of the TELNET Song (retrieved January 5, 2012).
  7. Telnet song in MIDI format.
  8. Chavey, Darrah (1996). "Songs and the analysis of algorithms". Proceedings of the twenty-seventh SIGCSE technical symposium on Computer science education. pp. 4–8. doi:10.1145/236452.236475. ISBN 089791757X. S2CID 148247. Retrieved 7 January 2013.
  9. Alan Sherman, "On Superpolylogarithmic Subexponential Functions" (PostScript), ACM SIGACT News, vol. 22, no. 1, 1991, p. 65.