Memory-bound function


Memory bound refers to a situation in which the time to complete a given computational problem is determined primarily by the amount of free memory required to hold the working data. This is in contrast to compute-bound algorithms, where the number of elementary computation steps is the deciding factor.


Memory and computation boundaries can sometimes be traded against each other, e.g. by saving and reusing preliminary results or using lookup tables.
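For illustration, the following Python sketch trades memory against computation by precomputing a lookup table of per-byte bit counts; the function names and the table are illustrative only and are not drawn from any particular source.

def popcount_compute(b):
    # Recompute the bit count on every call (compute-heavy, no extra memory).
    count = 0
    while b:
        count += b & 1
        b >>= 1
    return count

# Precompute once; afterwards every query is a single memory access (memory-heavy).
POPCOUNT_TABLE = [popcount_compute(b) for b in range(256)]

def popcount_lookup(b):
    # Trade memory (the 256-entry table) for computation (one indexed read).
    return POPCOUNT_TABLE[b]

assert popcount_compute(0b1011) == popcount_lookup(0b1011) == 3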

Memory-bound functions and memory functions

Memory-bound functions and memory functions are related in that both involve extensive memory access, but a distinction exists between the two.

Memory functions use a dynamic programming technique called memoization in order to relieve the inefficiency that naive recursion can incur. The idea is simple: calculate and store solutions to subproblems so that they can be reused later without being recalculated. The best-known example that takes advantage of memoization is an algorithm that computes the Fibonacci numbers. The following pseudocode uses recursion and memoization, and runs in linear CPU time:

Fibonacci(n)
{
    for i = 0 to n
        results[i] = -1                 // -1 means undefined
    return Fibonacci_Results(results, n)
}

Fibonacci_Results(results, n)
{
    if (results[n] != -1)               // If it has been solved before,
        return results[n]               // look it up.
    if (n == 0)
        val = 0
    else if (n == 1)
        val = 1
    else
        val = Fibonacci_Results(results, n-2) + Fibonacci_Results(results, n-1)
    results[n] = val                    // Save this result for re-use.
    return val
}

Compare the above to an algorithm that uses only recursion, and runs in exponential CPU time:

Recursive_Fibonacci(n)
{
    if (n == 0)
        return 0
    if (n == 1)
        return 1
    return Recursive_Fibonacci(n-1) + Recursive_Fibonacci(n-2)
}

While the recursive-only algorithm is simpler and more elegant than the algorithm that uses recursion and memoization, the latter has a significantly lower time complexity: linear rather than exponential in n.
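A runnable Python rendering of the two algorithms, instrumented to count recursive calls, makes the gap concrete; the function names and the call counters below are additions for this illustration.

from functools import lru_cache

calls_naive = 0

def recursive_fibonacci(n):
    # Plain recursion: the same subproblems are recomputed over and over.
    global calls_naive
    calls_naive += 1
    if n < 2:
        return n
    return recursive_fibonacci(n - 1) + recursive_fibonacci(n - 2)

calls_memo = 0

@lru_cache(maxsize=None)            # memoization: cache each subproblem's result
def memoized_fibonacci(n):
    # Recursion plus memoization: each subproblem is solved exactly once.
    global calls_memo
    calls_memo += 1
    if n < 2:
        return n
    return memoized_fibonacci(n - 1) + memoized_fibonacci(n - 2)

assert recursive_fibonacci(25) == memoized_fibonacci(25) == 75025
print(calls_naive, calls_memo)      # 242785 calls versus 26 calls for n = 25

For n = 25 the purely recursive version already makes hundreds of thousands of calls, while the memoized version executes its body once per distinct subproblem.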

The term "memory-bound function" has surfaced only recently and is used principally to describe a function that uses XOR and consists of a series of computations in which each computation depends on the previous computation. Whereas memory functions have long been an important actor in improving time complexity, memory-bound functions have seen far fewer applications. Recently[ when? ], however, scientists have proposed a method using memory-bound functions as a means to discourage spammers from abusing resources, which could be a major breakthrough in that area.

Using memory-bound functions to prevent spam

Memory-bound functions might be useful in a proof-of-work system that could deter spam, which has become a problem of epidemic proportions on the Internet.

IBM research scientists Cynthia Dwork and Moni Naor published a paper at CRYPTO 1992 titled "Pricing via Processing or Combatting Junk Mail",[1] suggesting the possibility of using CPU-bound functions to deter abusers from sending spam. The scheme was based on the idea that computer users are much more likely to abuse a resource when the cost of abusing it is negligible: the underlying reason spam has become so rampant is that sending an e-mail costs spammers next to nothing.

Dwork and Naor proposed that spamming might be reduced by injecting an additional cost in the form of an expensive CPU computation: CPU-bound functions would consume CPU resources at the sender's machine for each message, thus preventing huge amounts of spam from being sent in a short period.

The basic scheme that protects against abuses is as follows:
Let S be the sender, R the recipient, and M an e-mail message. If R has agreed beforehand to receive e-mail from S, then M is transmitted in the usual way. Otherwise, S computes some function G(M) and sends (M, G(M)) to R. R checks whether what it receives is of the form (M, G(M)): if so, R accepts M; otherwise, R rejects M.

The function G() is chosen so that verification by R is fast (taking on the order of a millisecond) while computation by S is slow (taking at least several seconds). S is therefore discouraged from sending M to many recipients without prior agreements: the time and computing resources needed to compute G() repeatedly become prohibitive for a spammer who intends to send many millions of e-mails.
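The shape of the scheme can be sketched in Python using a hashcash-style partial hash preimage as a stand-in for G; Dwork and Naor's actual pricing functions are different, so the code below only illustrates the intended asymmetry between slow computation and fast verification.

import hashlib
import itertools

DIFFICULTY = 20  # leading zero bits required; tuned so the search takes seconds

def compute_stamp(message):
    # Sender S: search for a nonce, which is the expensive step.
    for nonce in itertools.count():
        digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0:
            return nonce

def verify_stamp(message, nonce):
    # Recipient R: a single hash suffices, so verification is cheap.
    digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0

message = b"example e-mail body"
nonce = compute_stamp(message)       # slow for S
assert verify_stamp(message, nonce)  # fast for R

Each extra bit of DIFFICULTY roughly doubles the sender's expected work while leaving the recipient's check at a single hash.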

The major problem with the above scheme is that fast CPUs compute much faster than slow ones. Further, higher-end computer systems have sophisticated pipelines and other features that speed up computation. As a result, a spammer with a state-of-the-art system will hardly be affected by such deterrence, while a typical user with a mediocre system will be affected adversely. If a computation takes a few seconds on a new PC, it may take a minute on an old PC and several minutes on a PDA; this might be a nuisance for users of old PCs, but it is probably unacceptable for users of PDAs. The disparity in client CPU speed is one of the prominent roadblocks to widespread adoption of any scheme based on a CPU-bound function. Researchers are therefore concerned with finding functions that most computer systems evaluate at about the same speed, so that high-end systems evaluate them only somewhat faster than low-end systems (2 to 10 times faster, rather than the 10 to 100 times that CPU disparities might imply). Such ratios are "egalitarian" enough for the intended applications: the functions are effective in discouraging abuse and do not add a prohibitive delay to legitimate interactions across a wide range of systems.

The new egalitarian approach is to rely on memory-bound functions. As stated above, a memory-bound function is a function whose computation time is dominated by the time spent accessing memory. A memory-bound function accesses locations in a large region of memory in an unpredictable way, so that caches are ineffective. In recent years, CPU speed has grown dramatically, while progress in the speed of main memory has been comparatively small. Since the ratio of memory latencies between machines built in the last five years is typically no greater than two, and almost always less than four, memory-bound functions will remain roughly egalitarian across most systems for the foreseeable future.
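As a rough sketch of such an access pattern, the Python code below chains XORs of entries drawn from a table chosen to be larger than typical CPU caches, with each index depending on the previous result; the constants and the mixing step are illustrative only, and the memory-bound functions proposed in the literature are designed far more carefully.

from array import array
import random

TABLE_SIZE = 1 << 23                 # about 8 million entries, tens of megabytes
rng = random.Random(42)
table = array("L", (rng.getrandbits(32) for _ in range(TABLE_SIZE)))

def memory_bound_walk(seed, steps):
    # Each step reads a table entry whose index depends on the previous result,
    # so the accesses are unpredictable and caching or prefetching helps little.
    state = seed & 0xFFFFFFFF
    for _ in range(steps):
        index = state % TABLE_SIZE
        state = ((state ^ table[index]) * 2654435761) & 0xFFFFFFFF
    return state

result = memory_bound_walk(seed=123456789, steps=1_000_000)

In a compiled implementation the running time of such a walk is dominated by memory latency rather than CPU speed, which is what makes memory-bound functions roughly egalitarian across machines.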


Related Research Articles

In computability theory, the Church–Turing thesis is a thesis about the nature of computable functions. It states that a function on the natural numbers can be calculated by an effective method if and only if it is computable by a Turing machine. The thesis is named after the American mathematician Alonzo Church and the British mathematician Alan Turing. Before the precise definition of computable function, mathematicians often used the informal term effectively calculable to describe functions that are computable by paper-and-pencil methods. In the 1930s, several independent attempts were made to formalize the notion of computability.

In programming language theory, lazy evaluation, or call-by-need, is an evaluation strategy which delays the evaluation of an expression until its value is needed and which also avoids repeated evaluations.

<span class="mw-page-title-main">Recursion</span> Process of repeating items in a self-similar way

Recursion occurs when the definition of a concept or process depends on a simpler version of itself. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics and computer science, where a function being defined is applied within its own definition. While this apparently defines an infinite number of instances, it is often done in such a way that no infinite loop or infinite chain of references can occur.

<span class="mw-page-title-main">Theory of computation</span> Academic subfield of computer science

In theoretical computer science and mathematics, the theory of computation is the branch that deals with which problems can be solved on a model of computation using an algorithm, and how efficiently or to what degree they can be solved. The field is divided into three major branches: automata theory and formal languages, computability theory, and computational complexity theory, which are linked by the question: "What are the fundamental capabilities and limitations of computers?".

In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm. An algorithm must be analyzed to determine its resource usage, and the efficiency of an algorithm can be measured based on the usage of different resources. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process.

Computability theory, also known as recursion theory, is a branch of mathematical logic, computer science, and the theory of computation that originated in the 1930s with the study of computable functions and Turing degrees. The field has since expanded to include the study of generalized computability and definability. In these areas, computability theory overlaps with proof theory and effective descriptive set theory.

In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.

In computer science, program optimization, code optimization, or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be optimized so that it executes more rapidly, operates with less memory or other resources, or draws less power.

Hashcash is a proof-of-work system used to limit e-mail spam and denial-of-service attacks. Hashcash was proposed in 1997 by Adam Back and described more formally in Back's 2002 paper "Hashcash - A Denial of Service Counter-Measure".

The Penny Black Project is a Microsoft Research project that tries to find effective and practical ways of fighting spam. Because identifying spam consumes a recipient's time, the idea is to make the sender of e-mails "pay" a certain amount for sending them. The currency or the mode of payment could be CPU cycles, Turing tests, or memory cycles. Such a payment would limit spammers' ability to send out large quantities of e-mails quickly.

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again. Memoization has also been used in other contexts, such as in simple mutually recursive descent parsing. It is a type of caching, distinct from other forms of caching such as buffering and page replacement. In the context of some logic programming languages, memoization is also known as tabling.

Computable functions are the basic objects of study in computability theory. Computable functions are the formalized analogue of the intuitive notion of algorithms, in the sense that a function is computable if there exists an algorithm that can do the job of the function, i.e. given an input of the function domain it can return the corresponding output. Computable functions are used to discuss computability without referring to any concrete model of computation such as Turing machines or register machines. Any definition, however, must make reference to some specific model of computation but all valid definitions yield the same class of functions. Particular models of computability that give rise to the set of computable functions are the Turing-computable functions and the general recursive functions.

In computer science, a problem is said to have overlapping subproblems if the problem can be broken down into subproblems which are reused several times, or if a recursive algorithm for the problem solves the same subproblem over and over rather than always generating new subproblems.

In computer science, corecursion is a type of operation that is dual to recursion. Whereas recursion works analytically, starting on data further from a base case and breaking it down into smaller data and repeating until one reaches a base case, corecursion works synthetically, starting from a base case and building it up, iteratively producing data further removed from a base case. Put simply, corecursive algorithms use the data that they themselves produce, bit by bit, as they become available, and needed, to produce further bits of data. A similar but distinct concept is generative recursion which may lack a definite "direction" inherent in corecursion and recursion.

Proof of work (PoW) is a form of cryptographic proof in which one party proves to others that a certain amount of a specific computational effort has been expended. Verifiers can subsequently confirm this expenditure with minimal effort on their part. The concept was invented by Moni Naor and Cynthia Dwork in 1993 as a way to deter denial-of-service attacks and other service abuses such as spam on a network by requiring some work from a service requester, usually meaning processing time by a computer. The term "proof of work" was first coined and formalized in a 1999 paper by Markus Jakobsson and Ari Juels.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

Algorithm characterizations are attempts to formalize the word algorithm. Algorithm does not have a generally accepted formal definition. Researchers are actively working on this problem. This article will present some of the "characterizations" of the notion of "algorithm" in more detail.

In computability theory, super-recursive algorithms are a generalization of ordinary algorithms that are more powerful, that is, compute more than Turing machines. The term was introduced by Mark Burgin, whose book "Super-recursive algorithms" develops their theory and presents several mathematical models. Turing machines and other mathematical models of conventional algorithms allow researchers to find properties of recursive algorithms and their computations. In a similar way, mathematical models of super-recursive algorithms, such as inductive Turing machines, allow researchers to find properties of super-recursive algorithms and their computations.

<span class="mw-page-title-main">Cynthia Dwork</span> American computer scientist

Cynthia Dwork is an American computer scientist best known for her contributions to cryptography, distributed computing, and algorithmic fairness. She is one of the inventors of differential privacy and proof-of-work.

References

  1. Dwork, Cynthia; Naor, Moni (1992). "Pricing via Processing or Combatting Junk Mail". Advances in Cryptology – CRYPTO '92. Lecture Notes in Computer Science. Vol. 740. pp. 139–147. doi:10.1007/3-540-48071-4_10. ISBN 978-3-540-57340-1.