History
In the late 1920s, the mathematicians Gabriel Sudan and Wilhelm Ackermann, students of David Hilbert, were studying the foundations of computation. Both Sudan and Ackermann are credited with discovering total computable functions (termed simply "recursive" in some references) that are not primitive recursive. Sudan published the lesser-known Sudan function; shortly afterwards and independently, in 1928, Ackermann published his function $\varphi$ (the Greek letter phi). Ackermann's three-argument function, $\varphi(m, n, p)$, is defined such that for $p = 0, 1, 2$ it reproduces the basic operations of addition, multiplication, and exponentiation as

$$\varphi(m, n, 0) = m + n, \qquad \varphi(m, n, 1) = m \times n, \qquad \varphi(m, n, 2) = m^n,$$

and for $p > 2$ it extends these basic operations in a way that can be compared to the hyperoperations:

$$\varphi(m, n, 3) = m[4](n+1), \qquad \varphi(m, n, p) \gtrapprox m[p+1](n+1) \quad \text{for } p > 3.$$
(Aside from its historic role as a total-computable-but-not-primitive-recursive function, Ackermann's original function is seen to extend the basic arithmetic operations beyond exponentiation, although not as seamlessly as do variants of Ackermann's function that are specifically designed for that purpose—such as Goodstein's hyperoperation sequence.)
In On the Infinite, David Hilbert hypothesized that the Ackermann function was not primitive recursive, but it was Ackermann, Hilbert's personal secretary and former student, who actually proved the hypothesis in his paper On Hilbert's Construction of the Real Numbers.
Rózsa Péter and Raphael Robinson later developed a two-variable version of the Ackermann function that became preferred by almost all authors.
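The generalized hyperoperation sequence, e.g. $G(m, a, b) = a[m]b$, is a version of the Ackermann function as well.
In 1963 R.C. Buck based an intuitive two-variable [n 1] variant $F$ on the hyperoperation sequence:

$$F(m, n) = 2[m]n.$$

Compared to most other versions, Buck's function has no unessential offsets:

$$\begin{aligned}
F(0, n) &= 2[0]n = n + 1 \\
F(1, n) &= 2[1]n = 2 + n \\
F(2, n) &= 2[2]n = 2 \times n \\
F(3, n) &= 2[3]n = 2^n \\
F(4, n) &= 2[4]n = 2^{2^{\cdot^{\cdot^{2}}}} \quad (n \text{ twos})
\end{aligned}$$

Many other versions of the Ackermann function have been investigated.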
Computation
The recursive definition of the Ackermann function can naturally be transposed to a term rewriting system (TRS).
TRS, based on 2-ary function
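The definition of the 2-ary Ackermann function leads to the obvious reduction rules

(r1) $A(0, n) \rightarrow S(n)$
(r2) $A(S(m), 0) \rightarrow A(m, S(0))$
(r3) $A(S(m), S(n)) \rightarrow A(m, A(S(m), n))$
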
Example
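Compute $A(1, 2) \rightarrow_{*} 4$
The reduction sequence is [n 3]
Leftmost-outermost (one-step) strategy: | Leftmost-innermost (one-step) strategy: |
$A(S(0), S(S(0)))$ | $A(S(0), S(S(0)))$ |
$\rightarrow_{r3} A(0, A(S(0), S(0)))$ | $\rightarrow_{r3} A(0, A(S(0), S(0)))$ |
$\rightarrow_{r1} S(A(S(0), S(0)))$ | $\rightarrow_{r3} A(0, A(0, A(S(0), 0)))$ |
$\rightarrow_{r3} S(A(0, A(S(0), 0)))$ | $\rightarrow_{r2} A(0, A(0, A(0, S(0))))$ |
$\rightarrow_{r1} S(S(A(S(0), 0)))$ | $\rightarrow_{r1} A(0, A(0, S(S(0))))$ |
$\rightarrow_{r2} S(S(A(0, S(0))))$ | $\rightarrow_{r1} A(0, S(S(S(0))))$ |
$\rightarrow_{r1} S(S(S(S(0))))$ | $\rightarrow_{r1} S(S(S(S(0))))$ |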
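The two one-step strategies can be tried out with a small term rewriter. The following Python sketch assumes the standard rules r1: A(0, n) → S(n), r2: A(S(m), 0) → A(m, S(0)) and r3: A(S(m), S(n)) → A(m, A(S(m), n)). The tuple encoding of terms and the helper names (`num`, `value`, `rewrite_root`, `step`, `normalize`) are illustrative choices, not taken from the cited literature.

```python
# Terms are nested tuples: ("0",), ("S", t) or ("A", t1, t2).
ZERO = ("0",)

def S(t):
    return ("S", t)

def A(t1, t2):
    return ("A", t1, t2)

def num(k):
    """Build the numeral S(S(...S(0)...)) with k successors."""
    t = ZERO
    for _ in range(k):
        t = S(t)
    return t

def value(t):
    """Read a numeral back as an int (only valid for normal forms)."""
    k = 0
    while t != ZERO:
        assert t[0] == "S"
        t = t[1]
        k += 1
    return k

def rewrite_root(t):
    """Apply r1, r2 or r3 at the root, or return None if no rule matches."""
    if t[0] != "A":
        return None
    m, n = t[1], t[2]
    if m == ZERO:                      # r1: A(0, n) -> S(n)
        return S(n)
    if m[0] == "S" and n == ZERO:      # r2: A(S(m), 0) -> A(m, S(0))
        return A(m[1], S(ZERO))
    if m[0] == "S" and n[0] == "S":    # r3: A(S(m), S(n)) -> A(m, A(S(m), n))
        return A(m[1], A(m, n[1]))
    return None

def step(t, innermost):
    """One rewrite step, leftmost-innermost or leftmost-outermost."""
    if not innermost:
        r = rewrite_root(t)
        if r is not None:
            return r
    for i in range(1, len(t)):         # visit subterms from left to right
        r = step(t[i], innermost)
        if r is not None:
            return t[:i] + (r,) + t[i + 1:]
    return rewrite_root(t) if innermost else None

def normalize(t, innermost):
    """Rewrite to normal form; return the normal form and the step count."""
    steps = 0
    while True:
        r = step(t, innermost)
        if r is None:
            return t, steps
        t, steps = r, steps + 1

for innermost in (False, True):
    t, steps = normalize(A(num(1), num(2)), innermost)
    print(value(t), steps)             # prints "4 6" for both strategies
```

Both strategies reach the same normal form in the same number of steps here; they differ only in the order in which the redexes are contracted.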
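To compute $A(m, n)$ one can use a stack, which initially contains the elements $\langle m, n \rangle$.
Then repeatedly the two top elements are replaced according to the rules [n 4]

(r1) $0, n \rightarrow (n+1)$
(r2) $(m+1), 0 \rightarrow m, 1$
(r3) $(m+1), (n+1) \rightarrow m, (m+1), n$

Schematically, starting from $\langle m, n \rangle$: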
WHILE stackLength <> 1 { POP 2 elements; PUSH 1 or 2 or 3 elements, applying the rules r1, r2, r3 }
The pseudocode is published in Grossman & Zeitman (1988).
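For example, on input $\langle 2, 1 \rangle$,
the stack configurations | reflect the reduction [n 5] |
$2, 1$ | $A(2, 1)$ |
$1, 2, 0$ | $\rightarrow_{r3} A(1, A(2, 0))$ |
$1, 1, 1$ | $\rightarrow_{r2} A(1, A(1, 1))$ |
$1, 0, 1, 0$ | $\rightarrow_{r3} A(1, A(0, A(1, 0)))$ |
$1, 0, 0, 1$ | $\rightarrow_{r2} A(1, A(0, A(0, 1)))$ |
$1, 0, 2$ | $\rightarrow_{r1} A(1, A(0, 2))$ |
$1, 3$ | $\rightarrow_{r1} A(1, 3)$ |
$0, 1, 2$ | $\rightarrow_{r3} A(0, A(1, 2))$ |
$0, 0, 1, 1$ | $\rightarrow_{r3} A(0, A(0, A(1, 1)))$ |
$0, 0, 0, 1, 0$ | $\rightarrow_{r3} A(0, A(0, A(0, A(1, 0))))$ |
$0, 0, 0, 0, 1$ | $\rightarrow_{r2} A(0, A(0, A(0, A(0, 1))))$ |
$0, 0, 0, 2$ | $\rightarrow_{r1} A(0, A(0, A(0, 2)))$ |
$0, 0, 3$ | $\rightarrow_{r1} A(0, A(0, 3))$ |
$0, 4$ | $\rightarrow_{r1} A(0, 4)$ |
$5$ | $\rightarrow_{r1} 5$ |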
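The stack procedure described above can be sketched in Python as follows. This is a minimal illustration, not the Grossman & Zeitman algorithm; it assumes the usual stack forms of the rules (r1: 0, n → n+1; r2: m+1, 0 → m, 1; r3: m+1, n+1 → m, m+1, n), and the function name `ackermann_stack` and its extra return values (step count, maximum stack length) are hypothetical additions for inspection.

```python
def ackermann_stack(m, n):
    """Return (A(m, n), number of rule applications, maximum stack length)."""
    stack = [m, n]
    steps = 0
    max_len = len(stack)
    while len(stack) != 1:
        b = stack.pop()                 # top element (second argument)
        a = stack.pop()                 # element below it (first argument)
        if a == 0:                      # r1: 0, n -> n+1
            stack.append(b + 1)
        elif b == 0:                    # r2: m+1, 0 -> m, 1
            stack.extend([a - 1, 1])
        else:                           # r3: m+1, n+1 -> m, m+1, n
            stack.extend([a - 1, a, b - 1])
        steps += 1
        max_len = max(max_len, len(stack))
    return stack[0], steps, max_len

print(ackermann_stack(2, 1))            # (5, 14, 5)
```

Because the recursion is replaced by an explicit list, the sketch is limited by memory and time rather than by the interpreter's recursion limit.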
Remarks
- The leftmost-innermost strategy is implemented in 225 computer languages on Rosetta Code.
- For all $m, n$ the computation of $A(m, n)$ takes no more than
steps.
- Grossman & Zeitman (1988) pointed out that in the computation of $A(m, n)$ the maximum length of the stack is $A(m, n)$, as long as $m > 0$. Their own algorithm, inherently iterative, computes $A(m, n)$ within $\mathcal{O}(m A(m, n))$ time and within $\mathcal{O}(m)$ space.
TRS, based on iterated 1-ary function
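The definition of the iterated 1-ary Ackermann functions leads to different reduction rules, in which $A(x, m, n)$ stands for the $x$-fold iterate $A_m^x(n)$:

(r4) $A(S(0), 0, n) \rightarrow S(n)$
(r5) $A(S(0), S(m), n) \rightarrow A(S(n), m, S(0))$
(r6) $A(S(S(x)), m, n) \rightarrow A(S(0), m, A(S(x), m, n))$

As function composition is associative, instead of rule r6 one can define

(r7) $A(S(S(x)), m, n) \rightarrow A(S(x), m, A(S(0), m, n))$
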
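As in the previous section, the computation of $A_m^1(n)$ can be implemented with a stack.
Initially the stack contains the three elements $\langle 1, m, n \rangle$.
Then repeatedly the three top elements are replaced according to the rules [n 4]

(r4) $1, 0, n \rightarrow (n+1)$
(r5) $1, (m+1), n \rightarrow (n+1), m, 1$
(r6) $(x+2), m, n \rightarrow 1, m, (x+1), m, n$

Schematically, starting from $\langle 1, m, n \rangle$: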
WHILE stackLength <> 1 { POP 3 elements; PUSH 1 or 3 or 5 elements, applying the rules r4, r5, r6; }
Example
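On input $\langle 1, 2, 1 \rangle$ the successive stack configurations are

$$\begin{aligned}
& 1,2,1 \rightarrow_{r5} 2,1,1 \rightarrow_{r6} 1,1,1,1,1 \rightarrow_{r5} 1,1,2,0,1 \rightarrow_{r6} 1,1,1,0,1,0,1 \\
& \rightarrow_{r4} 1,1,1,0,2 \rightarrow_{r4} 1,1,3 \rightarrow_{r5} 4,0,1 \rightarrow_{r6} 1,0,3,0,1 \rightarrow_{r6} 1,0,1,0,2,0,1 \\
& \rightarrow_{r6} 1,0,1,0,1,0,1,0,1 \rightarrow_{r4} 1,0,1,0,1,0,2 \rightarrow_{r4} 1,0,1,0,3 \rightarrow_{r4} 1,0,4 \rightarrow_{r4} 5
\end{aligned}$$

The corresponding equalities are

$$\begin{aligned}
& A_2(1) = A_1^2(1) = A_1(A_1(1)) = A_1(A_0^2(1)) = A_1(A_0(A_0(1))) \\
& = A_1(A_0(2)) = A_1(3) = A_0^4(1) = A_0(A_0^3(1)) = A_0(A_0(A_0^2(1))) \\
& = A_0(A_0(A_0(A_0(1)))) = A_0(A_0(A_0(2))) = A_0(A_0(3)) = A_0(4) = 5
\end{aligned}$$

When reduction rule r7 is used instead of rule r6, the replacements in the stack will follow

(r7) $(x+2), m, n \rightarrow (x+1), m, 1, m, n$

The successive stack configurations will then be

$$\begin{aligned}
& 1,2,1 \rightarrow_{r5} 2,1,1 \rightarrow_{r7} 1,1,1,1,1 \rightarrow_{r5} 1,1,2,0,1 \rightarrow_{r7} 1,1,1,0,1,0,1 \\
& \rightarrow_{r4} 1,1,1,0,2 \rightarrow_{r4} 1,1,3 \rightarrow_{r5} 4,0,1 \rightarrow_{r7} 3,0,1,0,1 \rightarrow_{r4} 3,0,2 \\
& \rightarrow_{r7} 2,0,1,0,2 \rightarrow_{r4} 2,0,3 \rightarrow_{r7} 1,0,1,0,3 \rightarrow_{r4} 1,0,4 \rightarrow_{r4} 5
\end{aligned}$$

The corresponding equalities are

$$\begin{aligned}
& A_2(1) = A_1^2(1) = A_1(A_1(1)) = A_1(A_0^2(1)) = A_1(A_0(A_0(1))) \\
& = A_1(A_0(2)) = A_1(3) = A_0^4(1) = A_0^3(A_0(1)) = A_0^3(2) \\
& = A_0^2(A_0(2)) = A_0^2(3) = A_0(A_0(3)) = A_0(4) = 5
\end{aligned}$$
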
Remarks
- On any given input the TRSs presented so far converge in the same number of steps. They also use the same reduction rules (in this comparison the rules r1, r2, r3 are considered "the same as" the rules r4, r5, r6/r7 respectively). For example, the reduction of $A(2, 1)$ converges in 14 steps: 6 × r1, 3 × r2, 5 × r3. The reduction of $A_2(1)$ converges in the same 14 steps: 6 × r4, 3 × r5, 5 × r6/r7. The TRSs differ in the order in which the reduction rules are applied.
- When $A_i(n)$ is computed following the rules {r4, r5, r6}, the maximum length of the stack stays below $2 \times A(i, n)$ (for $i > 0$). When reduction rule r7 is used instead of rule r6, the maximum length of the stack is at most $2i + 3$. The length of the stack reflects the recursion depth. As the reduction according to the rules {r4, r5, r7} involves a smaller maximum depth of recursion, [n 6] this computation is more efficient in that respect.
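The effect of the two iteration rules on the stack can be observed with a short Python sketch. It assumes the stack forms r4: 1, 0, n → n+1; r5: 1, m+1, n → n+1, m, 1; r6: x+2, m, n → 1, m, x+1, m, n; and r7: x+2, m, n → x+1, m, 1, m, n. The function name `ackermann_iter_stack` and the flag `use_r7` are hypothetical names introduced for illustration.

```python
def ackermann_iter_stack(m, n, use_r7=False):
    """Return (A_m(n), number of rule applications, maximum stack length)."""
    stack = [1, m, n]                   # <1, m, n> encodes the single iterate A_m^1(n)
    steps = 0
    max_len = len(stack)
    while len(stack) != 1:
        c = stack.pop()                 # argument
        b = stack.pop()                 # function index
        x = stack.pop()                 # iteration count, always >= 1
        if x == 1 and b == 0:           # r4
            stack.append(c + 1)
        elif x == 1:                    # r5
            stack.extend([c + 1, b - 1, 1])
        elif use_r7:                    # r7: peel off the innermost application
            stack.extend([x - 1, b, 1, b, c])
        else:                           # r6: peel off the outermost application
            stack.extend([1, b, x - 1, b, c])
        steps += 1
        max_len = max(max_len, len(stack))
    return stack[0], steps, max_len

print(ackermann_iter_stack(2, 1))               # (5, 14, 9)
print(ackermann_iter_stack(2, 1, use_r7=True))  # (5, 14, 7)
```

Both variants take 14 steps on this input; only the peak stack length differs, which matches the remark that the rules affect recursion depth rather than the number of reductions.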
TRS, based on hyperoperators
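As Sundblad (1971), and also Porto & Matos (1980), showed explicitly, the Ackermann function can be expressed in terms of the hyperoperation sequence:

$$A(m, n) = \begin{cases} n + 1 & m = 0 \\ 2[m](n+3) - 3 & m > 0 \end{cases}$$

or, after removal of the constant 2 from the parameter list, in terms of Buck's function

$$A(m, n) = \begin{cases} n + 1 & m = 0 \\ F(m, n+3) - 3 & m > 0 \end{cases}$$

Buck's function $F(m, n) = 2[m]n$, a variant of the Ackermann function in its own right, can be computed with the following reduction rules, in which $F(x, m, n)$ stands for the $x$-fold iterate $F_m^x(n)$:

(b1) $F(S(0), 0, n) \rightarrow S(n)$
(b2) $F(S(0), S(0), 0) \rightarrow S(S(0))$
(b3) $F(S(0), S(S(0)), 0) \rightarrow 0$
(b4) $F(S(0), S(S(S(m))), 0) \rightarrow S(0)$
(b5) $F(S(0), S(m), S(n)) \rightarrow F(S(n), m, F(S(0), S(m), 0))$
(b6) $F(S(S(x)), m, n) \rightarrow F(S(0), m, F(S(x), m, n))$

Instead of rule b6 one can define the rule

(b7) $F(S(S(x)), m, n) \rightarrow F(S(x), m, F(S(0), m, n))$

To compute the Ackermann function it suffices to add three reduction rules

(r8) $A(0, n) \rightarrow S(n)$
(r9) $A(S(m), n) \rightarrow P(F(S(0), S(m), S(S(S(n)))))$
(r10) $P(S(S(S(m)))) \rightarrow m$

These rules take care of the base case $A(0, n)$, the alignment $(n+3)$ and the fudge $(-3)$.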
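The relation between the Ackermann function and Buck's function can be checked numerically for small arguments. The Python sketch below assumes Buck's function is F(m, n) = 2[m]n with the base values F(0, n) = n + 1, F(1, 0) = 2, F(2, 0) = 0 and F(m, 0) = 1 for m ≥ 3, and that A(m, n) = F(m, n + 3) − 3 for m > 0. The names `buck` and `ackermann` are illustrative.

```python
def buck(m, n):
    """Buck's function F(m, n) = 2[m]n (hyperoperation with base 2)."""
    if m == 0:
        return n + 1                    # 2[0]n = n + 1
    if n == 0:
        if m == 1:
            return 2                    # 2 + 0
        if m == 2:
            return 0                    # 2 * 0
        return 1                        # 2[m]0 = 1 for m >= 3
    return buck(m - 1, buck(m, n - 1))  # 2[m]n = 2[m-1](2[m](n-1))

def ackermann(m, n):
    """Two-argument Ackermann function, straight from the recursive definition."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

# Alignment (n + 3) and fudge (-3), checked on a small grid of arguments.
for m in range(1, 4):
    for n in range(4):
        assert ackermann(m, n) == buck(m, n + 3) - 3

print(buck(2, 4) - 3)                   # 5, i.e. A(2, 1)
```

Only small arguments are used because both functions recurse deeply; larger inputs would exceed Python's default recursion limit long before the values become interesting.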
Example
Compute 
using reduction rule : [n 5] | using reduction rule : [n 5] |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
|  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
 |  |
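The matching equalities are
- when the TRS with the reduction rule b6 is applied:

$$\begin{aligned}
& A(2,1) + 3 = F_2(4) = \dots = F_0^6(2) = F_0(F_0^5(2)) = F_0(F_0(F_0^4(2))) \\
& = F_0(F_0(F_0(F_0^3(2)))) = F_0(F_0(F_0(F_0(F_0^2(2))))) = F_0(F_0(F_0(F_0(F_0(F_0(2)))))) \\
& = F_0(F_0(F_0(F_0(F_0(3))))) = F_0(F_0(F_0(F_0(4)))) = F_0(F_0(F_0(5))) = F_0(F_0(6)) = F_0(7) = 8
\end{aligned}$$

- when the TRS with the reduction rule b7 is applied:

$$\begin{aligned}
& A(2,1) + 3 = F_2(4) = \dots = F_0^6(2) = F_0^5(F_0(2)) = F_0^5(3) = F_0^4(F_0(3)) = F_0^4(4) \\
& = F_0^3(F_0(4)) = F_0^3(5) = F_0^2(F_0(5)) = F_0^2(6) = F_0(F_0(6)) = F_0(7) = 8
\end{aligned}$$
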
Remarks
- The computation of $A(i, n)$ according to the rules {b1 - b5, b6, r8 - r10} is deeply recursive. The maximum depth of nested $F$s is $A(i, n) + 1$. The culprit is the order in which iteration is executed: $F^{n+1}(x) = F(F^n(x))$. The first $F$ disappears only after the whole sequence is unfolded.
- The computation according to the rules {b1 - b5, b7, r8 - r10} is more efficient in that respect. The iteration $F^{n+1}(x) = F^n(F(x))$ simulates the repeated loop over a block of code. [n 7] The nesting is limited to $(i+1)$, one recursion level per iterated function. Meyer & Ritchie (1967) showed this correspondence.
- These considerations concern the recursion depth only. Either way of iterating leads to the same number of reduction steps, involving the same rules (when the rules b6 and b7 are considered "the same"). The reduction of $A(2, 1)$ for instance converges in 35 steps: 12 × b1, 4 × b2, 1 × b3, 4 × b5, 12 × b6/b7, 1 × r9, 1 × r10. The modus iterandi only affects the order in which the reduction rules are applied.
- A real gain of execution time can only be achieved by not recalculating subresults over and over again. Memoization is an optimization technique where the results of function calls are cached and returned when the same inputs occur again. See for instance Ward (1993). Grossman & Zeitman (1988) published a cunning algorithm which computes $A(i, n)$ within $\mathcal{O}(i A(i, n))$ time and within $\mathcal{O}(i)$ space.
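The effect of memoization mentioned in the last remark can be illustrated with Python's `functools.lru_cache`. This is only a sketch of the caching idea, not the Grossman & Zeitman algorithm; the names `ackermann_plain`, `ackermann_memo` and the `calls` counter are hypothetical.

```python
from functools import lru_cache

calls = {"plain": 0}                    # counts evaluations of the uncached version

def ackermann_plain(m, n):
    calls["plain"] += 1
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann_plain(m - 1, 1)
    return ackermann_plain(m - 1, ackermann_plain(m, n - 1))

@lru_cache(maxsize=None)
def ackermann_memo(m, n):
    # Same definition; repeated (m, n) pairs are answered from the cache.
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann_memo(m - 1, 1)
    return ackermann_memo(m - 1, ackermann_memo(m, n - 1))

result_plain = ackermann_plain(2, 1)
result_memo = ackermann_memo(2, 1)
print(result_plain, calls["plain"])                     # 5 14
print(result_memo, ackermann_memo.cache_info().misses)  # 5 10
```

On this small input the cache saves only a few evaluations; the saving grows with the arguments, although the recursion depth, and hence Python's recursion limit, is not reduced by caching alone.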
Huge numbers
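To demonstrate how the computation of $A(4, 3)$ results in many steps and in a large number: [n 5]

$$\begin{aligned}
A(4, 3) & \rightarrow A(3, A(4, 2)) \\
& \rightarrow A(3, A(3, A(4, 1))) \\
& \rightarrow A(3, A(3, A(3, A(4, 0)))) \\
& \rightarrow A(3, A(3, A(3, A(3, 1)))) \\
& \rightarrow \dots \rightarrow A(3, A(3, A(3, 13))) \\
& \rightarrow \dots \rightarrow A(3, A(3, 65533)) \\
& \rightarrow \dots \rightarrow A(3, 2^{65536} - 3) \\
& \rightarrow \dots \rightarrow 2^{2^{65536}} - 3
\end{aligned}$$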