Shifting nth root algorithm

Last updated June 16, 2024

The shifting nth root algorithm is an algorithm for extracting the nth root of a positive real number which proceeds iteratively by shifting in n digits of the radicand, starting with the most significant, and produces one digit of the root on each iteration, in a manner similar to long division.

Algorithm

Notation

Let $B$ be the base of the number system you are using, and $n$ be the degree of the root to be extracted. Let $x$ be the radicand processed thus far, $y$ be the root extracted thus far, and $r$ be the remainder. Let $\alpha$ be the next $n$ digits of the radicand, and $\beta$ be the next digit of the root. Let $x'$ be the new value of $x$ for the next iteration, $y'$ be the new value of $y$ for the next iteration, and $r'$ be the new value of $r$ for the next iteration. These are all integers.

Invariants

Two invariants hold at each iteration: $y^{n}+r=x$ with $r\geq 0$, and $(y+1)^{n}>x$. Thus $y$ is the largest integer less than or equal to the $n$th root of $x$, and $r$ is the remainder.
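As a concrete check (an illustrative instance chosen here, not part of the original derivation), take $n=3$ and $x=3000$. Then $y=14$ and $r=256$ satisfy both invariants:

$$14^{3}+256=2744+256=3000,\qquad 15^{3}=3375>3000.$$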

Initialization

The initial values of $x,y$ , and $r$ should be 0. The value of $\alpha$ for the first iteration should be the most significant aligned block of $n$ digits of the radicand. An aligned block of $n$ digits means a block of digits aligned so that the decimal point falls between blocks. For example, in 123.4 the most significant aligned block of two digits is 01, the next most significant is 23, and the third most significant is 40.
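The splitting into aligned blocks can be sketched as follows. This is a minimal illustration for base 10, assuming the radicand is given as a digit string; the function name `aligned_blocks` is hypothetical and not part of the algorithm's description.

```python
def aligned_blocks(digits: str, n: int) -> list[str]:
    """Split a decimal digit string into aligned blocks of n digits.

    The integer part is padded with zeros on the left and the fractional
    part with zeros on the right, so the decimal point falls between blocks.
    """
    integer_part, _, fraction_part = digits.partition(".")
    integer_part = integer_part or "0"
    # Round each length up to a multiple of n (ceiling division via negation).
    integer_part = integer_part.zfill(-(-len(integer_part) // n) * n)
    fraction_part = fraction_part.ljust(-(-len(fraction_part) // n) * n, "0")
    padded = integer_part + fraction_part
    return [padded[i:i + n] for i in range(0, len(padded), n)]


print(aligned_blocks("123.4", 2))  # ['01', '23', '40']
```

Further blocks of zeros can be appended to the result to extract more digits of the root.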

Main loop

On each iteration we shift in $n$ digits of the radicand, so we have $x'=B^{n}x+\alpha$ and we produce one digit of the root, so we have $y'=By+\beta$ . The first invariant implies that $r'=x'-y'^{n}$ . We want to choose $\beta$ so that the invariants described above hold. It turns out that there is always exactly one such choice, as will be proved below.

Proof of existence and uniqueness of $\beta$

By definition of a digit, $0\leq \beta <B$, and by definition of a block of digits, $0\leq \alpha <B^{n}$.

The first invariant says that:

$$x'=y'^{n}+r'$$

or

$$B^{n}x+\alpha =(By+\beta )^{n}+r'.$$

So, pick the largest integer $\beta$ with $0\leq \beta <B$ such that

$$(By+\beta )^{n}\leq B^{n}x+\alpha .$$

Such a $\beta$ always exists: for $\beta =0$ the condition reads $B^{n}y^{n}\leq B^{n}x+\alpha$, which holds because $y^{n}\leq x$ and $\alpha \geq 0$. Thus, there will always be a $\beta$ that satisfies the first invariant.

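Now consider the second invariant. It says:

$$(y'+1)^{n}>x'$$

or

$$(By+\beta +1)^{n}>B^{n}x+\alpha .$$

Now, if $\beta$ is not the largest admissible $\beta$ for the first invariant as described above, then $\beta +1$ is also admissible, and we have

$$(By+\beta +1)^{n}\leq B^{n}x+\alpha .$$

This violates the second invariant, so to satisfy both invariants we must pick the largest $\beta$ allowed by the first invariant. Conversely, the largest admissible $\beta$ does satisfy the second invariant. If $\beta <B-1$, then $\beta +1$ is a digit that is not admissible, which is exactly the second invariant. If $\beta =B-1$, then $(By+B)^{n}=B^{n}(y+1)^{n}\geq B^{n}(x+1)>B^{n}x+\alpha$, because $(y+1)^{n}>x$ and $\alpha <B^{n}$. Thus we have proven the existence and uniqueness of $\beta$.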

To summarize, on each iteration:

Let $\alpha$ be the next aligned block of digits from the radicand
Let $x'=B^{n}x+\alpha$
Let $\beta$ be the largest $\beta$ such that $(By+\beta )^{n}\leq B^{n}x+\alpha$
Let $y'=By+\beta$
Let $r'=x'-y'^{n}$
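The five steps above can be sketched as a single function. This is a minimal illustration that keeps $x$ explicitly and finds $\beta$ by a linear scan; the name `step` and the default `base=10` are assumptions made for the example.

```python
def step(x: int, y: int, alpha: int, n: int, base: int = 10) -> tuple[int, int, int]:
    """One iteration: return (x', y', r') from (x, y) and the next block alpha."""
    x_new = base**n * x + alpha
    # Largest digit beta with (B*y + beta)^n <= x'. beta = 0 always qualifies,
    # so scan upward while the next candidate still passes the test.
    beta = 0
    while beta + 1 < base and (base * y + beta + 1) ** n <= x_new:
        beta += 1
    y_new = base * y + beta
    r_new = x_new - y_new**n
    return x_new, y_new, r_new


# First two iterations of the cube root of 3 (blocks 003, 000).
print(step(0, 0, 3, 3))  # (3, 1, 2)
print(step(3, 1, 0, 3))  # (3000, 14, 256)
```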

Now, note that $x=y^{n}+r$, so the condition

$$(By+\beta )^{n}\leq B^{n}x+\alpha$$

is equivalent to

$$(By+\beta )^{n}-B^{n}y^{n}\leq B^{n}r+\alpha$$

and

$$r'=x'-y'^{n}=B^{n}x+\alpha -(By+\beta )^{n}$$

is equivalent to

$$r'=B^{n}r+\alpha -((By+\beta )^{n}-B^{n}y^{n}).$$

Thus, we do not actually need $x$. Since $r=x-y^{n}$ and $x<(y+1)^{n}$, we have $r<(y+1)^{n}-y^{n}$, that is, $r<ny^{n-1}+O(y^{n-2})$, or $r<nx^{{n-1} \over n}+O(x^{{n-2} \over n})$. So $r$ has only about $(n-1)/n$ as many digits as $x$, and by using $r$ instead of $x$ we save time and space by a factor of 1/$n$. Also, the $B^{n}y^{n}$ we subtract in the new test cancels the one in $(By+\beta )^{n}$, so now the highest power of $y$ we have to evaluate is $y^{n-1}$ rather than $y^{n}$.
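The cancellation can be made explicit with the binomial theorem. The expansion below is a standard identity written in the notation of this section:

$$(By+\beta )^{n}-B^{n}y^{n}=\sum _{i=0}^{n-1}{\binom {n}{i}}(By)^{i}\beta ^{n-i}.$$

For $n=2$ this is $(2By+\beta )\beta$, and for $n=3$ it is $3(By)^{2}\beta +3(By)\beta ^{2}+\beta ^{3}$. With $B=10$ these become $(20y+\beta )\beta$ and $300y^{2}\beta +30y\beta ^{2}+\beta ^{3}$.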

Summary

Initialize $r$ and $y$ to 0.
Repeat until desired precision is obtained:
1. Let $\alpha$ be the next aligned block of digits from the radicand.
2. Let $\beta$ be the largest $\beta$ such that $(By+\beta )^{n}-B^{n}y^{n}\leq B^{n}r+\alpha .$
3. Let $y'=By+\beta$ .
4. Let $r'=B^{n}r+\alpha -((By+\beta )^{n}-B^{n}y^{n}).$
5. Assign $y\leftarrow y'$ and $r\leftarrow r'.$
On termination, $y$ is the largest integer such that $y^{n}\leq xB^{k}$, and $y^{n}+r=xB^{k}$, where $x$ here denotes the radicand truncated to the digits consumed so far and $k$ is the number of digits of the radicand after the decimal point that have been consumed (a negative number if the algorithm has not reached the decimal point yet).
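The summarized procedure can be sketched in Python as follows. This is a minimal illustration rather than an optimized implementation: the name `shifting_nth_root` is hypothetical, the blocks are assumed to be supplied as integers with the most significant first, $\beta$ is found by a linear scan, and the powers are evaluated directly for clarity instead of through the expanded polynomial.

```python
def shifting_nth_root(blocks: list[int], n: int, base: int = 10) -> tuple[int, int]:
    """Return (y, r) after consuming the given aligned blocks of the radicand."""
    y, r = 0, 0
    shift = base**n
    for alpha in blocks:
        target = shift * r + alpha          # B^n * r + alpha
        by = base * y
        by_n = by**n                        # B^n * y^n
        # Largest beta with (B*y + beta)^n - B^n*y^n <= B^n*r + alpha.
        beta = 0
        while beta + 1 < base and (by + beta + 1) ** n - by_n <= target:
            beta += 1
        y = by + beta
        r = target - (y**n - by_n)
    return y, r


# Cube root of 3 to five decimal places: blocks 003 000 000 000 000 000.
print(shifting_nth_root([3, 0, 0, 0, 0, 0], 3))  # (144224, 59720728576)
```

The digits of the returned `y` are the digits of the root. The position of the decimal point follows from how many of the supplied blocks came before the decimal point of the radicand.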

Paper-and-pencil nth roots

As noted above, this algorithm is similar to long division, and it lends itself to the same notation:

     1.  4   4   2   2   4
    ----------------------
_ 3/ 3.000 000 000 000 000
 \/  1                        = 3(10×0)²×1     +3(10×0)×1²     +1³
     -
     2 000
     1 744                    = 3(10×1)²×4     +3(10×1)×4²     +4³
     -----
       256 000
       241 984                = 3(10×14)²×4    +3(10×14)×4²    +4³
       -------
        14 016 000
        12 458 888            = 3(10×144)²×2   +3(10×144)×2²   +2³
        ----------
         1 557 112 000
         1 247 791 448        = 3(10×1442)²×2  +3(10×1442)×2²  +2³
         -------------
           309 320 552 000
           249 599 823 424    = 3(10×14422)²×4 +3(10×14422)×4² +4³
           ---------------
            59 720 728 576

Note that after the first iteration or two the leading term $nB^{n-1}y^{n-1}\beta$ dominates the expression $(By+\beta )^{n}-B^{n}y^{n}$, so we can get an often correct first guess at $\beta$ by dividing $B^{n}r+\alpha$ by $nB^{n-1}y^{n-1}$.
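The guess-and-correct procedure can be sketched as below. It is a minimal illustration under stated assumptions: the name `next_digit` is hypothetical, and the correction loop relies on the fact that every term of the binomial expansion of $(By+\beta )^{n}-B^{n}y^{n}$ is non-negative, so the quotient can overshoot the true digit but never undershoot it.

```python
def next_digit(y: int, r: int, alpha: int, n: int, base: int = 10) -> int:
    """Find the next root digit from a division-based first guess."""
    target = base**n * r + alpha
    by = base * y
    if y == 0:
        # No estimate is available while y is still zero; start from the top digit.
        beta = base - 1
    else:
        # First guess: (B^n*r + alpha) // (n * B^(n-1) * y^(n-1)), clamped to a digit.
        beta = min(base - 1, target // (n * by ** (n - 1)))
    # The guess is never too small, so step down until the exact test passes.
    while (by + beta) ** n - by**n > target:
        beta -= 1
    return beta


# Cube root of 3: the guess is off at first (6 instead of 4), then exact.
print(next_digit(1, 2, 0, 3))        # 4
print(next_digit(14, 256, 0, 3))     # 4
print(next_digit(144, 14016, 0, 3))  # 2
```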

Performance

On each iteration, the most time-consuming task is to select $\beta$. We know that there are $B$ possible values, so we can find $\beta$ using $O(\log(B))$ comparisons. Each comparison will require evaluating $(By+\beta )^{n}-B^{n}y^{n}$. In the $k$th iteration, $y$ has $k$ digits, and the polynomial can be evaluated with $2n-4$ multiplications of up to $k(n-1)$ digits and $n-2$ additions of up to $k(n-1)$ digits, once we know the powers of $y$ and $\beta$ up through $n-1$ for $y$ and $n$ for $\beta$. $\beta$ has a restricted range, so we can get the powers of $\beta$ in constant time. We can get the powers of $y$ with $n-2$ multiplications of up to $k(n-1)$ digits. Assuming multiplication of $d$-digit numbers takes time $O(d^{2})$ and addition takes time $O(d)$, we take time $O(k^{2}n^{2})$ for each comparison, or time $O(k^{2}n^{2}\log(B))$ to pick $\beta$. The remainder of the algorithm is addition and subtraction that takes time $O(k)$, so each iteration takes $O(k^{2}n^{2}\log(B))$. For all $k$ digits, we need time $O(k^{3}n^{2}\log(B))$.
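The $O(\log(B))$ selection of $\beta$ can be sketched as a binary search over the digits. This is a minimal illustration; the name `pick_beta` is hypothetical, and each test evaluates the powers directly rather than through the expanded polynomial described above.

```python
def pick_beta(y: int, r: int, alpha: int, n: int, base: int = 10) -> int:
    """Largest digit beta with (B*y + beta)^n - B^n*y^n <= B^n*r + alpha."""
    target = base**n * r + alpha
    by = base * y
    by_n = by**n
    # The left-hand side grows with beta and beta = 0 always passes,
    # so the admissible digits form a prefix 0..beta of the digit range.
    lo, hi = 0, base - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if (by + mid) ** n - by_n <= target:
            lo = mid        # mid is admissible; the answer is mid or larger
        else:
            hi = mid - 1    # mid is too large
    return lo


# Last step of the square root of 3 example: y = 17320, r = 17600, block 00.
print(pick_beta(17320, 17600, 0, 2))  # 5
```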

The only internal storage needed is $r$ , which is $O(k)$ digits on the kth iteration. That this algorithm does not have bounded memory usage puts an upper bound on the number of digits which can be computed mentally, unlike the more elementary algorithms of arithmetic. Unfortunately, any bounded memory state machine with periodic inputs can only produce periodic outputs, so there are no such algorithms which can compute irrational numbers from rational ones, and thus no bounded memory root extraction algorithms.

Note that increasing the base increases the time needed to pick $\beta$ by a factor of $O(\log {B})$ , but decreases the number of digits needed to achieve a given precision by the same factor, and since the algorithm is cubic time in the number of digits, increasing the base gives an overall speedup of $O(\log ^{2}{B})$ . When the base is larger than the radicand, the algorithm degenerates to binary search, so it follows that this algorithm is not useful for computing roots with a computer, as it is always outperformed by much simpler binary search, and has the same memory complexity.

Examples

Square root of 2 in binary

      1. 0  1  1  0  1
    ------------------
_  / 10.00 00 00 00 00     1
 \/   1                  + 1
     -----               ----
      1 00                100
         0               +  0
     --------            -----
      1 00 00             1001
        10 01            +   1
     -----------         ------
         1 11 00          10101
         1 01 01         +    1
         ----------      -------
            1 11 00       101100
                  0      +     0
            ----------   --------
            1 11 00 00    1011001
            1 01 10 01          1
            ----------
               1 01 11 remainder

Square root of 3

     1. 7  3  2  0  5
    ----------------------
_  / 3.00 00 00 00 00
 \/  1 = 20×0×1+1^2
     -
     2 00
     1 89 = 20×1×7+7^2 (27 x 7)
     ----
       11 00
       10 29 = 20×17×3+3^2  (343 x 3)
       -----
          71 00
          69 24 = 20×173×2+2^2 (3462 x 2)
          -----
           1 76 00
                 0 = 20×1732×0+0^2 (34640 x 0)
           -------
           1 76 00 00
           1 73 20 25 = 20×17320×5+5^2 (346405 x 5)
           ----------
              2 79 75

Cube root of 5

     1.  7   0   9   9   7
    ----------------------
_ 3/ 5. 000 000 000 000 000
 \/  1 = 300×(0^2)×1+30×0×(1^2)+1^3
     -
     4 000
     3 913 = 300×(1^2)×7+30×1×(7^2)+7^3
     -----
        87 000
             0 = 300×(17^2)×0+30×17×(0^2)+0^3
       -------
        87 000 000
        78 443 829 = 300×(170^2)×9+30×170×(9^2)+9^3
        ----------
         8 556 171 000
         7 889 992 299 = 300×(1709^2)×9+30×1709×(9^2)+9^3
         -------------
           666 178 701 000
           614 014 317 973 = 300×(17099^2)×7+30×17099×(7^2)+7^3
           ---------------
            52 164 383 027

Fourth root of 7

     1.   6    2    6    5    7
    ---------------------------
_ 4/ 7.0000 0000 0000 0000 0000
 \/  1 = 4000×(0^3)×1+600×(0^2)×(1^2)+40×0×(1^3)+1^4
     -
     6 0000
     5 5536 = 4000×(1^3)×6+600×(1^2)×(6^2)+40×1×(6^3)+6^4
     ------
       4464 0000
       3338 7536 = 4000×(16^3)×2+600×(16^2)×(2^2)+40×16×(2^3)+2^4
       ---------
       1125 2464 0000
       1026 0494 3376 = 4000×(162^3)×6+600×(162^2)×(6^2)+40×162×(6^3)+6^4
       --------------
         99 1969 6624 0000
         86 0185 1379 0625 = 4000×(1626^3)×5+600×(1626^2)×(5^2)+
         -----------------   40×1626×(5^3)+5^4
         13 1784 5244 9375 0000
         12 0489 2414 6927 3201 = 4000×(16265^3)×7+600×(16265^2)×(7^2)+
         ----------------------   40×16265×(7^3)+7^4
          1 1295 2830 2447 6799

External links

Why the square root algorithm works "Home School Math". Also related pages giving examples of the long-division-like pencil and paper method for square roots.
Reflections on The Square Root of Two "Medium". With an example of a C++ implementation.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.