Elliptic curve point multiplication

Elliptic curve scalar multiplication is the operation of successively adding a point along an elliptic curve to itself. It is used in elliptic curve cryptography (ECC). The literature presents this operation as scalar multiplication, as written, for example, in the Hessian form of an elliptic curve. A widespread name for this operation is also elliptic curve point multiplication, but this can convey the wrong impression of being a multiplication between two points.

Basics

Given a curve E defined by some equation in a finite field (such as E: y² = x³ + ax + b), point multiplication is defined as the repeated addition of a point along that curve. Denote this as nP = P + P + P + ⋯ + P for some scalar (integer) n and a point P = (x, y) that lies on the curve E. This type of curve is known as a Weierstrass curve.

The security of modern ECC depends on the intractability of determining n from Q = nP given known values of Q and P when n is large (known as the elliptic curve discrete logarithm problem, by analogy with other cryptographic systems). This is because the addition of two points on an elliptic curve (or the addition of a point to itself) yields a third point whose location has no immediately obvious relationship to the locations of the first two, and repeating this many times over yields a point nP that may be essentially anywhere. Intuitively, this is not dissimilar to the fact that if you had a point P on a circle, adding 42.57 degrees to its angle would still give a point "not too far" from P, whereas adding 42.57 degrees 1000 or 1001 times yields a point from which recovering the original angle takes considerably more work. Reversing this process, i.e., determining n given Q = nP and P, is believed to require trying essentially all possible n—an effort that is computationally intractable if n is large.

Point operations

Figure: Elliptic curve point operations: addition (shown in facet 1), doubling (facets 2 and 4) and negation (facet 3).

There are three commonly defined operations for elliptic curve points: addition, doubling and negation.

Point at infinity

Point at infinity is the identity element of elliptic curve arithmetic. Adding it to any point results in that point, including adding the point at infinity to itself. That is:

   𝒪 + P = P
   𝒪 + 𝒪 = 𝒪

Point at infinity is also written as 0.

Point negation

Point negation is finding, for a given point P, the point −P such that adding the two together results in the point at infinity (𝒪):

   P + (−P) = 𝒪

For elliptic curves of the form E: y² = x³ + ax + b, negation is the point with the same x coordinate but negated y coordinate:

   −P = −(x, y) = (x, −y)

Point addition

With two distinct points P and Q, addition is defined as the negation of the point resulting from the intersection of the curve E and the straight line defined by the points P and Q, giving the point R. [1]

Assuming the elliptic curve E is given by y² = x³ + ax + b, this can be calculated as:

   λ = (y_Q − y_P) / (x_Q − x_P)
   x_R = λ² − x_P − x_Q
   y_R = λ(x_P − x_R) − y_P

These equations are correct when neither point is the point at infinity, 𝒪, and when the points have different x coordinates (they are not mutual inverses). This is important for the ECDSA verification algorithm, where the hash value could be zero.

Point doubling

Where the points P and Q are coincident (at the same coordinates), addition is similar, except that there is no well-defined straight line through P, so the operation is closed using a limiting case, the tangent to the curve, E, at P.

This is calculated as above, with the slope λ now obtained from the tangent line by taking derivatives (dE/dx)/(dE/dy): [2]

   λ = (3x_P² + a) / (2y_P)
   x_R = λ² − 2x_P
   y_R = λ(x_P − x_R) − y_P

where a is from the defining equation of the curve E above.
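
For concreteness, the following is a minimal Rust sketch of the addition and doubling formulas above, assuming a toy curve y² = x³ + 2x + 2 over F₁₇ (the curve parameters and function names are illustrative only; real curves use fields of hundreds of bits) and representing the point at infinity as None:

    // Toy short Weierstrass curve E: y^2 = x^3 + 2x + 2 over F_17; `None` stands for
    // the point at infinity. Parameters and names are illustrative only.
    const P_MOD: i64 = 17; // field prime
    const A: i64 = 2;      // curve coefficient a

    type Point = Option<(i64, i64)>;

    fn modp(x: i64) -> i64 {
        x.rem_euclid(P_MOD)
    }

    // Modular inverse via Fermat's little theorem: x^(p-2) mod p.
    fn inv(x: i64) -> i64 {
        let (mut base, mut exp, mut acc) = (modp(x), P_MOD - 2, 1);
        while exp > 0 {
            if exp & 1 == 1 {
                acc = modp(acc * base);
            }
            base = modp(base * base);
            exp >>= 1;
        }
        acc
    }

    // lambda = (3*x_P^2 + a) / (2*y_P), then the same x_R, y_R formulas as addition.
    fn point_double(p: Point) -> Point {
        let (xp, yp) = p?;
        if yp == 0 {
            return None; // vertical tangent: the result is the point at infinity
        }
        let lambda = modp((3 * xp * xp + A) * inv(2 * yp));
        let xr = modp(lambda * lambda - 2 * xp);
        Some((xr, modp(lambda * (xp - xr) - yp)))
    }

    // lambda = (y_Q - y_P)/(x_Q - x_P); x_R = lambda^2 - x_P - x_Q; y_R = lambda(x_P - x_R) - y_P.
    fn point_add(p: Point, q: Point) -> Point {
        let (xp, yp) = match p { None => return q, Some(t) => t };
        let (xq, yq) = match q { None => return p, Some(t) => t };
        if xp == xq {
            if modp(yp + yq) == 0 {
                return None;        // P + (-P) = point at infinity
            }
            return point_double(p); // same point: use the tangent rule
        }
        let lambda = modp((yq - yp) * inv(xq - xp));
        let xr = modp(lambda * lambda - xp - xq);
        Some((xr, modp(lambda * (xp - xr) - yp)))
    }

    fn main() {
        let p = Some((5, 1)); // a point on the toy curve
        let two_p = point_double(p);
        let three_p = point_add(p, two_p);
        println!("2P = {:?}, 3P = {:?}", two_p, three_p); // 2P = (6, 3), 3P = (10, 6)
    }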

Point multiplication

The straightforward way of computing a point multiplication is through repeated addition. However, there are more efficient approaches to computing the multiplication.
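
As a baseline, here is a short Rust sketch of the naive repeated-addition approach. To stay self-contained it uses integers under addition modulo m as a stand-in group, so a single modular addition plays the role of a point addition; naive_mul and its parameters are illustrative, not taken from any library:

    // Naive scalar multiplication by repeated addition: n group additions for nP.
    // Integers mod `m` stand in for curve points so the loop structure is visible;
    // in curve code `point_add` would replace the modular sum.
    fn naive_mul(x: u64, n: u64, m: u64) -> u64 {
        let mut acc = 0; // identity element (the point at infinity)
        for _ in 0..n {
            acc = (acc + x) % m; // one group addition per step
        }
        acc
    }

    fn main() {
        assert_eq!(naive_mul(7, 100, 1_000_003), 700);
        println!("100 * 7 mod m = {}", naive_mul(7, 100, 1_000_003));
    }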

Double-and-add

The simplest method is the double-and-add method, [3] similar to square-and-multiply in modular exponentiation. The algorithm works as follows:

To compute sP, start with the binary representation for s: s = s₀ + 2s₁ + 2²s₂ + ⋯ + 2^(n−1)·s_(n−1), where each sᵢ ∈ {0, 1}. There are two iterative variants, processing the bits from LSB to MSB and from MSB to LSB, respectively:

   let bits = bit_representation(s)   # the vector of bits (from LSB to MSB) representing s
   let res = 𝒪                        # point at infinity
   let temp = P                       # track doubled P val
   for bit in bits:
       if bit == 1:
           res = res + temp           # point add
       temp = temp + temp             # double
   return res
   let bits = bit_representation(s)   # the vector of bits (from LSB to MSB) representing s
   let i = length(bits) - 2
   let res = P
   while (i >= 0):                    # traversing from second MSB to LSB
       res = res + res                # double
       if bits[i] == 1:
           res = res + P              # add
       i = i - 1
   return res

Note that both of the iterative methods above are vulnerable to timing analysis. See Montgomery Ladder below for an alternative approach.

An alternative way of writing this is as the following recursive function:

   f(P, d) is
       if d = 0 then
           return 0                             # computation complete
       else if d = 1 then
           return P
       else if d mod 2 = 1 then
           return point_add(P, f(P, d - 1))     # addition when d is odd
       else
           return f(point_double(P), d / 2)     # doubling when d is even

where f is the function for multiplying, P is the point to multiply, and d is the scalar (the number of times to add the point to itself). Example: 100P can be written as 2(2[P + 2(2[2(P + 2P)])]) and thus requires six point double operations and two point addition operations. 100P would be equal to f(P, 100).

This algorithm requires log2(d) iterations of point doubling and addition to compute the full point multiplication. There are many variations of this algorithm such as using a window, sliding window, NAF, NAF-w, vector chains, and Montgomery ladder.
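
A compact Rust sketch of the iterative MSB-first variant is given below. It is written generically over a group operation add, and the usage example substitutes modular integer addition for point addition so the control flow is easy to check; double_and_add and the other names are illustrative, not from any library:

    // MSB-first double-and-add over a generic group: `add` is the group operation,
    // `id` its identity. With elliptic curve points, `add` would be point addition
    // (and `add(res, res)` a point doubling).
    fn double_and_add<T: Copy>(x: T, n: u64, id: T, add: impl Fn(T, T) -> T) -> T {
        let mut res = id;
        for i in (0..64).rev() {
            res = add(res, res);   // double (harmless while res is still the identity)
            if (n >> i) & 1 == 1 {
                res = add(res, x); // add when the scanned bit is set
            }
        }
        res
    }

    fn main() {
        // Stand-in group: integers under addition mod m, so the result must equal n*x mod m.
        let m = 1_000_003u64;
        let add = |a: u64, b: u64| (a + b) % m;
        let (x, n) = (123_456u64, 100u64);
        assert_eq!(double_and_add(x, n, 0, add), (n * x) % m);
        println!("100P (stand-in) = {}", double_and_add(x, n, 0, add));
    }

Swapping point_add and point_double in for add recovers the elliptic-curve version of the algorithm.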

Windowed method

In the windowed version of this algorithm, [3] one selects a window size w and computes all values of kP for k = 1, 2, …, 2^w − 1. The algorithm now uses the representation d = d₀ + 2^w·d₁ + 2^(2w)·d₂ + ⋯ + 2^(mw)·d_m, with each 0 ≤ dᵢ < 2^w, and becomes

   Q ← 0
   for i from m downto 0 do
       Q ← point_double_repeat(Q, w)
       if di > 0 then
           Q ← point_add(Q, diP)    # using pre-computed value of diP
   return Q

This algorithm has the same complexity as the double-and-add approach, with the benefit of using fewer point additions (which in practice are slower than doublings). Typically, the value of w is chosen to be fairly small, making the pre-computation stage a trivial component of the algorithm; for the NIST recommended curves, w = 4 is usually the best selection. For an n-bit scalar the method needs about n point doubles, plus at most one point addition per w-bit window on top of the 2^w − 2 group operations spent on pre-computation.
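
The following Rust sketch shows the fixed-window method under the same stand-in group assumptions as before; table[k] plays the role of the precomputed point kP, and windowed_mul is an illustrative name rather than a library function:

    // Fixed-window (width w) scalar multiplication over the stand-in group of
    // integers mod m. `table[k]` plays the role of the precomputed point kP.
    fn windowed_mul(x: u64, n: u64, w: u32, m: u64) -> u64 {
        let add = |a: u64, b: u64| (a + b) % m;
        let mut table = vec![0u64; 1usize << w];  // table[k] = k*x mod m
        for k in 1..(1usize << w) {
            table[k] = add(table[k - 1], x);
        }
        let mut res = 0u64;
        let windows = (64 + w - 1) / w;           // number of w-bit windows in a 64-bit scalar
        for i in (0..windows).rev() {
            for _ in 0..w {
                res = add(res, res);              // w doublings per window
            }
            let di = ((n >> (i * w)) & ((1u64 << w) - 1)) as usize;
            if di != 0 {
                res = add(res, table[di]);        // one addition using the precomputed entry
            }
        }
        res
    }

    fn main() {
        let (m, x, n) = (1_000_003u64, 123_456u64, 1_000_000_007u64);
        let expected = ((n as u128 * x as u128) % m as u128) as u64;
        assert_eq!(windowed_mul(x, n, 4, m), expected);
        println!("windowed result = {}", expected);
    }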

Sliding-window method

In the sliding-window version, we look to trade off point additions for point doubles. We compute a similar table as in the windowed version, except that we only compute the points kP for k = 2^(w−1), 2^(w−1) + 1, …, 2^w − 1. Effectively, we are only computing the values for which the most significant bit of the window is set. The algorithm then uses the original binary (double-and-add) representation of d.

   Q ← 0
   for i from m downto 0 do
       if di = 0 then
           Q ← point_double(Q)
       else
           t ← extract j (up to w − 1) additional bits from d (including di)
           i ← i − j
           if j < w then
               perform double-and-add using t
               return Q
           else
               Q ← point_double_repeat(Q, w)
               Q ← point_add(Q, tP)
   return Q

This algorithm has the benefit that the pre-computation stage is roughly half as complex as the normal windowed method, while also trading slower point additions for point doublings. In effect, there is little reason to use the windowed method over this approach, except that the former can be implemented in constant time. The algorithm still needs about n point doubles and at most one point addition per window (roughly n/w additions), on top of the halved pre-computation.
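
Below is a Rust sketch of the sliding-window idea, under the same stand-in group assumptions. Note that this sketch uses the common odd-window formulation (each window is shrunk until its value is odd), which needs a precomputed table of the same halved size as the top-bit-set table described above; sliding_window_mul is an illustrative name:

    // Sliding-window scalar multiplication over the stand-in group. Each window is
    // shrunk until its value is odd, so only odd multiples x, 3x, ..., (2^w - 1)x
    // are precomputed.
    fn sliding_window_mul(x: u64, n: u64, w: u32, m: u64) -> u64 {
        let add = |a: u64, b: u64| (a + b) % m;
        let two_x = add(x, x);
        let mut table = vec![x % m];                   // table[k] = (2k + 1) * x mod m
        for _ in 1..(1u64 << (w - 1)) {
            let last = *table.last().unwrap();
            table.push(add(last, two_x));
        }
        let mut res = 0u64;
        let mut i = 63i32;
        while i >= 0 {
            if (n >> i) & 1 == 0 {
                res = add(res, res);                   // single doubling on a zero bit
                i -= 1;
            } else {
                let mut j = (i - w as i32 + 1).max(0); // candidate low end of the window
                while (n >> j) & 1 == 0 {
                    j += 1;                            // shrink so the window value is odd
                }
                let width = (i - j + 1) as u32;
                let val = ((n >> j) & ((1u64 << width) - 1)) as usize;
                for _ in 0..width {
                    res = add(res, res);               // one doubling per window bit
                }
                res = add(res, table[(val - 1) / 2]);  // add the precomputed odd multiple
                i = j - 1;
            }
        }
        res
    }

    fn main() {
        let (m, x, n) = (1_000_003u64, 123_456u64, 987_654_321_123u64);
        let expected = ((n as u128 * x as u128) % m as u128) as u64;
        assert_eq!(sliding_window_mul(x, n, 4, m), expected);
        println!("sliding-window result = {}", expected);
    }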

w-ary non-adjacent form (wNAF) method

In the non-adjacent form we aim to make use of the fact that point subtraction is just as easy as point addition to perform fewer (of either) as compared to a sliding-window method. The NAF of the multiplicand must be computed first with the following algorithm

   i ← 0
   while (d > 0) do
       if (d mod 2) = 1 then
           di ← d mods 2^w
           d ← d − di
       else
           di ← 0
       d ← d/2
       i ← i + 1
   return (d_(i−1), d_(i−2), …, d_0)

Where the signed modulo function mods is defined as

   if (d mod 2^w) >= 2^(w−1)
       return (d mod 2^w) − 2^w
   else
       return d mod 2^w

This produces the NAF needed to now perform the multiplication. This algorithm requires the pre-computation of the points {1, 3, 5, …, 2^(w−1) − 1}·P and their negatives, where P is the point to be multiplied. On typical Weierstrass curves, if P = (x, y) then −P = (x, −y), so in essence the negatives are cheap to compute. Next, the following algorithm computes the multiplication dP:

   Q ← 0
   for j ← i − 1 downto 0 do
       Q ← point_double(Q)
       if (dj != 0) then
           Q ← point_add(Q, djP)
   return Q

The wNAF guarantees that on average there will be a density of 1/(w + 1) point additions (slightly better than the unsigned window). It requires 1 point doubling and 2^(w−2) − 1 point additions for the pre-computation. The algorithm then requires n point doublings and, on average, n/(w + 1) point additions for the rest of the multiplication.

One property of the NAF is that we are guaranteed that every non-zero element dᵢ is followed by at least w − 1 additional zeroes. This is because the algorithm clears out the lower w bits of d with every subtraction of the output of the mods function. This observation can be used for several purposes. After every non-zero element the additional zeroes can be implied and do not need to be stored. Secondly, the multiple serial divisions by 2 can be replaced by a division by 2^w after every non-zero element and a division by 2 after every zero.
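
The recoding and evaluation steps above can be sketched in Rust as follows, once more over the stand-in group of integers modulo m, where negation is simply m − x (mirroring the cheap −P = (x, −y) on a Weierstrass curve); wnaf_recode and wnaf_mul are illustrative names:

    // wNAF recoding (the `mods` rule from the text) followed by left-to-right
    // evaluation, over the stand-in group of integers mod m.
    fn wnaf_recode(mut d: u64, w: u32) -> Vec<i64> {
        let modulus = 1i64 << w; // 2^w
        let mut digits = Vec::new();
        while d > 0 {
            if d & 1 == 1 {
                let mut di = (d % (1u64 << w)) as i64; // d mods 2^w: signed, odd, |di| < 2^(w-1)
                if di >= modulus / 2 {
                    di -= modulus;
                }
                digits.push(di);
                d = (d as i64 - di) as u64;            // clears the low w bits of d
            } else {
                digits.push(0);
            }
            d >>= 1;
        }
        digits // least significant digit first
    }

    fn wnaf_mul(x: u64, n: u64, w: u32, m: u64) -> u64 {
        let add = |a: u64, b: u64| (a + b) % m;
        let neg = |a: u64| (m - a % m) % m;            // cheap group negation, like -P = (x, -y)
        let two_x = add(x, x);
        let mut table = vec![x % m];                   // table[k] = (2k + 1) * x mod m
        for _ in 1..(1u64 << (w - 2)) {
            let last = *table.last().unwrap();
            table.push(add(last, two_x));              // only odd multiples up to (2^(w-1) - 1)x
        }
        let digits = wnaf_recode(n, w);
        let mut res = 0u64;
        for &dj in digits.iter().rev() {               // most significant digit first
            res = add(res, res);                       // one doubling per digit
            if dj > 0 {
                res = add(res, table[((dj - 1) / 2) as usize]);
            } else if dj < 0 {
                res = add(res, neg(table[((-dj - 1) / 2) as usize]));
            }
        }
        res
    }

    fn main() {
        let (m, x, n) = (1_000_003u64, 123_456u64, 314_159_265u64);
        let expected = ((n as u128 * x as u128) % m as u128) as u64;
        assert_eq!(wnaf_mul(x, n, 4, m), expected);
        println!("wNAF result = {}", expected);
    }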

It has been shown that through application of a FLUSH+RELOAD side-channel attack on OpenSSL, the full private key can be revealed after performing cache-timing against as few as 200 signatures. [4]

Montgomery ladder

The Montgomery ladder [5] approach computes the point multiplication in a fixed number of operations. This can be beneficial when timing, power consumption, or branch measurements are exposed to an attacker performing a side-channel attack. The algorithm uses the same representation as from double-and-add.

   R0 ← 0
   R1 ← P
   for i from m downto 0 do
       if di = 0 then
           R1 ← point_add(R0, R1)
           R0 ← point_double(R0)
       else
           R0 ← point_add(R0, R1)
           R1 ← point_double(R1)
       // invariant property to maintain correctness
       assert R1 == point_add(R0, P)
   return R0

This algorithm has in effect the same speed as the double-and-add approach except that it computes the same number of point additions and doubles regardless of the value of the multiplicand d. This means that at this level the algorithm does not leak any information through branches or power consumption.

However, it has been shown that through application of a FLUSH+RELOAD side-channel attack on OpenSSL, the full private key can be revealed after performing cache-timing against only one signature at a very low cost. [6]

Rust code for Montgomery Ladder: [7]

    /// Constant operation point multiplication.
    /// NOTE: not memory safe.
    /// * `s`: scalar value to multiply by
    /// * multiplication is defined to be P₀ + P₁ + ... Pₛ
    fn sec_mul(&mut self, s: big) -> E521 {
        let mut r0 = get_e521_id_point();
        let mut r1 = self.clone();
        for i in (0..=s.significant_bits()).rev() {
            if s.get_bit(i) {
                r0 = r0.add(&r1);
                r1 = r1.add(&r1.clone());
            } else {
                r1 = r0.add(&r1);
                r0 = r0.add(&r0.clone());
            }
        }
        r0 // r0 = P * s
    }

Constant time Montgomery ladder

The security of a cryptographic implementation is likely to face the threat of so-called timing attacks, which exploit the data-dependent timing characteristics of the implementation. Machines running cryptographic implementations consume variable amounts of time to process different inputs, so the timings vary based on the encryption key. To resolve this issue, cryptographic algorithms are implemented in a way that removes data-dependent variable timing characteristics, leading to so-called constant-time implementations. Software implementations are considered to be constant-time in the following sense, as stated in [8]: they avoid "all input-dependent branches, all input-dependent array indices, and other instructions with input-dependent timings." The GitHub page [9] lists coding rules for implementations of cryptographic operations, and more generally for operations involving secret or sensitive values.

The Montgomery ladder is an x-coordinate-only algorithm for elliptic curve point multiplication and is based on the double-and-add rules over a specific set of curves known as Montgomery curves. The algorithm has a conditional branch whose condition depends on a secret bit, so a straightforward implementation of the ladder won't be constant time and has the potential to leak the secret bit. This problem has been addressed in the literature [10] [11] and several constant-time implementations are known. The constant-time Montgomery ladder algorithm is given below; it uses two functions, CSwap and Ladder-Step. In the return value of the algorithm, Z2^(p−2) is the value of Z2^(−1), computed using Fermat's little theorem.

Algorithm Montgomery-Ladder(xP, n)
   input:  an ℓ-bit scalar n and the x-coordinate xP of a point P
   output: the x-coordinate of nP, the n-times scalar multiple of P

   X1 ← xP;  X2 ← 1;  Z2 ← 0;  X3 ← xP;  Z3 ← 1
   prevbit ← 0
   for i from ℓ − 1 downto 0 do
       bit ← bit-value at index i of n
       b ← bit ⊕ prevbit
       prevbit ← bit
       (X2, Z2, X3, Z3) ← CSwap(X2, Z2, X3, Z3, b)
       (X2, Z2, X3, Z3) ← Ladder-Step(X2, Z2, X3, Z3, X1)
   return X2 · Z2^(p−2)

The Ladder-Step function (given below) used within the ladder is the core of the algorithm and is a combined form of the differential addition and doubling operations. The field constant a24 is defined as a24 = (a + 2)/4, where a is a parameter of the underlying Montgomery curve.

Function Ladder-Step(X2, Z2, X3, Z3, X1)
   T1 ← X2 + Z2
   T2 ← X2 − Z2
   T3 ← X3 + Z3
   T4 ← X3 − Z3
   T5 ← T1²
   T6 ← T2²
   T2 ← T2 · T3
   T1 ← T1 · T4
   T1 ← T1 + T2
   T2 ← T1 − T2
   X3 ← T1²
   T2 ← T2²
   Z3 ← T2 · X1
   X2 ← T5 · T6
   T5 ← T5 − T6
   T1 ← a24 · T5
   T6 ← T6 + T1
   Z2 ← T5 · T6
   return (X2, Z2, X3, Z3)

The CSwap function manages the conditional branching and helps the ladder to run following the requirements of a constant-time implementation. The function swaps the pair of field elements X2, Z2 and X3, Z3 only if b = 1, and this is done without leaking any information about the secret bit. Various methods of implementing CSwap have been proposed in the literature. [10] [11] A less costly option to manage the constant-time requirement of the Montgomery ladder is conditional select, which is formalised through a function CSelect. This function has been used in various optimisations and has been formally discussed in [12].
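
As an illustration, one common way to realise a constant-time CSwap is to expand the secret bit into an all-zero or all-one mask and XOR with it, avoiding any secret-dependent branch. The sketch below assumes a purely illustrative four-limb representation of field elements:

    // Constant-time conditional swap (CSwap): when `bit` is 1 the two operands are
    // swapped, when 0 they are unchanged, using a mask instead of a branch.
    fn cswap(a: &mut [u64; 4], b: &mut [u64; 4], bit: u64) {
        let mask = 0u64.wrapping_sub(bit & 1); // all-zero if bit = 0, all-one if bit = 1
        for i in 0..4 {
            let t = mask & (a[i] ^ b[i]);
            a[i] ^= t;
            b[i] ^= t;
        }
    }

    fn main() {
        let mut x2 = [1u64, 2, 3, 4];
        let mut x3 = [5u64, 6, 7, 8];
        cswap(&mut x2, &mut x3, 1); // swapped
        assert_eq!(x2, [5, 6, 7, 8]);
        cswap(&mut x2, &mut x3, 0); // left unchanged
        assert_eq!(x2, [5, 6, 7, 8]);
        println!("cswap ok");
    }

The same masking idea underlies CSelect, which conditionally picks one of two values rather than swapping them.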

Since the inception of the standard Montgomery curve Curve25519 at the 128-bit security level, there have been various software implementations to compute the ECDH on various architectures, and to achieve the best possible performance cryptographic developers have resorted to writing implementations in the assembly language of the underlying architecture. The work [13] provided a couple of 64-bit assembly implementations targeting the AMD64 architecture. The implementations were developed using a tool known as qhasm, [14] which can generate high-speed assembly-language cryptographic programs. The function CSwap was used in the implementations of these ladders. Since then there have been several attempts to optimise the ladder implementation through hand-written assembly programs, in which the notion of CSelect was first used in [15] and then in [16]. Apart from using sequential instructions, vector instructions have also been used to optimise the ladder computation in various works. [17] [18] [19] [20] Along with AMD64, attempts have also been made to achieve efficient implementations on other architectures like ARM. The works [21] and [22] provide efficient implementations targeting the ARM architecture. The libraries lib25519 [23] and s2n-bignum [24] are two state-of-the-art libraries containing efficient implementations of the Montgomery ladder for Curve25519; both libraries also contain implementations of other cryptographic primitives.

Apart from Curve25519, there have been several attempts to compute the ladder over other curves at various security levels. Efficient implementations of the ladder over the standard curve Curve448 at the 224-bit security level have also been studied in the literature. [15] [18] [20] A curve named Curve41417, providing security of just over 200 bits, was proposed, [25] in which a variant of the Karatsuba strategy was used to implement the field multiplication needed for the related ECC software. In pursuit of Montgomery curves that are competitive with Curve25519 and Curve448, a couple of curves have been proposed along with efficient sequential [16] and vectorised [20] implementations of the corresponding ladders. At the 256-bit security level, efficient implementations of the ladder have also been addressed through three different Montgomery curves. [26]

References

  1. "Elliptic Curves - Explicit Addition Formulae".
  2. "Elliptic Curves - Explicit Addition Formulae".
  3. 1 2 Hankerson, Darrel; Vanstone, Scott; Menezes, Alfred (2004). Guide to Elliptic Curve Cryptography. Springer Professional Computing. New York: Springer-Verlag. doi:10.1007/b97644. ISBN   0-387-95273-X. S2CID   720546.
  4. Benger, Naomi; van de Pol, Joop; Smart, Nigel P.; Yarom, Yuval (2014). Batina, Lejla; Robshaw, Matthew (eds.). "Ooh Aah... Just a Little Bit" : A Small Amount of Side Channel Can Go a Long Way (PDF). Cryptographic Hardware and Embedded Systems – CHES 2014. Lecture Notes in Computer Science. Vol. 8731. Springer. pp. 72–95. doi:10.1007/978-3-662-44709-3_5. ISBN   978-3-662-44708-6.
  5. Montgomery, Peter L. (1987). "Speeding the Pollard and elliptic curve methods of factorization". Math. Comp. 48 (177): 243–264. doi: 10.2307/2007888 . JSTOR   2007888. MR   0866113.
  6. Yarom, Yuval; Benger, Naomi (2014). "Recovering OpenSSL ECDSA Nonces Using the FLUSH+RELOAD Cache Side-channel Attack". IACR Cryptology ePrint Archive.
  7. Ray, Dustin. "E521" . Retrieved 25 Feb 2023.
  8. Bernstein, Daniel J. "Curve25519: New Diffie-Hellman Speed Records". In: Yung, M., Dodis, Y., Kiayias, A., Malkin, T. (eds) Public Key Cryptography - PKC 2006. Lecture Notes in Computer Science, vol 3958. Springer, Berlin, Heidelberg.
  9. Aumasson, Jean-Philippe. "Guidelines for low-level cryptography Software". GitHub. Retrieved March 26, 2024.
  10. 1 2 Bernstein, Daniel J.; Lange, Tanja. "Montgomery curves and the Montgomery ladder". In Joppe W. Bos and Arjen K. Lenstra, editors, Topics in Computational Number Theory inspired by Peter L. Montgomery, pages 82–115. Cambridge University Press, 2017.
  11. 1 2 Costello, Craig; Smith, Benjamin. "Montgomery curves and their arithmetic - the case of large characteristic fields". J. Cryptographic Engineering, 8(3):227–240, 2018.
  12. Nath, Kaushik; Sarkar, Palash. "Constant Time Montgomery Ladder". Cryptology ePrint Archive, Paper 2020/956.
  13. Bernstein, Daniel J.; Duif, Niels; Lange, Tanja; Schwabe, Peter; Yang, Bo-Yin. "High-speed high-security signatures". J. Cryptographic Engineering, 2(2):77–89, 2012.
  14. Bernstein, Daniel J. "qhasm: tools to help write high-speed software".
  15. 1 2 Oliveira, Thomaz; López, Julio; Hisil, Hüseyin; Faz-Hernández, Armando; Rodrı́guez-Henrı́quez, Francisco. "How to (Pre-)Compute a Ladder: Improving the Performance of X25519 and X448". In Carlisle Adams and Jan Camenisch, editors, Selected Areas in Cryptography - SAC 2017 - 24th International Conference, Ottawa, ON, Canada, August 16-18, 2017, Revised Selected Papers, volume 10719 of Lecture Notes in Computer Science, pages 172–191. Springer, 2017., Code available at https://github.com/armfazh/rfc7748_precomputed.
  16. 1 2 Nath, Kaushik; Sarkar, Palash. "Security and Efficiency Trade-offs for Elliptic Curve Diffie-Hellman at the 128- and 224-bit Security Levels". J Cryptogr Eng 12, 107–121 (2022)., Code available at https://github.com/kn-cs/x25519
  17. Chou, Tung. "Sandy2x: New curve25519 speed records". In Orr Dunkelman and Liam Keliher, editors, Selected Areas in Cryptography - SAC 2015 - 22nd International Confer- ence, Sackville, NB, Canada, August 12-14, 2015, Revised Selected Papers, volume 9566 of Lecture Notes in Computer Science, pages 145–160. Springer, 2015., Code available at https://tungchou.github.io/sandy2x/
  18. 1 2 Faz-Hernández, Armando; López, Julio; Dahab, Ricardo. "High-performance Implementation of Elliptic Curve Cryptography Using Vector Instructions". ACM Trans. Math. Softw., 45(3):25:1–25:35, 2019., Coce available at https://github.com/armfazh/fld-ecc-vec
  19. Hisil, Hüseyin; Egrice, Berkan; Yassi, Mert. "Fast 4 Way Vectorized Ladder for the Complete Set of Montgomery Curves". International Journal of Information Security Science, Vol. 11, No. 2, pp. 12-24., Code available at https://github.com/crypto-ninjaturtles/montgomery4x
  20. 1 2 3 Nath, Kaushik; Sarkar, Palash. "Efficient 4-way Vectorizations of the Montgomery Ladder". IEEE Transactions on Computers, vol. 71, no. 3, pp. 712-723, 1 March 2022., Code available at https://github.com/kn-cs/vec-ladder
  21. Bernstein, Daniel J.; Schwabe, Peter. "NEON crypto". In Emmanuel Prouff and Patrick Schaumont, editors, Cryptographic Hardware and Embedded Systems - CHES 2012 - 14th International Workshop, Leuven, Belgium, September 9-12, 2012. Proceedings, volume 7428 of Lecture Notes in Computer Science, pages 320–339. Springer, 2012.
  22. Lenngren, Emil. "AArch64 optimized implementation for X25519" (PDF). GitHub.
  23. Nath, Kaushik; Bernstein, Daniel J. "lib25519".
  24. Harrison, John R. "s2n-bignum".
  25. Bernstein, Daniel J.; Chitchanok, Chuengsatiansup; Tanja, Lange. "Curve41417: Karatsuba Revisited". In: Batina, L., Robshaw, M. (eds) Cryptographic Hardware and Embedded Systems – CHES 2014. CHES 2014. Lecture Notes in Computer Science, vol 8731. Springer, Berlin, Heidelberg.
  26. Nath, Kaushik; Sarkar, Palash. "Efficient Elliptic Curve Diffie-Hellman Computation at the 256-bit Security Level". IET Information Security, 14(6):633640, 2020., Code available at https://github.com/kn-cs/mont256-dh and https://github.com/kn-cs/mont256-vec