LSH (hash function)

Last updated

LSH is a cryptographic hash function designed in 2014 by South Korea to provide integrity in general-purpose software environments such as PCs and smart devices. [1] LSH is one of the cryptographic algorithms approved by the Korean Cryptographic Module Validation Program (KCMVP). And it is the national standard of South Korea (KS X 3262).

Contents

Specification

The overall structure of the hash function LSH is shown in the following figure.

Overall structure of LSH LSH structure.png
Overall structure of LSH

The hash function LSH has the wide-pipe Merkle-Damgård structure with one-zeros padding. The message hashing process of LSH consists of the following three stages.

  1. Initialization:
    • One-zeros padding of a given bit string message.
    • Conversion to 32-word array message blocks from the padded bit string message.
    • Initialization of a chaining variable with the initialization vector.
  2. Compression:
    • Updating of chaining variables by iteration of a compression function with message blocks.
  3. Finalization:
    • Generation of an -bit hash value from the final chaining variable.
  • function Hash function LSH
  • input: Bit string message
  • output: Hash value
  • procedure

One-zeros padding of

Generation of message blocks , where from the padded bit string

fortodo

end for

return

The specifications of the hash function LSH are as follows.

Hash function LSH specifications
AlgorithmDigest size in bits ()Number of step functions ()Chaining variable size in bitsMessage block size in bitsWord size in bits ()
LSH-256-22422426512102432
LSH-256-256256
LSH-512-224224281024204864
LSH-512-256256
LSH-512-384384
LSH-512-512512

Initialization

Let be a given bit string message. The given is padded by one-zeros, i.e., the bit ‘1’ is appended to the end of , and the bit ‘0’s are appended until a bit length of a padded message is , where and is the smallest integer not less than .

Let be the one-zeros-padded -bit string of . Then is considered as a -byte array , where for all . The -byte array converts into a -word array as follows.

From the word array , we define the 32-word array message blocks as follows.

The 16-word array chaining variable is initialized to the initialization vector .

The initialization vector is as follows. In the following tables, all values are expressed in hexadecimal form.

LSH-256-224 initialization vector
068608D362D8F7A7D76652AB4C600A43BDC40AA81ECA0B68DA1A89BE3147D354
707EB4F9F65B38626B0B2ABE56B8EC0ACF237286EE0D1727336365958BB8D05F
LSH-256-256 initialization vector
46A10F1FFDDCE486B41443A8198E6B9D3304388DB0F5A3C7B36061C47ADBD553
105D53782F74DE545C2F2D95F2553FBE8051357A138668C847AA4484E01AFB41
LSH-512-224 initialization vector
0C401E9FE8813A554A5F446268FD3D35FF13E452334F612AF8227661037E354A
A5F223723C9CA29D95D965A11AED397901E23835B9AB02CC52D49CBAD5B30616
9E5C2027773F4ED366A5C8801925B70122BBC85B4C6779D9C13171A42C559C23
31E2B67D25BE3813D522C4DEED8E4D83A79F5509B43FBAFEE00D2CD88B4B6C6A
LSH-512-256 initialization vector
6DC57C33DF989423D8EA7F6E8342C19976DF8356F8603AC440F1B44DE838223A
39FFE7CFC31484CD39C4326CC52815488A2FF85A346045D8FF202AA46DBDD61E
CF785B3CD5FCDB8B1F0323B64A8150BFFF75D972F29EA3552E567F30BF1CA9E1
B596875BF8FF6DBAFCCA39B089EF4615ECFF4017D020B4B67E77384C772ED802
LSH-512-384 initialization vector
53156A66292808F6B2C4F362B204C2BCB84B7213BFA05C4E976CEB7C1B299F73
DF0CC63C0570AE97DA4441BAA486CE3F6559F5D9B5F2ACC222DACF19B4B52A16
BBCDACEFDE80953AC9891A2879725B3E7C9FE6330237E440A30BA550553F7431
BB08043FB34E3E30A0DEC48D54618EAD150317267464BC5732D1501FDE63DC93
LSH-512-512 initialization vector
ADD50F3C7F07094EE3F3CEE8F9418A4FB527ECDE5B3D0AE92EF6DEC68076F501
8CB994CAE5ACA216FBB9EAE4BBA48CC7650A526174725FEA1F9A61A73F8D8085
B6607378173B539B1BC99853B0C0B9EDDF727FC19B182D47DBEF360CF893A457
4981F5E570147E80D00C4490CA7D3E305D73940C0E4AE1EC894085E2EDB2D819

Compression

In this stage, the 32-word array message blocks , which are generated from a message in the initialization stage, are compressed by iteration of compression functions. The compression function has two inputs; the -th 16-word chaining variable and the -th 32-word message block . And it returns the -th 16-word chaining variable . Here and subsequently, denotes the set of all -word arrays for .

The following four functions are used in a compression function:

The overall structure of the compression function is shown in the following figure.

Compression function of LSH LSH structure of the compression function.png
Compression function of LSH

In a compression function, the message expansion function generates 16-word array sub-messages from given . Let be a temporary 16-word array set to the -th chaining variable . The -th step function having two inputs and updates , i.e., . All step functions are proceeded in order . Then one more operation by is proceeded, and the -th chaining variable is set to . The process of a compression function in detail is as follows.

  • function Compression function
  • input: The -th chaining variable and the -th message block
  • output: The -th chaining variable
  • procedure

fortodo

end for

return

Here the -th step function is as follows.

The following figure shows the -th step function of a compression function.

The
j
{\displaystyle j}
-th step function
Step
j
{\displaystyle {\textrm {Step}}_{j}} LSH step function.png
The -th step function

Message Expansion Function MsgExp

Let be the -th 32-word array message block. The message expansion function generates 16-word array sub-messages from a message block . The first two sub-messages and are defined as follows.

The next sub-messages are generated as follows.

Here is the permutation over defined as follows.

The permutation
0123456789101112131415
3201745611108915121314

Message Addition Function MsgAdd

For two 16-word arrays and , the message addition function is defined as follows.

Mix Function Mix

The -th mix function updates the 16-word array by mixing every two-word pair; and for . For , the mix function proceeds as follows.

Here is a two-word mix function. Let and be words. The two-word mix function is defined as follows.

  • function Two-word mix function
  • input: Words and
  • output: Words and
  • procedure

;;

;

;;

;;

return, ;

The two-word mix function is shown in the following figure.

Two-word mix function
Mix
j
,
l
(
X
,
Y
)
{\displaystyle {\textrm {Mix}}_{j,l}(X,Y)} LSH mix function.png
Two-word mix function

The bit rotation amounts , , used in are shown in the following table.

Bit rotation amounts , , and
32even291081624241680
odd517
64even235901632488244056
odd73

The -th 8-word array constant used in for is defined as follows. The initial 8-word array constant is defined in the following table. For , the -th constant is generated by for .

Initial 8-word array constant
917caf9097884283c938982a
6c1b10a2ba1fca93533e2355
6f352943c519a2e87aeb1c03
cf7782439a0fc95462af17b1
2ceb7472fc3dda8ab019a82b
29e96ff202825d079a895407
8a9ba42879f2d0a7ee06a6f7
2eeb2642d76d15eed9fdf5fe

Word-Permutation Function WordPerm

Let be a 16-word array. The word-permutation function is defined as follows.

Here is the permutation over defined by the following table.

The permutation
0123456789101112131415
6457121514132013811109

Finalization

The finalization function returns -bit hash value from the final chaining variable . When is an 8-word variable and is a -byte variable, the finalization function performs the following procedure.

Here, denotes , the sub-bit string of a word for . And denotes , the sub-bit string of a -bit string for .

Security

LSH is secure against known attacks on hash functions up to now. LSH is collision-resistant for and preimage-resistant and second-preimage-resistant for in the ideal cipher model, where is a number of queries for LSH structure. [1] LSH-256 is secure against all the existing hash function attacks when the number of steps is 13 or more, while LSH-512 is secure if the number of steps is 14 or more. Note that the steps which work as security margin are 50% of the compression function. [1]

Performance

LSH outperforms SHA-2/3 on various software platforms. The following table shows the speed performance of 1MB message hashing of LSH on several platforms.

1MB message hashing speed of LSH (cycles/byte) [1]
PlatformP1 [lower-alpha 1] P2 [lower-alpha 2] P3 [lower-alpha 3] P4 [lower-alpha 4] P5 [lower-alpha 5] P6 [lower-alpha 6] P7 [lower-alpha 7] P8 [lower-alpha 8]
LSH-256-3.603.865.263.8911.1715.0315.2814.84
LSH-512-2.395.047.765.528.9418.7619.0018.10
  1. Intel Core i7-4770K @ 3.5GHz (Haswell), Ubuntu 12.04 64-bit, GCC 4.8.1 with “-m64 -mavx2 -O3”
  2. Intel Core i7-2600K @ 3.40GHz (Sandy Bridge), Ubuntu 12.04 64-bit, GCC 4.8.1 with “-m64 -msse4 -O3”
  3. Intel Core 2 Quad Q9550 @ 2.83GHz (Yorkfield), Windows 7 32-bit, Visual studio 2012
  4. AMD FX-8350 @ 4GHz (Piledriver), Ubuntu 12.04 64-bit, GCC 4.8.1 with “-m64 -mxop -O3”
  5. Samsung Exynos 5250 ARM Cortex-A15 @ 1.7GHz dual core (Huins ACHRO 5250), Android 4.1.1
  6. Qualcomm Snapdragon 800 Krait 400 @ 2.26GHz quad core (LG G2), Android 4.4.2
  7. Qualcomm Snapdragon 800 Krait 400 @ 2.3GHz quad core (Samsung Galaxy S4), Android 4.2.2
  8. Qualcomm Snapdragon 400 Krait 300 @ 1.7GHz dual core (Samsung Galaxy S4 mini), Android 4.2.2

The following table is the comparison at the platform based on Haswell, LSH is measured on Intel Core i7-4770k @ 3.5 GHz quad core platform, and others are measured on Intel Core i5-4570S @ 2.9 GHz quad core platform.

Speed benchmark of LSH, SHA-2 and the SHA-3 finalists at the platform based on Haswell CPU (cycles/byte) [1]
AlgorithmMessage size in bytes
long4,0961,536576648
LSH-256-2563.603.713.904.088.1965.37
Skein-512-2565.015.585.866.4913.12104.50
Blake-2566.617.637.879.0516.5872.50
Grøstl-2569.4810.6812.1813.7137.94227.50
Keccak-25610.5610.529.9011.9923.38187.50
SHA-25610.8211.9112.2613.5124.88106.62
JH-25614.7015.5015.9417.0631.94257.00
LSH-512-5122.392.542.793.3110.8185.62
Skein-512-5124.675.515.806.4413.59108.25
Blake-5124.966.176.827.3814.81116.50
SHA-5127.658.248.699.0317.22138.25
Grøstl-51212.7815.4417.3017.9951.72417.38
JH-51214.2515.6616.1417.3432.69261.00
Keccak-51216.3617.8618.4620.3521.56171.88

The following table is measured on Samsung Exynos 5250 ARM Cortex-A15 @ 1.7 GHz dual core platform.

Speed benchmark of LSH, SHA-2 and the SHA-3 finalists at the platform based on Exynos 5250 ARM Cortex-A15 CPU (cycles/byte) [1]
AlgorithmMessage size in bytes
long4,0961,536576648
LSH-256-25611.1711.5312.1612.6322.42192.68
Skein-512-25615.6416.7218.3322.6875.75609.25
Blake-25617.9419.1120.8825.4483.94542.38
SHA-25619.9121.1423.0328.1390.89578.50
JH-25634.6636.0638.1043.51113.92924.12
Keccak-25636.0338.0140.5448.13125.001000.62
Grøstl-25640.7042.7646.0354.94167.521020.62
LSH-512-5128.949.5610.5512.2838.82307.98
Blake-51213.4614.8216.8820.9877.53623.62
Skein-512-51215.6116.7318.3522.5675.59612.88
JH-51234.8836.2638.3644.01116.41939.38
SHA-51244.1346.4149.9754.55135.591088.38
Keccak-51263.3164.5967.8577.21121.28968.00
Grøstl-512131.35138.49150.15166.54446.533518.00

Test vectors

Test vectors for LSH for each digest length are as follows. All values are expressed in hexadecimal form.

LSH-256-224("abc") = F7 C5 3B A4 03 4E 70 8E 74 FB A4 2E 55 99 7C A5 12 6B B7 62 36 88 F8 53 42 F7 37 32

LSH-256-256("abc") = 5F BF 36 5D AE A5 44 6A 70 53 C5 2B 57 40 4D 77 A0 7A 5F 48 A1 F7 C1 96 3A 08 98 BA 1B 71 47 41

LSH-512-224("abc") = D1 68 32 34 51 3E C5 69 83 94 57 1E AD 12 8A 8C D5 37 3E 97 66 1B A2 0D CF 89 E4 89

LSH-512-256("abc") = CD 89 23 10 53 26 02 33 2B 61 3F 1E C1 1A 69 62 FC A6 1E A0 9E CF FC D4 BC F7 58 58 D8 02 ED EC

LSH-512-384("abc") = 5F 34 4E FA A0 E4 3C CD 2E 5E 19 4D 60 39 79 4B 4F B4 31 F1 0F B4 B6 5F D4 5E 9D A4 EC DE 0F 27 B6 6E 8D BD FA 47 25 2E 0D 0B 74 1B FD 91 F9 FE

LSH-512-512("abc") = A3 D9 3C FE 60 DC 1A AC DD 3B D4 BE F0 A6 98 53 81 A3 96 C7 D4 9D 9F D1 77 79 56 97 C3 53 52 08 B5 C5 72 24 BE F2 10 84 D4 20 83 E9 5A 4B D8 EB 33 E8 69 81 2B 65 03 1C 42 88 19 A1 E7 CE 59 6D

Implementations

LSH is free for any use public or private, commercial or non-commercial. The source code for distribution of LSH implemented in C, Java, and Python can be downloaded from KISA's cryptography use activation webpage. [2]

KCMVP

LSH is one of the cryptographic algorithms approved by the Korean Cryptographic Module Validation Program (KCMVP). [3]

Standardization

LSH is included in the following standard.

Related Research Articles

<span class="mw-page-title-main">Independence (probability theory)</span> When the occurrence of one event does not affect the likelihood of another

Independence is a fundamental notion in probability theory, as in statistics and the theory of stochastic processes. Two events are independent, statistically independent, or stochastically independent if, informally speaking, the occurrence of one does not affect the probability of occurrence of the other or, equivalently, does not affect the odds. Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other.

In mathematics, the Lp spaces are function spaces defined using a natural generalization of the p-norm for finite-dimensional vector spaces. They are sometimes called Lebesgue spaces, named after Henri Lebesgue, although according to the Bourbaki group they were first introduced by Frigyes Riesz.

The Mersenne Twister is a general-purpose pseudorandom number generator (PRNG) developed in 1997 by Makoto Matsumoto and Takuji Nishimura. Its name derives from the fact that its period length is chosen to be a Mersenne prime.

Distributions, also known as Schwartz distributions or generalized functions, are objects that generalize the classical notion of functions in mathematical analysis. Distributions make it possible to differentiate functions whose derivatives do not exist in the classical sense. In particular, any locally integrable function has a distributional derivative.

In the calculus of variations and classical mechanics, the Euler–Lagrange equations are a system of second-order ordinary differential equations whose solutions are stationary points of the given action functional. The equations were discovered in the 1750s by Swiss mathematician Leonhard Euler and Italian mathematician Joseph-Louis Lagrange.

<span class="mw-page-title-main">Vapnik–Chervonenkis theory</span> Branch of statistical computational learning theory

Vapnik–Chervonenkis theory was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis. The theory is a form of computational learning theory, which attempts to explain the learning process from a statistical point of view.

In mathematical statistics, the Fisher information is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ of a distribution that models X. Formally, it is the variance of the score, or the expected value of the observed information.

In abstract algebra and multilinear algebra, a multilinear form on a vector space over a field is a map

A Dynkin system, named after Eugene Dynkin, is a collection of subsets of another universal set satisfying a set of axioms weaker than those of 𝜎-algebra. Dynkin systems are sometimes referred to as 𝜆-systems or d-system. These set families have applications in measure theory and probability.

In mathematics, Doob's martingale inequality, also known as Kolmogorov’s submartingale inequality is a result in the study of stochastic processes. It gives a bound on the probability that a submartingale exceeds any given value over a given interval of time. As the name suggests, the result is usually given in the case that the process is a martingale, but the result is also valid for submartingales.

Linear Programming Boosting (LPBoost) is a supervised classifier from the boosting family of classifiers. LPBoost maximizes a margin between training samples of different classes and hence also belongs to the class of margin-maximizing supervised classification algorithms. Consider a classification function

In computer science, locality-sensitive hashing (LSH) is a fuzzy hashing technique that hashes similar input items into the same "buckets" with high probability. Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items.

In set theory, a mathematical discipline, the Jensen hierarchy or J-hierarchy is a modification of Gödel's constructible hierarchy, L, that circumvents certain technical difficulties that exist in the constructible hierarchy. The J-Hierarchy figures prominently in fine structure theory, a field pioneered by Ronald Jensen, for whom the Jensen hierarchy is named. Rudimentary functions describe a method for iterating through the Jensen hierarchy.

Uncertainty theory is a branch of mathematics based on normality, monotonicity, self-duality, countable subadditivity, and product measure axioms.

ACE is the collection of units, implementing both a public key encryption scheme and a digital signature scheme. Corresponding names for these schemes — «ACE Encrypt» and «ACE Sign». Schemes are based on Cramer-Shoup public key encryption scheme and Cramer-Shoup signature scheme. Introduced variants of these schemes are intended to achieve a good balance between performance and security of the whole encryption system.

Input-to-state stability (ISS) is a stability notion widely used to study stability of nonlinear control systems with external inputs. Roughly speaking, a control system is ISS if it is globally asymptotically stable in the absence of external inputs and if its trajectories are bounded by a function of the size of the input for all sufficiently large times. The importance of ISS is due to the fact that the concept has bridged the gap between input–output and state-space methods, widely used within the control systems community.

Dynamic epistemic logic (DEL) is a logical framework dealing with knowledge and information change. Typically, DEL focuses on situations involving multiple agents and studies how their knowledge changes when events occur. These events can change factual properties of the actual world : for example a red card is painted in blue. They can also bring about changes of knowledge without changing factual properties of the world : for example a card is revealed publicly to be red. Originally, DEL focused on epistemic events. We only present in this entry some of the basic ideas of the original DEL framework; more details about DEL in general can be found in the references.

A central problem in algorithmic graph theory is the shortest path problem. One of the generalizations of the shortest path problem is known as the single-source-shortest-paths (SSSP) problem, which consists of finding the shortest paths from a source vertex to all other vertices in the graph. There are classical sequential algorithms which solve this problem, such as Dijkstra's algorithm. In this article, however, we present two parallel algorithms solving this problem.

The finite promise games are a collection of mathematical games developed by American mathematician Harvey Friedman in 2009 which are used to develop a family of fast-growing functions , and . The greedy clique sequence is a graph theory concept, also developed by Friedman in 2010, which are used to develop fast-growing functions , and .

The method of (hypergraph) containers is a powerful tool that can help characterize the typical structure and/or answer extremal questions about families of discrete objects with a prescribed set of local constraints. Such questions arise naturally in extremal graph theory, additive combinatorics, discrete geometry, coding theory, and Ramsey theory; they include some of the most classical problems in the associated fields.

References

  1. 1 2 3 4 5 6 Kim, Dong-Chan; Hong, Deukjo; Lee, Jung-Keun; Kim, Woo-Hwan; Kwon, Daesung (2015). "LSH: A New Fast Secure Hash Function Family". Information Security and Cryptology - ICISC 2014. Lecture Notes in Computer Science. Vol. 8949. Springer International Publishing. pp. 286–313. doi:10.1007/978-3-319-15943-0_18. ISBN   978-3-319-15943-0.
  2. "KISA 암호이용활성화 - 암호알고리즘 소스코드". seed.kisa.or.kr.
  3. "KISA 암호이용활성화 - 개요". seed.kisa.or.kr.
  4. "Korean Standards & Certifications (in Korean)".