PURB (cryptography)

Last updated

In cryptography, a padded uniform random blob or PURB is a discipline for encrypted data formats designed to minimize unintended information leakage either from its encryption format metadata or from its total length. [1]

Contents

Properties of PURBs

When properly created, a PURB's content is indistinguishable from a uniform random bit string to any observer without a relevant decryption key. A PURB therefore leaks no information through headers or other cleartext metadata associated with the encrypted data format. This leakage minimization "hygiene" practice contrasts with traditional encrypted data formats such as Pretty Good Privacy, which include cleartext metadata encoding information such as the application that created the data, the data format version, the number of recipients the data is encrypted for, the identities or public keys of the recipients, and the ciphers or suites that were used to encrypt the data. While such encryption metadata was considered non-sensitive when these encrypted formats were designed, modern attack techniques have found numerous ways to employ such incidentally-leaked metadata in facilitating attacks, such as by identifying data encrypted with weak ciphers or obsolete algorithms, fingerprinting applications to track users or identify software versions with known vulnerabilities, or traffic analysis techniques such as identifying all users, groups, and associated public keys involved in a conversation from an encrypted message observed between only two of them.

In addition, a PURB is padded to a constrained set of possible lengths, in order to minimize the amount of information the encrypted data could potentially leak to observers via its total length. Without padding, encrypted objects such as files or bit strings up to bits in length can leak up to bits of information to an observer - namely the number of bits required to represent the length exactly. A PURB is padded to a length representable in a floating point number whose mantissa is no longer (i.e., contains no more significant bits) than its exponent. This constraint limits the maximum amount of information a PURB's total length can leak to bits, a significant asymptotic reduction and the best achievable in general for variable-length encrypted formats whose multiplicative overhead is limited to a constant factor of the unpadded payload size. This asymptotic leakage is the same as one would obtain by padding encrypted objects to a power of some base, such as to a power of two. Allowing some significant mantissa bits in the length's representation rather than just an exponent, however, significantly reduces the overhead of padding. For example, padding to the next power of two can impose up to 100% overhead by nearly doubling the object's size, while a PURB's padding imposes overhead of at most 12% for small strings and decreasing gradually (to 6%, 3%, etc.) as objects get larger.

Experimental evidence indicate that on data sets comprising objects such as files, software packages, and online videos, leaving objects unpadded or padding to a constant block size often leaves them uniquely identifiable by total length alone. [2] [3] [1] Padding objects to a power of two or to a PURB length, in contrast, ensures that most objects are indistinguishable from at least some other objects and thus have a nontrivial anonymity set. [1]

Encoding and decoding PURBs

Because a PURB is a discipline for designing encrypted formats and not a particular encrypted format, there is no single prescribed method for encoding or decoding PURBs. Applications may use any encryption and encoding scheme provided it produces a bit string that appears uniformly random to an observer without an appropriate key, provided the appropriate hardness assumptions are satisfied of course, and provided the PURB is padded to one of the allowed lengths. Correctly-encoded PURBs therefore do not identify the application that created them in their ciphertext. A decoding application, therefore, cannot readily tell before decryption whether a PURB was encrypted for that application or its user, other than by trying to decrypt it with any available decryption keys.

Encoding and decoding a PURB presents technical efficiency challenges, in that traditional parsing techniques are not applicable because a PURB by definition has no metadata markers that a traditional parser could use to discern the PURB's structure before decrypting it. Instead, a PURB must be decrypted first obliviously to its internal structure, and then parsed only after the decoder has used an appropriate decryption key to find a suitable cryptographic entrypoint into the PURB.

Encoding and decoding PURBs intended to be decrypted by several different recipients, public keys, and/or ciphers presents the additional technical challenge that each recipient must find a different entrypoint at a distinct location in the PURB non-overlapping with those of the other recipients, but the PURB presents no cleartext metadata indicating the positions of those entrypoints or even the total number of them. The paper that proposed PURBs [1] also included algorithms for encrypting objects to multiple recipients using multiple cipher suites. With these algorithms, recipients can find their respective entrypoints into the PURB with only a logarithmic number of trial decryptions using symmetric-key cryptography and only one expensive public-key operation per cipher suite.

A third technical challenge is representing the public-key cryptographic material that needs to be encoded into each entrypoint in a PURB, such as the ephemeral Diffie-Hellman public key a recipient needs to derive the shared secret, in an encoding indistinguishable from uniformly random bits. Because the standard encodings of elliptic-curve points are readily distinguishable from random bits, for example, special indistinguishable encoding algorithms must be used for this purpose, such as Elligator [4] and its successors. [5] [6]

Tradeoffs and limitations

The primary privacy advantage that PURBs offer is a strong assurance that correctly-encrypted data leaks nothing incidental via internal metadata that observers might readily use to identify weaknesses in the data or software used to produce it, or to fingerprint the application or user that created the PURB. This privacy advantage can translate into a security benefit for data encrypted with weak or obsolete ciphers, or by software with known vulnerabilities that an attacker might exploit based on trivially-observable information gleaned from cleartext metadata.

A primary disadvantage of the PURB encryption discipline is the complexity of encoding and decoding, because the decoder cannot rely on conventional parsing techniques before decryption. A secondary disadvantage is the overhead that padding adds, although the padding scheme proposed for PURBs incurs at most only a few percent overhead for objects of significant size.

One critique of incurring the complexity and overhead costs of PURB encryption is that the context in which a PURB is stored or transmitted may often leak metadata about the encrypted content anyway, and such metadata is outside of the encryption format's purview or control and thus cannot be addressed by the encryption format alone. For example, an application's or user's choice of filename and directory in which to store a PURB on disk may indicate allow an observer to infer the application that likely created it and to what purpose, even if the PURB's data content itself does not. Similarly, encrypting an E-mail's body as a PURB instead of with traditional PGP or S/MIME format may eliminate the encryption format's metadata leakage, but cannot prevent information leakage from the cleartext E-mail headers, or from the endpoint hosts and E-mail servers involved in the exchange. Nevertheless, separate but complementary disciplines are typically available to limit such contextual metadata leakage, such as appropriate file naming conventions or use of pseudonymous E-mail addresses for sensitive communications.

Related Research Articles

In cryptography, a block cipher is a deterministic algorithm operating on fixed-length groups of bits, called blocks. It uses an unvarying transformation, that is, it uses a symmetric key. They are specified elementary components in the design of many cryptographic protocols and are widely used to implement the encryption of large amounts of data, including data exchange protocols.

Cipher Algorithm for encrypting and decrypting information

In cryptography, a cipher is an algorithm for performing encryption or decryption—a series of well-defined steps that can be followed as a procedure. An alternative, less common term is encipherment. To encipher or encode is to convert information into cipher or code. In common parlance, "cipher" is synonymous with "code", as they are both a set of steps that encrypt a message; however, the concepts are distinct in cryptography, especially classical cryptography.

In cryptography, encryption is the process of encoding information. This process converts the original representation of the information, known as plaintext, into an alternative form known as ciphertext. Ideally, only authorized parties can decipher a ciphertext back to plaintext and access the original information. Encryption does not itself prevent interference but denies the intelligible content to a would-be interceptor.

A key in cryptography is a piece of information, usually a string of numbers or letters that are stored in a file, which, when processed through a cryptographic algorithm, can encode or decode cryptographic data. They can be used for both encryption and decryption in symmetric cryptography or can only be used for either encryption or decryption with asymmetric cryptography. Based on the method, the key can be different sizes and varieties, but in all cases, the strength of the encryption relies on the security of the key being maintained. A key’s security strength is dependent on its algorithm, the size of the key, the generation of the key, and the process of key exchange.

Symmetric-key algorithms are algorithms for cryptography that use the same cryptographic keys for both the encryption of plaintext and the decryption of ciphertext. The keys may be identical, or there may be a simple transformation to go between the two keys. The keys, in practice, represent a shared secret between two or more parties that can be used to maintain a private information link. The requirement that both parties have access to the secret key is one of the main drawbacks of symmetric-key encryption, in comparison to public-key encryption.

In cryptography, an initialization vector (IV) or starting variable (SV) is an input to a cryptographic primitive being used to provide the initial state. The IV is typically required to be random or pseudorandom, but sometimes an IV only needs to be unpredictable or unique. Randomization is crucial for some encryption schemes to achieve semantic security, a property whereby repeated usage of the scheme under the same key does not allow an attacker to infer relationships between segments of the encrypted message. For block ciphers, the use of an IV is described by the modes of operation.

In cryptography, a block cipher mode of operation is an algorithm that uses a block cipher to provide information security such as confidentiality or authenticity. A block cipher by itself is only suitable for the secure cryptographic transformation of one fixed-length group of bits called a block. A mode of operation describes how to repeatedly apply a cipher's single-block operation to securely transform amounts of data larger than a block.

Ciphertext

In cryptography, ciphertext or cyphertext is the result of encryption performed on plaintext using an algorithm, called a cipher. Ciphertext is also known as encrypted or encoded information because it contains a form of the original plaintext that is unreadable by a human or computer without the proper cipher to decrypt it. Thus preventing loss of sensitive information via hacking. Decryption, the inverse of encryption, is the process of turning ciphertext into readable plaintext. Ciphertext is not to be confused with codetext because the latter is a result of a code, not a cipher.

A cryptographically secure pseudorandom number generator (CSPRNG) or cryptographic pseudorandom number generator (CPRNG) is a pseudorandom number generator (PRNG) with properties that make it suitable for use in cryptography. It is also loosely known as a cryptographic random number generator (CRNG).

Articles related to cryptography include:

In cryptography, a Feistel cipher is a symmetric structure used in the construction of block ciphers, named after the German-born physicist and cryptographer Horst Feistel who did pioneering research while working for IBM (USA); it is also commonly known as a Feistel network. A large proportion of block ciphers use the scheme, including the US Data Encryption Standard, the Soviet/Russian GOST and the more recent Blowfish and Twofish ciphers. In a Feistel cipher, encryption and decryption are very similar operations, and both consist of iteratively running a function called a "round function" a fixed number of times.

In cryptography, padding is any of a number of distinct practices which all include adding data to the beginning, middle, or end of a message prior to encryption. In classical cryptography, padding may include adding nonsense phrases to a message to obscure the fact that many messages end in predictable ways, e.g. sincerely yours.

S/MIME is a standard for public key encryption and signing of MIME data. S/MIME is on an IETF standards track and defined in a number of documents, most importantly RFC 3369, 3370, 3850 and 3851. It was originally developed by RSA Data Security and the original specification used the IETF MIME specification with the de facto industry standard PKCS#7 secure message format. Change control to S/MIME has since been vested in the IETF and the specification is now layered on Cryptographic Message Syntax (CMS), an IETF specification that is identical in most respects with PKCS #7. S/MIME functionality is built into the majority of modern email software and interoperates between them. Since it is built on CMS, MIME can also hold an advanced digital signature.

Probabilistic encryption is the use of randomness in an encryption algorithm, so that when encrypting the same message several times it will, in general, yield different ciphertexts. The term "probabilistic encryption" is typically used in reference to public key encryption algorithms; however various symmetric key encryption algorithms achieve a similar property, and stream ciphers such as Freestyle which are inherently random. To be semantically secure, that is, to hide even partial information about the plaintext, an encryption algorithm must be probabilistic.

Authenticated encryption (AE) and authenticated encryption with associated data (AEAD) are forms of encryption which simultaneously assure the confidentiality and authenticity of data.

In cryptography, Galois/Counter Mode (GCM) is a mode of operation for symmetric-key cryptographic block ciphers which is widely adopted for its performance. GCM throughput rates for state-of-the-art, high-speed communication channels can be achieved with inexpensive hardware resources. The operation is an authenticated encryption algorithm designed to provide both data authenticity (integrity) and confidentiality. GCM is defined for block ciphers with a block size of 128 bits. Galois Message Authentication Code (GMAC) is an authentication-only variant of the GCM which can form an incremental message authentication code. Both GCM and GMAC can accept initialization vectors of arbitrary length.

Institute of Electrical and Electronics Engineers (IEEE) standardization project for encryption of stored data, but more generically refers to the Security in Storage Working Group (SISWG), which includes a family of standards for protection of stored data and for the corresponding cryptographic key management.

In cryptography, a distinguishing attack is any form of cryptanalysis on data encrypted by a cipher that allows an attacker to distinguish the encrypted data from random data. Modern symmetric-key ciphers are specifically designed to be immune to such an attack. In other words, modern encryption schemes are pseudorandom permutations and are designed to have ciphertext indistinguishability. If an algorithm is found that can distinguish the output from random faster than a brute force search, then that is considered a break of the cipher.

Cryptography Practice and study of secure communication techniques

Cryptography, or cryptology, is the practice and study of techniques for secure communication in the presence of third parties called adversaries. More generally, cryptography is about constructing and analyzing protocols that prevent third parties or the public from reading private messages; various aspects in information security such as data confidentiality, data integrity, authentication, and non-repudiation are central to modern cryptography. Modern cryptography exists at the intersection of the disciplines of mathematics, computer science, electrical engineering, communication science, and physics. Applications of cryptography include electronic commerce, chip-based payment cards, digital currencies, computer passwords, and military communications.

Crypto Wars

The Crypto Wars is an unofficial name for the U.S. and allied governments' attempts to limit the public's and foreign nations' access to cryptography strong enough to resist decryption by national intelligence agencies.

References

  1. 1 2 3 4 Nikitin, Kirill; Barman, Ludovic; Lueks, Wouter; Underwood, Matthew; Hubaux, Jean-Pierre; Ford, Bryan (2019). "Reducing Metadata Leakage from Encrypted Files and Communication with PURBs" (PDF). Proceedings on Privacy Enhancing Technologies (PoPETS). 2019 (4): 6–33. doi: 10.2478/popets-2019-0056 . S2CID   47011059.
  2. Hintz, Andrew (April 2002). Fingerprinting Websites Using Traffic Analysis. International Workshop on Privacy Enhancing Technologies. doi:10.1007/3-540-36467-6_13.
  3. Sun, Qixiang; Simon, D.R.; Wang, Yi-Min; Russell, W.; Padmanabhan, V.N.; Qiu, Lili (May 2002). Statistical Identification of Encrypted Web Browsing Traffic. IEEE Symposium on Security and Privacy. doi:10.1109/SECPRI.2002.1004359.
  4. Bernstein, Daniel J.; Hamburg, Mike; Krasnova, Anna; Lange, Tanja (November 2013). Elligator: Elliptic-curve points indistinguishable from uniform random strings. Computer Communications Security.
  5. Tibouchi, Mehdi (March 2014). Elligator Squared: Uniform Points on Elliptic Curves of Prime Order as Uniform Random Strings (PDF). Financial Cryptography and Data Security.
  6. Aranha, Diego F.; Fouque, Pierre-Alain; Qian, Chen; Tibouchi, Mehdi; Zapalowicz, Jean-Christophe (August 2014). Binary Elligator Squared (PDF). International Conference on Selected Areas in Cryptography.