Convergent encryption

Convergent encryption, also known as content hash keying, is a cryptosystem that produces identical ciphertext from identical plaintext files. This has applications in cloud computing, where it allows a provider to remove duplicate files from storage without having access to the encryption keys. [1] The combination of deduplication and convergent encryption was described in a backup system patent filed by Stac Electronics in 1995. [2] This combination has been used by Farsite, [3] Permabit, [4] Freenet, MojoNation, GNUnet, flud, and the Tahoe Least-Authority File Store. [5]

The system gained additional visibility in 2011 when cloud storage provider Bitcasa announced that it was using convergent encryption to enable deduplication of data in its cloud storage service. [6]

Overview

  1. The system computes a cryptographic hash of the plaintext in question.
  2. The system then encrypts the plaintext by using the hash as a key.
  3. Finally, the hash itself is stored, encrypted with a key chosen by the user.
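The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any system named above: SHA-256 stands in for the cryptographic hash, and because the Python standard library has no block cipher, a toy HMAC-SHA256 keystream (the hypothetical helper `keystream`) stands in for a real cipher such as AES. The function names are illustrative, and no integrity protection is provided.

```python
import hashlib
import hmac
import os
from typing import Tuple

FIXED_NONCE = b"\x00" * 16  # fixed, so equal plaintexts give equal ciphertexts


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher: HMAC-SHA256(key, nonce || counter) blocks.
    # Assumed stand-in for a vetted cipher such as AES-CTR; not for real use.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def convergent_encrypt(plaintext: bytes, user_key: bytes) -> Tuple[bytes, bytes]:
    # Step 1: hash the plaintext; the digest becomes the content key.
    content_key = hashlib.sha256(plaintext).digest()
    # Step 2: encrypt the plaintext under the content key.
    ciphertext = xor(plaintext, keystream(content_key, FIXED_NONCE, len(plaintext)))
    # Step 3: encrypt the content key under the user's own key. A random
    # nonce is prepended so the wrapped key differs between users.
    nonce = os.urandom(16)
    wrapped_key = nonce + xor(content_key, keystream(user_key, nonce, len(content_key)))
    return ciphertext, wrapped_key


def convergent_decrypt(ciphertext: bytes, wrapped_key: bytes, user_key: bytes) -> bytes:
    # Unwrap the content key with the user's key, then decrypt the file.
    nonce, body = wrapped_key[:16], wrapped_key[16:]
    content_key = xor(body, keystream(user_key, nonce, len(body)))
    return xor(ciphertext, keystream(content_key, FIXED_NONCE, len(ciphertext)))
```

Two users who store the same file produce byte-identical `ciphertext` values, which is what lets the provider deduplicate, while each user keeps a differently wrapped copy of the content key.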

Known Attacks

Convergent encryption is open to a "confirmation of a file" attack, in which an attacker can confirm whether a target possesses a certain file by encrypting an unencrypted (plaintext) copy of it and then simply comparing the output with the files the target holds. [7] This attack poses a problem for a user storing information that is non-unique, that is, either publicly available or already held by the adversary; examples include banned books or files that infringe copyright. One might argue that a confirmation attack is made less effective by adding a unique piece of data, such as a few random characters, to the plaintext before encryption: this makes the uploaded file unique and therefore yields a unique encrypted file. However, implementations that break the plaintext into blocks based on file content and convergently encrypt each block independently may inadvertently defeat attempts to make the file unique by adding bytes at the beginning or end, because only the blocks containing the added bytes change. [8]
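Both points can be illustrated with a small, hypothetical sketch. Because the content key is derived from the plaintext, an attacker who guesses a file can recompute its ciphertext without any key (`confirm_file`). The chunker `content_defined_chunks` is an assumed, simplified scheme (cut wherever a SHA-256 hash of the preceding 16 bytes is divisible by 64), not the algorithm of any particular system, and the cipher is again a toy HMAC-SHA256 keystream standing in for a real block cipher.

```python
import hashlib
import hmac


def convergent_encrypt(plaintext: bytes) -> bytes:
    # Toy convergent encryption: key = SHA-256(plaintext), keystream =
    # HMAC-SHA256(key, counter). Assumed stand-in for e.g. AES-CTR.
    key = hashlib.sha256(plaintext).digest()
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        stream += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def confirm_file(candidate: bytes, stored_ciphertexts: set) -> bool:
    # No key is needed: encrypting a guessed plaintext reproduces the
    # exact ciphertext the target would have stored.
    return convergent_encrypt(candidate) in stored_ciphertexts


def content_defined_chunks(data: bytes, window: int = 16, modulus: int = 64) -> list:
    # Hypothetical chunker: cut at position `end` whenever the hash of the
    # preceding `window` bytes is divisible by `modulus`. Cut points depend
    # only on local content, so they resynchronize after an inserted prefix.
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        digest = hashlib.sha256(data[end - window:end]).digest()
        if int.from_bytes(digest[:4], "big") % modulus == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Prepending random bytes changes only the first chunk of the original file; every later chunk, and therefore its ciphertext, is unchanged, so `confirm_file` still matches nearly all of the file chunk by chunk.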

Even more alarming than the confirmation attack is the "learn the remaining information" attack described by Drew Perttula in 2008. [9] This type of attack applies to the encryption of files that are only slight variations of a public document. For example, if the defender encrypts a bank form that includes a ten-digit bank account number, an attacker who knows the generic form layout can produce forms for every possible account number, encrypt them, and compare the results with the defender's encrypted file to deduce the account number. This attack can be extended to many targets at once (all spelling variations of a target bank customer's name in the example above, or even all potential bank customers), and the problem extends to any type of form document: tax returns, financial documents, healthcare forms, employment forms, and so on. There is also no known method for reducing the severity of this attack: adding a few random bytes to files as they are stored does not help, since those bytes can likewise be attacked with the "learn the remaining information" approach. The only effective mitigations are to encrypt the contents of files with a non-convergent secret before storing them (negating any benefit from convergent encryption), or to not use convergent encryption at all.
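A sketch of the bank-form example, under loud assumptions: `fill_form` is a hypothetical public template whose only secret field is the account number, the search space is shrunk to a few digits so the example runs quickly, and the same toy HMAC-SHA256 keystream stands in for a real cipher. The attacker fills the template with every candidate value, convergently encrypts each one, and compares the result with the target's stored ciphertext.

```python
import hashlib
import hmac
from typing import Optional


def convergent_encrypt(plaintext: bytes) -> bytes:
    # Toy convergent encryption: key = SHA-256(plaintext), keystream =
    # HMAC-SHA256(key, counter). Assumed stand-in for a real cipher.
    key = hashlib.sha256(plaintext).digest()
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        stream += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def fill_form(account: str) -> bytes:
    # Hypothetical public form: everything except the account number is known.
    return (
        "Example Bank account form\n"
        f"Account number: {account}\n"
        "Signature: __________\n"
    ).encode()


def recover_account(target_ciphertext: bytes, digits: int) -> Optional[str]:
    # Try every value of the secret field. No key is needed, because the
    # encryption key is derived from the guessed plaintext itself.
    for n in range(10 ** digits):
        guess = str(n).zfill(digits)
        if convergent_encrypt(fill_form(guess)) == target_ciphertext:
            return guess
    return None
```

The number of trial encryptions grows as 10 to the power of the number of digits, so the article's ten-digit account number means 10^10 candidates: a larger search of exactly the same kind.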

References

  1. Mark W. Storer, Kevin Greenan, Darrell D. E. Long, Ethan L. Miller. "Secure Data Deduplication". http://www.ssrc.ucsc.edu/Papers/storer-storagess08.pdf
  2. "System for backing up files from disk volumes on multiple nodes of a computer network". US Patent 5,778,395, filed October 1995. http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5778395.PN.&OS=PN/5778395&RS=PN/5778395
  3. "Reclaiming Space from Duplicate Files in a Serverless Distributed File System". MSR-TR-2002-30. http://research.microsoft.com/apps/pubs/default.aspx?id=69954
  4. "Data repository and method for promoting network storage of data". US Patent 7,412,462, provisionally filed February 2000. http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=7,412,462.PN.&OS=PN/7,412,462&RS=PN/7,412,462
  5. "Drew Perttula and Attacks on Convergent Encryption". https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html
  6. "Finally! Bitcasa CEO Explains How The Encryption Works". September 18, 2011. https://techcrunch.com/2011/09/18/bitcasa-explains-encryption/
  7. tahoe-lafs.org (2008-08-20). Retrieved 2013-09-05.
  8. Storer, Greenan, Long & Miller. "Secure Data Deduplication". University of California, Santa Cruz (2008-10-31). Retrieved 2013-09-05.
  9. tahoe-lafs.org (2008-08-20). Retrieved 2013-09-05.