HashKeeper

Last updated

HashKeeper is a database application of value primarily to those conducting forensic examinations of computers on a somewhat regular basis.

Database organized collection of data

A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.

A computer is a device that can be instructed to carry out sequences of arithmetic or logical operations automatically via computer programming. Modern computers have the ability to follow generalized sets of operations, called programs. These programs enable computers to perform an extremely wide range of tasks. A "complete" computer including the hardware, the operating system, and peripheral equipment required and used for "full" operation can be referred to as a computer system. This term may as well be used for a group of computers that are connected and work together, in particular a computer network or computer cluster.

Contents

Overview

HashKeeper uses the MD5 file signature algorithm to establish unique numeric identifiers (hash values) for files "known to be good" and "known to be bad."

The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value. Although MD5 was initially designed to be used as a cryptographic hash function, it has been found to suffer from extensive vulnerabilities. It can still be used as a checksum to verify data integrity, but only against unintentional corruption. It remains suitable for other non-cryptographic purposes, for example for determining the partition for a particular key in a partitioned database.

A computer file is a computer resource for recording data discretely in a computer storage device. Just as words can be written to paper, so can information be written to a computer file. Files can be edited and transferred through the internet.

Algorithm An unambiguous specification of how to solve a class of problems

In mathematics and computer science, an algorithm is an unambiguous specification of how to solve a class of problems. Algorithms can perform calculation, data processing, automated reasoning, and other tasks.

The HashKeeper application was developed to reduce the amount of time required to examine files on digital media. Once an examiner defines a file as known to be good, the examiner need not repeat that analysis.

HashKeeper compares hash values of known to be good files against the hash values of files on a computer system. Where those values match "known to be good" files, the examiner can say, with substantial certainty, that the corresponding files on the computer system have been previously identified as known to be good and therefore do not need to be examined.

Where those values match known to be bad files, the examiner can say with substantial certainty that the corresponding files on the system being examined that the files are bad and therefore require further scrutiny. A hash match on known to be bad files does not relieve the examiner of the responsibility of verifying that the file or files are, in fact, of a criminal nature.

History

Created by the National Drug Intelligence Center (NDIC)—a component of the United States Department of Justice—in 1996, it was the first large scale source for hash values of "known to be good" and "known to be bad" files. HashKeeper was, and still is, the only community effort based upon the belief that members of state, national, and international law enforcement agencies can be trusted to submit properly categorized hash values. One of the first community sources of "known to be good" hash values was the United States Internal Revenue Service. The first source of "known to be bad" hash values was the Luxembourg Police who contributed hash values of recognized child pornography.

National Drug Intelligence Center

The United States National Drug Intelligence Center (NDIC), established in 1993, was a component of the U.S. Department of Justice and a member of the Intelligence Community. The General Counterdrug Intelligence Plan, implemented in February 2000, designated NDIC as the nation's principal center for strategic domestic counterdrug intelligence.

United States Department of Justice U.S. federal executive department in charge of law enforcement

The United States Department of Justice (DOJ), also known as the Justice Department, is a federal executive department of the U.S. government, responsible for the enforcement of the law and administration of justice in the United States, equivalent to the justice or interior ministries of other countries. The department was formed in 1870 during the Ulysses S. Grant administration.

Internal Revenue Service revenue service of the United States federal government

The Internal Revenue Service (IRS) is the revenue service of the United States federal government. The government agency is a bureau of the Department of the Treasury, and is under the immediate direction of the Commissioner of Internal Revenue, who is appointed to a five-year term by the President of the United States. The IRS is responsible for collecting taxes and administering the Internal Revenue Code, the main body of federal statutory tax law of the United States. The duties of the IRS include providing tax assistance to taxpayers and pursuing and resolving instances of erroneous or fraudulent tax filings. The IRS has also overseen various benefits programs, and enforces portions of the Affordable Care Act.

Availability

HashKeeper is available, free-of-charge, to law enforcement, military and other government agencies throughout the world. It is available to the public by sending a Freedom of Information Act request to NDIC.

A law enforcement agency (LEA), in North American English, is a government agency responsible for the enforcement of the laws.

Military organized body primarily tasked with preparing for and conducting war

A military is a heavily-armed highly-organised force primarily intended for warfare, also known collectively as armed forces. It is typically officially authorized and maintained by a sovereign state, with its members identifiable by their distinct military uniform. It may consist of one or more military branches such as an Army, Navy, Air Force and in certain countries, Marines and Coast Guard. The main task of the military is usually defined as defence of the state and its interests against external armed threats. Beyond warfare, the military may be employed in additional sanctioned and non-sanctioned functions within the state, including internal security threats, population control, the promotion of a political agenda, emergency services and reconstruction, protecting corporate economic interests, social ceremonies and national honor guards. For a nation's military may function as a discrete subculture within a larger civil society, through the development of separate infrastructures, which may include housing, schools, utilities, logistics, health and medical, law, food production, finance and banking.

Freedom of Information Act (United States) US statute regarding access to information held by the US government

The Freedom of Information Act (FOIA), 5 U.S.C. § 552, is a federal freedom of information law that requires the full or partial disclosure of previously unreleased information and documents controlled by the United States government upon request. The Act defines agency records subject to disclosure, outlines mandatory disclosure procedures, and defines nine exemptions to the statute. President Lyndon B. Johnson, despite his misgivings, signed the Freedom of Information Act into law on July 4, 1966, and it went into effect the following year.

In the 2012 United States budget, NDIC was de-funded and closed its doors on June 16, 2012. The availability and future of HashKeeper is uncertain.

Sources

HashKeeper Overview, National Drug Intelligence Center.[ better source needed ]

See also

Related Research Articles

Hash function type of function that can be used to map data of arbitrary size to data of fixed size

A hash function is any function that can be used to map data of arbitrary size onto data of a fixed size. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. Hash functions are often used in combination with a hash table, a common data structure used in computer software for rapid data lookup. Hash functions accelerate table or database lookup by detecting duplicated records in a large file. One such application is finding similar stretches in DNA sequences. They are also useful in cryptography. A cryptographic hash function allows one to easily verify whether some input data map onto a given hash value, but if the input data is unknown it is deliberately difficult to reconstruct it by knowing the stored hash value. This is used for assuring integrity of transmitted data, and is the building block for HMACs, which provide message authentication.

Hash table Associates data values with key values - a lookup table

In computing, a hash table is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

Password used for user authentication to prove identity or access approval

A password is a word or string of characters used for user authentication to prove identity or access approval to gain access to a resource, which is to be kept secret from those not allowed access.

In computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest.

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection.

Distributed hash table

A distributed hash table (DHT) is a class of a decentralized distributed system that provides a lookup service similar to a hash table: pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

In cryptanalysis and computer security, password cracking is the process of recovering passwords from data that have been stored in or transmitted by a computer system. A common approach is to try guesses repeatedly for the password and check them against an available cryptographic hash of the password.

Hashcash is a proof-of-work system used to limit email spam and denial-of-service attacks, and more recently has become known for its use in bitcoin as part of the mining algorithm. Hashcash was proposed in 1997 by Adam Back and described more formally in Back's paper "Hashcash - A Denial of Service Counter-Measure".

File verification is the process of using an algorithm for verifying the integrity of a computer file. This can be done by comparing two files bit-by-bit, but requires two copies of the same file, and may miss systematic corruptions which might occur to both files. A more popular approach is to also store checksums (hashes) of files, also known as message digests, for later comparison.

National Software Reference Library organization

The National Software Reference Library (NSRL), is a project of the National Institute of Standards and Technology (NIST) which maintains a repository of known software, file profiles and file signatures for use by law enforcement and other organizations involved with computer forensic investigations. The project is supported by the United States Department of Justice's National Institute of Justice, the Federal Bureau of Investigation (FBI), Defense Computer Forensics Laboratory (DCFL), the U.S. Customs Service, software vendors, and state and local law enforcement. It also provides a research environment for computational analysis of large sets of files.

In computer science, data validation is the process of ensuring data have undergone data cleansing to ensure they have data quality, that is, that they are both correct and useful. It uses routines, often called "validation rules" "validation constraints" or "check routines", that check for correctness, meaningfulness, and security of data that are input to the system. The rules may be implemented through the automated facilities of a data dictionary, or by the inclusion of explicit application program validation logic.

Git free and open source revision control software

Git is a distributed version-control system for tracking changes in source code during software development. It is designed for coordinating work among programmers, but it can be used to track changes in any set of files. Its goals include speed, data integrity, and support for distributed, non-linear workflows.

Rainbow table precomputed table for reversing cryptographic hash functions

A rainbow table is a precomputed table for reversing cryptographic hash functions, usually for cracking password hashes. Tables are usually used in recovering a password up to a certain length consisting of a limited set of characters. It is a practical example of a space–time tradeoff, using less computer processing time and more storage than a brute-force attack which calculates a hash on every attempt, but more processing time and less storage than a simple lookup table with one entry per hash. Use of a key derivation function that employs a salt makes this attack infeasible.

In computer science, consistent hashing is a special kind of hashing such that when a hash table is resized, only keys need to be remapped on average, where is the number of keys, and is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.

Hash list

In computer science, a hash list is typically a list of hashes of the data blocks in a file or set of files. Lists of hashes are used for many different purposes, such as fast table lookup and distributed databases.

Merkle tree tree in which every non-leaf node is labelled with the hash of the labels or values (in case of leaves) of its child nodes

In cryptography and computer science, a hash tree or Merkle tree is a tree in which every leaf node is labelled with the hash of a data block, and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes. Hash trees allow efficient and secure verification of the contents of large data structures. Hash trees are a generalization of hash lists and hash chains.

Digital forensics

Digital forensics is a branch of forensic science encompassing the recovery and investigation of material found in digital devices, often in relation to computer crime. The term digital forensics was originally used as a synonym for computer forensics but has expanded to cover investigation of all devices capable of storing digital data. With roots in the personal computing revolution of the late 1970s and early 1980s, the discipline evolved in a haphazard manner during the 1990s, and it was not until the early 21st century that national policies emerged.

Fingerprint (computing) digital identifier to identify a certain piece of data, derived from the data by an algorithm

In computer science, a fingerprinting algorithm is a procedure that maps an arbitrarily large data item to a much shorter bit string, its fingerprint, that uniquely identifies the original data for all practical purposes just as human fingerprints uniquely identify people for practical purposes. This fingerprint may be used for data deduplication purposes. This is also referred to as file fingerprinting, data fingerprinting, or structured data fingerprinting.

In the BitTorrent file distribution system, a torrent file or METAINFO is a computer file that contains metadata about files and folders to be distributed, and usually also a list of the network locations of trackers, which are computers that help participants in the system find each other and form efficient distribution groups called swarms. A torrent file does not contain the content to be distributed; it only contains information about those files, such as their names, sizes, folder structure, and cryptographic hash values for verifying file integrity. The term torrent may refer either to the metadata file or to the files downloaded, depending on the context.

References

http://www.justice.gov/archive/ndic/ndic-moved.html

http://www.nsrl.nist.gov/nsrl-faqs.html#faq12