DataVault

Last updated February 07, 2023

The DataVault was Thinking Machines' mass storage system, storing five gigabytes of data, expandable to ten gigabytes with transfer rates of 40 megabytes per second. Eight DataVaults could be operated in parallel for a combined data transfer rate of 320 megabytes per second for up to 80 gigabytes of data.^[1]

Each DataVault unit stored its data in an array of 39 individual disk drives with data spread across the drives. Each 64-bit data chunk received from the I/O bus was split into two 32-bit words. After verifying parity, the DataVault controller added 7 bits of Error Correcting Code (ECC) and stored the resulting 39 bits on 39 individual drives. Subsequent failure of any one of the 39 drives would not impair reading of the data, since the ECC code allows any single bit error to be detected and corrected.^[1]

Although operation is possible with a single failed drive, three spare drives were available to replace failed units until they are repaired. The ECC codes permit 100% recovery of the data on any one failed disk, allowing a new copy of this data to be reconstructed and written onto the replacement disk. Once this recovery is complete, the data base is considered to be healed.^[1]

In today's terminology this would be labeled a RAID-2 subsystem.^[2] However, these units shipped before the label RAID was formed.

The DataVault was an example of unusual industrial design. Instead of the usual rectilinear box, the cabinet had a gentle curve that made it look like an information desk or a bartender's station.^[2]

Related Research Articles

The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit of memory in many computer architectures. To disambiguate arbitrarily sized bytes from the common 8-bit definition, network protocol documents such as The Internet Protocol refer to an 8-bit byte as an octet. Those bits in an octet are usually counted with numbering from 0 to 7 or 7 to 0 depending on the bit endianness. The first bit is number 0, making the eighth bit number 7.

A binary prefix is a unit prefix for multiples of units. It is most often used in data processing, data transmission, and digital information, principally in association with the bit and the byte, to indicate multiplication by a power of 2. As shown in the table to the right there are two sets of symbols for binary prefixes, one set established by International Electrotechnical Commission (IEC) and several other standards and trade organizations using two-letter symbols, e.g. Mi indicating 1048576 with a second set established by semiconductor industry convention using one-letter symbols, e.g., M also indicating 1048576.

Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers.

The gigabyte is a multiple of the unit byte for digital information. The prefix giga means 10⁹ in the International System of Units (SI). Therefore, one gigabyte is one billion bytes. The unit symbol for the gigabyte is GB.

A hard disk drive (HDD), hard disk, hard drive, or fixed disk, is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnetic material. The platters are paired with magnetic heads, usually arranged on a moving actuator arm, which read and write data to the platter surfaces. Data is accessed in a random-access manner, meaning that individual blocks of data can be stored and retrieved in any order. HDDs are a type of non-volatile storage, retaining stored data when powered off. Modern HDDs are typically in the form of a small rectangular box.

The megabyte is a multiple of the unit byte for digital information. Its recommended unit symbol is MB. The unit prefix mega is a multiplier of 1000000 (10⁶) in the International System of Units (SI). Therefore, one megabyte is one million bytes of information. This definition has been incorporated into the International System of Quantities.

Small Computer System Interface is a set of standards for physically connecting and transferring data between computers and peripheral devices. The SCSI standards define commands, protocols, electrical, optical and logical interfaces. The SCSI standard defines command sets for specific peripheral device types; the presence of "unknown" as one of these types means that in theory it can be used as an interface to almost any device, but the standard is highly pragmatic and addressed toward commercial requirements. The initial Parallel SCSI was most commonly used for hard disk drives and tape drives, but it can connect a wide range of other devices, including scanners and CD drives, although not all controllers can handle all devices.

Flash memory is an electronic non-volatile computer memory storage medium that can be electrically erased and reprogrammed. The two main types of flash memory, NOR flash and NAND flash, are named for the NOR and NAND logic gates. Both use the same cell design, consisting of floating gate MOSFETs. They differ at the circuit level depending on whether the state of the bit line or word lines is pulled high or low: in NAND flash, the relationship between the bit line and the word lines resembles a NAND gate; in NOR flash, it resembles a NOR gate.

RAID is a data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy, performance improvement, or both. This is in contrast to the previous concept of highly reliable mainframe disk drives referred to as "single large expensive disk" (SLED).

Thinking Machines Corporation was a supercomputer manufacturer and artificial intelligence (AI) company, founded in Waltham, Massachusetts, in 1983 by Sheryl Handler and W. Daniel "Danny" Hillis to turn Hillis's doctoral work at the Massachusetts Institute of Technology (MIT) on massively parallel computing architectures into a commercial product named the Connection Machine. The company moved in 1984 from Waltham to Kendall Square in Cambridge, Massachusetts, close to the MIT AI Lab. Thinking Machines made some of the most powerful supercomputers of the time, and by 1993 the four fastest computers in the world were Connection Machines. The firm filed for bankruptcy in 1994; its hardware and parallel computing software divisions were acquired in time by Sun Microsystems.

A parity bit, or check bit, is a bit added to a string of binary code. Parity bits are a simple form of error detecting code. Parity bits are generally applied to the smallest units of a communication protocol, typically 8-bit octets (bytes), although they can also be applied separately to an entire message string of bits.

Data degradation is the gradual corruption of computer data due to an accumulation of non-critical failures in a data storage device. The phenomenon is also known as data decay, data rot or bit rot.

Data corruption refers to errors in computer data that occur during writing, reading, storage, transmission, or processing, which introduce unintended changes to the original data. Computer, transmission, and storage systems use a number of measures to provide end-to-end data integrity, or lack of errors.

Reliability, availability and serviceability (RAS), also known as reliability, availability, and maintainability (RAM), is a computer hardware engineering term involving reliability engineering, high availability, and serviceability design. The phrase was originally used by International Business Machines (IBM) as a term to describe the robustness of their mainframe computers.

Error correction code memory is a type of computer data storage that uses an error correction code (ECC) to detect and correct n-bit data corruption which occurs in memory. ECC memory is used in most computers where data corruption cannot be tolerated, like industrial control applications, critical databases, and infrastructural memory caches.

Chipkill is IBM's trademark for a form of advanced error checking and correcting (ECC) computer memory technology that protects computer memory systems from any single memory chip failure as well as multi-bit errors from any portion of a single memory chip. One simple scheme to perform this function scatters the bits of a Hamming code ECC word across multiple memory chips, such that the failure of any single memory chip will affect only one ECC bit per word. This allows memory contents to be reconstructed despite the complete failure of one chip. Typical implementations use more advanced codes, such as a BCH code, that can correct multiple bits with less overhead.

In computer main memory, auxiliary storage and computer buses, data redundancy is the existence of data that is additional to the actual data and permits correction of errors in stored or transmitted data. The additional data can simply be a complete copy of the actual data, or only select pieces of data that allow detection of errors and reconstruction of lost or damaged data up to a certain level.

In computer storage, the standard RAID levels comprise a basic set of RAID configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose computer hard disk drives (HDDs). The most common types are RAID 0 (striping), RAID 1 (mirroring) and its variants, RAID 5, and RAID 6. Multiple RAID levels can also be combined or nested, for instance RAID 10 or RAID 01. RAID levels and their associated data formats are standardized by the Storage Networking Industry Association (SNIA) in the Common RAID Disk Drive Format (DDF) standard. The numerical values only serve as identifiers and do not signify performance, reliability, generation, or any other metric.

This timeline of binary prefixes lists events in the history of the evolution, development, and use of units of measure which are germane to the definition of the binary prefixes by the International Electrotechnical Commission (IEC) in 1998, used primarily with units of information such as the bit and the byte.

ZFS is a file system with volume management capabilities. It began as part of the Sun Microsystems Solaris operating system in 2001. Large parts of Solaris – including ZFS – were published under an open source license as OpenSolaris for around 5 years from 2005, before being placed under a closed source license when Oracle Corporation acquired Sun in 2009/2010. During 2005 to 2010, the open source version of ZFS was ported to Linux, Mac OS X and FreeBSD. In 2010, the illumos project forked a recent version of OpenSolaris, to continue its development as an open source project, including ZFS. In 2013, OpenZFS was founded to coordinate the development of open source ZFS. OpenZFS maintains and manages the core ZFS code, while organizations using ZFS maintain the specific code and validation processes required for ZFS to integrate within their systems. OpenZFS is widely used in Unix-like systems.

References

1 2 3 "Connection Machine® Model CM-2 Technical Summary, Chapter 7". Thinking Machines Corporation. April 1987. pp. 28–30. Retrieved January 16, 2015.
1 2 Padua, David, ed. (2011). "Connection Machine". Encyclopedia of Parallel Computing. Springer Science & Business Media. p. 393. ISBN 9780387097657 . Retrieved 16 January 2015.

External links

USpatent 4899342,"Method and apparatus for operating multi-unit array of memories",issued 1990-02-06, assigned to Thinking Machines Corporation

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[TMCM2-1] 1 2 3 "Connection Machine® Model CM-2 Technical Summary, Chapter 7". Thinking Machines Corporation. April 1987. pp. 28–30. Retrieved January 16, 2015.

[Padua2011-2] 1 2 Padua, David, ed. (2011). "Connection Machine". Encyclopedia of Parallel Computing. Springer Science & Business Media. p. 393. ISBN 9780387097657 . Retrieved 16 January 2015.

[1]

[2]