Content Addressable File Store

The Content Addressable File Store (CAFS) [1] was a hardware device developed by International Computers Limited (ICL) that provided disk storage with built-in search capability. The motivation for the device was the discrepancy between the high speed at which a disk could deliver data and the much lower speed at which a general-purpose processor could filter that data for records matching a search condition. [2] [3]

Development of CAFS started in ICL's Research and Advanced Development Centre under Gordon Scarrott in the late 1960s, following research by George Coulouris and John Evans, who had completed a field study at Imperial College and Queen Mary College on database systems and applications (Scarrott, 1995). Their study had revealed the potential for substantial performance improvements in large-scale database applications through the inclusion of search logic in the disk controller. [1]

In its initial form, the search logic was built into the disk head. A standalone CAFS device was installed with a few customers, including BT Directory Enquiries, during the 1970s. The device was subsequently productised and in 1982 was incorporated as a standard feature within ICL's 2900 series and Series 39 mainframes. By this stage, to reduce costs and to take advantage of increased hardware speeds, the search logic was incorporated into the disk controller. A query expressed in a high-level query language could be compiled into a search specification that was then sent to the disk controller for execution. Initially this capability was integrated into ICL's own Querymaster query language, which worked in conjunction with the IDMS database; subsequently it was integrated into the ICL VME port of the Ingres relational database.
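
The division of labour described above, in which a query is compiled into a search specification and the filtering happens next to the disk, can be illustrated with a small Python sketch. This is not ICL's implementation or the Querymaster interface: the fixed record layout, the `Term` structure, and the functions `compile_query`, `controller_search` and `build_disk` are all hypothetical names chosen for illustration, and the "controller" is simply a generator that scans a byte string.

```python
import operator
import struct
from dataclasses import dataclass

# Hypothetical fixed record layout (an assumption, not a real CAFS format):
# 20-byte name, 12-byte town, 4-byte unsigned number, no padding.
RECORD = struct.Struct("<20s12sI")
FIELDS = ("name", "town", "number")
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


@dataclass(frozen=True)
class Term:
    """One comparison in a search specification."""
    field: str
    op: str
    value: object


def compile_query(conditions):
    """Compile (field, op, value) tuples into a list of Terms, all of which must hold."""
    spec = []
    for field, op, value in conditions:
        if field not in FIELDS or op not in OPS:
            raise ValueError(f"unsupported condition: {field} {op}")
        spec.append(Term(field, op, value))
    return spec


def decode(raw):
    """Decode one fixed-length record into a dict of field values."""
    name, town, number = RECORD.unpack(raw)
    return {
        "name": name.rstrip(b"\0").decode(),
        "town": town.rstrip(b"\0").decode(),
        "number": number,
    }


def controller_search(disk_image, spec):
    """Simulated controller: scan every record, pass only matches to the host."""
    for offset in range(0, len(disk_image), RECORD.size):
        record = decode(disk_image[offset:offset + RECORD.size])
        if all(OPS[t.op](record[t.field], t.value) for t in spec):
            yield record


def build_disk(rows):
    """Pack (name, town, number) rows into a contiguous byte string."""
    return b"".join(RECORD.pack(n.encode(), t.encode(), num) for n, t, num in rows)
```

In this sketch the host only ever sees the records yielded by `controller_search`, which is the point of the design: the scan runs at the rate the storage delivers data, and the general-purpose processor handles just the matches. The scan also depends on knowing the exact record layout in advance.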

ICL received the Queen's Award for Technological Achievement for CAFS in 1985.

One factor which limited the adoption of CAFS was that the device needed to know the layout of data on disk, and placed constraints on this layout. Integrating database products with CAFS often involved a change in page layout, making the integration very expensive, especially with the market trend towards use of third-party database software. Managing data integrity in a concurrent environment also required close attention, since a CAFS search would execute without any knowledge of locks and caches maintained by the database software.

ICL also produced a version of CAFS for its DRS minicomputer range called SCAFS (Son of CAFS). Unlike its mainframe cousin, this was implemented using custom firmware running on an industry-standard microprocessor. Software supporting third-party databases, including Ingres, Informix and Oracle, was marketed under names such as the Ingres Search Accelerator. Each third-party product required modification and was supplied with a dummy SCAFS interface library, to be replaced by the ICL product. The technology was also licensed to IBM for use with DB2 on the RS/6000.

The device eventually became obsolete as processor speeds increased, removing its original justification, namely that a central processor could not search data as fast as the disk subsystem could deliver it. Larger memory sizes also meant that many medium-sized databases could be kept entirely in memory. These factors removed any mass market for SCAFS and made it uneconomic.

References

  1. Coulouris, G. F.; Evans, J. M.; Mitchell, R. W. (1972). "Towards Content-Addressing in Data Bases". The Computer Journal. 15 (2): 95. doi:10.1093/comjnl/15.2.95.
  2. Leung, C. H. C. and Wong, K. S., 'File Processing Efficiency on the Content Addressable File Store', Proc VLDB 1985. http://www.vldb.org/conf/1985/P282.PDF
  3. Scarrott, Gordon G., 'From Torsional Mode Delay Lines to DAP', Computer Resurrection, Number 12, Summer 1995, ISSN 0958-7403, pp. 19-28. http://www.cs.manchester.ac.uk/CCS/res/pdfs/res12.pdf