DBM (computing)

Last updated

In computing, a DBM is a library and file format providing fast, single-keyed access to data. A key-value database from the original Unix, dbm is an early example of a NoSQL system. [1] [2] [3]

Contents

History

The original dbm library and file format was a simple database engine, originally written by Ken Thompson and released by AT&T in 1979. The name is a three-letter acronym for DataBase Manager, and can also refer to the family of database engines with APIs and features derived from the original dbm.

The dbm library stores arbitrary data by use of a single key (a primary key) in fixed-size buckets and uses hashing techniques to enable fast retrieval of the data by key.

The hashing scheme used is a form of extendible hashing, so that the hashing scheme expands as new buckets are added to the database, meaning that, when nearly empty, the database starts with one bucket, which is then split when it becomes full. The two resulting child buckets will themselves split when they become full, so the database grows as keys are added.

The dbm library and its derivatives are pre-relational databases   they manage associative arrays, implemented as on-disk hash tables. In practice, they can offer a more practical solution for high-speed storage accessed by key, as they do not require the overhead of connecting and preparing queries. This is balanced by the fact that they can generally only be opened for writing by a single process at a time. An agent daemon can handle requests from multiple processes, but introduces IPC overhead.

Implementations

The original AT&T dbm library has been replaced by its many successor implementations. Notable examples include: [3]

The following databases are dbm-inspired, but they do not directly provide a dbm interface, even though it would be trivial to wrap one:

Availability

As of 2001, the ndbm implementation of DBM was standard on Solaris and IRIX, whereas gdbm is ubiquitous on Linux. The Berkeley DB implementations were standard on some free operating systems. [2] [10] After a change of licensing of the Berkeley DB to GNU AGPL in 2013, projects like Debian have moved to LMDB. [11]

Reliability

A 2018 AFL fuzzing test against many DBM-family databases exposed many problems in implementations when it comes to corrupt or invalid database files. Only freecdb by Daniel J. Bernstein showed no crashes. The authors of gdbm, tdb, and lmdb were prompt to respond. Berkeley DB fell behind due to the sheer amount of other issues; [10] the fixes would be irrelevant to open-source software users due to the licensing change locking them back on an old version. [11]

See also

Related Research Articles

Berkeley DB (BDB) is an embedded database software library for key/value data, historically significant in open-source software. Berkeley DB is written in C with API bindings for many other programming languages. BDB stores arbitrary key/data pairs as byte arrays and supports multiple data items for a single key. Berkeley DB is not a relational database, although it has database features including database transactions, multiversion concurrency control and write-ahead logging. BDB runs on a wide variety of operating systems, including most Unix-like and Windows systems, and real-time operating systems.

<span class="mw-page-title-main">MySQL</span> SQL database engine software

MySQL is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A relational database organizes data into one or more data tables in which data may be related to each other; these relations help structure the data. SQL is a language that programmers use to create, modify and extract data from the relational database, as well as control user access to the database. In addition to relational databases and SQL, an RDBMS like MySQL works with an operating system to implement a relational database in a computer's storage system, manages users, allows for network access and facilitates testing database integrity and creation of backups.

zlib DEFLATE codec library

zlib is a software library used for data compression as well as a data format. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. zlib is also a crucial component of many software platforms, including Linux, macOS, and iOS. It has also been used in gaming consoles such as the PlayStation 4, PlayStation 3, Wii U, Wii, Xbox One and Xbox 360.

<span class="mw-page-title-main">IBM Db2</span> Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.

In computing, Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. An application written using ODBC can be ported to other platforms, both on the client and server side, with few changes to the data access code.

The archiver, also known simply as ar, is a Unix utility that maintains groups of files as a single archive file. Today, ar is generally used only to create and update static library files that the link editor or linker uses and for generating .deb packages for the Debian family; it can be used to create archives for any purpose, but has been largely replaced by tar for purposes other than static libraries. An implementation of ar is included as one of the GNU Binutils.

In computer programming, glob patterns specify sets of filenames with wildcard characters. For example, the Unix Bash shell command mv *.txttextfiles/ moves all files with names ending in .txt from the current directory to the directory textfiles. Here, * is a wildcard and *.txt is a glob pattern. The wildcard * stands for "any string of any length including empty, but excluding the path separator characters ".

<span class="mw-page-title-main">FOSDEM</span> Annual event in Brussels centered on free and open source software development

Free and Open source Software Developers' European Meeting (FOSDEM) is a non-commercial, volunteer-organized European event centered on free and open-source software development. It is aimed at developers and anyone interested in the free and open-source software movement. It aims to enable developers to meet and to promote the awareness and use of free and open-source software.

Windows Services for UNIX (SFU) is a discontinued software package produced by Microsoft which provided a Unix environment on Windows NT and some of its immediate successor operating-systems.

The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.

<span class="mw-page-title-main">LAMP (software bundle)</span> Acronym for a common web hosting solution

A LAMP is one of the most common software stacks for the web's most popular applications. Its generic software stack model has largely interchangeable components.

<span class="mw-page-title-main">History of free and open-source software</span>

The history of free and open-source software begins at the advent of computer software in the early half of the 20th century. In the 1950s and 1960s, computer operating software and compilers were delivered as a part of hardware purchases without separate fees. At the time, source code—the human-readable form of software—was generally distributed with the software, providing the ability to fix bugs or add new functions. Universities were early adopters of computing technology. Many of the modifications developed by universities were openly shared, in keeping with the academic principles of sharing knowledge, and organizations sprung up to facilitate sharing.

Sphinx is a fulltext search engine that provides text search functionality to client applications.

An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category that includes:

<span class="mw-page-title-main">Drizzle (database server)</span>

Drizzle is a discontinued free software/open-source relational database management system (DBMS) that was forked from the now-defunct 6.0 development branch of the MySQL DBMS.

The following outline is provided as an overview of and topical guide to the Perl programming language:

<span class="mw-page-title-main">Key–value database</span> Data storage paradigm

A key–value database, or key–value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, and a data structure more commonly known today as a dictionary or hash table. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data. These records are stored and retrieved using a key that uniquely identifies the record, and is used to find the data within the database.

crypt is a POSIX C library function. It is typically used to compute the hash of user account passwords. The function outputs a text string which also encodes the salt, and identifies the hash algorithm used. This output string forms a password record, which is usually stored in a text file.

Lightning Memory-Mapped Database (LMDB) is an embedded transactional database in the form of a key-value store. LMDB is written in C with API bindings for several programming languages. LMDB stores arbitrary key/data pairs as byte arrays, has a range-based search capability, supports multiple data items for a single key and has a special mode for appending records (MDB_APPEND) without checking for consistency. LMDB is not a relational database, it is strictly a key-value store like Berkeley DB and DBM.

The following outline is provided as an overview of and topical guide to MySQL:

References

  1. Kew 2007, p. 80: "DBMs have been with us since the early days of computing, when the need for fast keyed lookups was recognized. The original DBM is a UNIX-based library and file format for fast, highly-scalable keyed access to data. It was followed (in order) by NDBM ('new DBM'), GDBM ('GNU DBM'), and the Berkeley DB. This last is by far the most advanced, and the only DBM under active development today. Nevertheless, all of the DBMs from NDBM onward provide the same core functionality used by most programs, including Apache. A minimal-implementation SDBM is also bundled with APR, and is available to applications along with the other DBMs.
    Although NDBM is now old - like the city named New Town ('Neapolis') by the Greeks in about 600BC and still called Naples today - it remains the baseline DBM. NDBM was used by early Apache modules such as the Apache 1.x versions of mod_auth_dbm and mod_rewrite. Both GDBM and Berkeley DB provide NDBM emulations, and Linux distributions ship with one or other of these emulations in place of the 'real' NDBM, which is excluded for licensing reasons. Unfortunately, the various file formats are totally incompatible, and there are subtle differences in behaviour concerning database locking. These issues led a steady stream of Linux users to report problems with DBMs in Apache 1.x."
  2. 1 2 Hazel 2001, p. 500: "The most common [single-key] format is called DBM. Most modern versions of Unix have a DBM library installed as standard, though this is not true of some older systems. The two most common DBM libraries are ndbm (standard on Solaris and IRIX) and Berkeley DB Version 2 or 3 (standard on several free operating systems). Exim supports both of these, as well as the older Berkeley DB Version 1, gdbm, and tdb."
  3. 1 2 Ladd & O'Donnell 2001, pp. 823–824: "Most UNIX systems have some kind of DBM database. DBM is a set of library routines that manages data files consisting of key and value pairs. The DBM routines control how users enter and retrieve information from the database. Although it isn't the most powerful mechanism for storing information, using DBM is a faster method of retrieving information than using a flat file. Because most UNIX sites use one of the DBM libraries, the tools you need to store your information in a DBM database are readily available.
    Almost as many flavors of the DBM libraries exist as there are UNIX systems. Although most of these libraries are compatible with each other, they all basically work the same way...
    A list follows of some of the most popular DBM libraries available:
    • DBM - DBM stores the database in two files. The first has the extension .Pag and contains the bitmap. The second, which has the extension .Dir, contains the data.
    • NDBM - NDBM is much like DBM but with a few additional features; it was written to provide better storage and retrieval methods. Also, NDBM enables you to open many databases, unlike DBM, in which you are allowed to have only one database open within your script. Like DBM, NDBM stores its information in two files using the extensions .Pag and .Dir.
    • SDBM - SDBM comes with the Perl archive, which has been ported to many platforms. Therefore, you can use DBM databases as long as a version of Perl exists for your computer. SDBM was written to match the functions provided with NDBM, so portability of code shouldn't be a problem. Perl is available on just about all popular platforms.
    • GDBM - GDBM is the GNU version of the DBM family of database routines. GDBM also enables you to cache data, reducing the time that it takes to write to the database. The database has no size limit; its size depends completely on your system's resources. GDBM database files have the extension .Db. Unlike DBM and NDBM, both of which use two files, GDBM only uses one file.
    • Berkeley db - The Berkeley db expands on the original DBM routines significantly. The Berkeley db uses hashed tables the same as the other DBM databases, but the library also can create databases based on a sorted balanced binary tree (BTREE) and store information with a record line number (RECNO). The method that you use depends completely on how you want to store and retrieve the information from a database. Berkeley db creates only one file, which has no extension."
  4. "Crash Tolerance". GDBM manual. Retrieved 3 October 2021.
  5. "Crashproofing the Original NoSQL Key-Value Store" . Retrieved 3 October 2021.
  6. yigit, ozan. "sdbm.bun". cse.yorku.ca. Retrieved 8 May 2019.
  7. "Ruby SDBM library". SDBM on Github. Note that Ruby used to ship SDBM in the standard distribution up until version 2.7, after which it was made available only as an external library, similarly to the DBM and GDBM libraries, removed from the standard library in Ruby 3.1.
  8. "QDBM: Quick Database Manager". fallabs.com. 2006. Archived from the original on 2020-02-27. Retrieved 2020-02-27.
  9. "tdb: Main Page". tdb.samba.org.
  10. 1 2 Debroux, Lionel (16 Jun 2018). "oss-security - Fun with DBM-type databases..." openwall.com.
  11. 1 2 Surý, Ondřej (19 June 2014). "New project goal: Get rid of Berkeley DB (post jessie)". debian-devel (Mailing list). Debian.

Bibliography