Berkeley DB

Original author(s): Margo Seltzer and Keith Bostic of Sleepycat Software
Developer(s): Sleepycat Software, later Oracle Corporation
Initial release: 1994
Stable release: 18.1.40 [1] / May 29, 2020
Written in: C
Operating system: Windows, Unix-like
Size: ~1244 kB compiled on Windows x86
Type: Embedded database, NoSQL database
License: Dual licensed (GNU Affero General Public License and proprietary license)
Website: www.oracle.com/database/technologies/related/berkeleydb.html

Berkeley DB (BDB) is an embedded database software library for key/value data, historically significant in open-source software. Berkeley DB is written in C with API bindings for many other programming languages. BDB stores arbitrary key/data pairs as byte arrays and supports multiple data items for a single key. Berkeley DB is not a relational database, [2] although it has database features including database transactions, multiversion concurrency control and write-ahead logging. BDB runs on a wide variety of operating systems, including most Unix-like and Windows systems, and real-time operating systems.

BDB was commercially supported and developed by Sleepycat Software from 1996 to 2006. Oracle Corporation acquired Sleepycat Software in February 2006 and continued to develop and sell the C Berkeley DB library. In 2013 Oracle re-licensed BDB under the AGPL [3] [4] and released new versions until May 2020. Bloomberg L.P. continues to develop a fork of the 2013 version of BDB within its Comdb2 database, under the original Sleepycat permissive license.

Origin

Berkeley DB originated at the University of California, Berkeley as part of BSD, Berkeley's version of the Unix operating system. After 4.3BSD (1986), the BSD developers attempted to remove or replace all code originating in the original AT&T Unix from which BSD was derived. In doing so, they needed to rewrite the Unix database package. [5] Seltzer and Yigit [6] created a new database, unencumbered by any AT&T patents: an on-disk hash table that outperformed the existing dbm libraries. Berkeley DB itself was first released in 1991 and later included with 4.4BSD. [5] In 1996 Netscape requested that the authors of Berkeley DB improve and extend the library, then at version 1.86, to suit Netscape's requirements for an LDAP server [7] and for use in the Netscape browser. That request led to the creation of Sleepycat Software. This company was acquired by Oracle Corporation in February 2006.

Berkeley DB 1.x releases focused on managing key/value data storage and are referred to as "Data Store" (DS). The 2.x releases added a locking system enabling concurrent access to data. This is what is known as "Concurrent Data Store" (CDS). The 3.x releases added a logging system for transactions and recovery, called "Transactional Data Store" (TDS). The 4.x releases added the ability to replicate log records and create a distributed highly available single-master multi-replica database. This is called the "High Availability" (HA) feature set. Berkeley DB's evolution has sometimes led to minor API changes or log format changes, but very rarely have database formats changed. Berkeley DB HA supports online upgrades from one version to the next by maintaining the ability to read and apply the prior release's log records.

Starting with the 6.0.21 (Oracle 12c) release, all Berkeley DB products are licensed under the GNU AGPL. [8] [9] Previously, Berkeley DB was redistributed under the 4-clause BSD license (before version 2.0) and then under the Sleepycat Public License, which is an OSI-approved open-source license as well as an FSF-approved free software license. [10] [11] The product ships with complete source code, build scripts, test suite, and documentation. The comprehensive feature set, along with the licensing terms, has led to its use in a multitude of free and open-source software. Those who do not wish to abide by the terms of the GNU AGPL, or who use an older version under the Sleepycat Public License, have the option of purchasing a proprietary license for redistribution from Oracle Corporation. This technique is called dual licensing.

Berkeley DB includes compatibility interfaces for some historic Unix database libraries: dbm, ndbm and hsearch (a System V and POSIX library for creating in-memory hash tables). [12]

Architecture

Berkeley DB has an architecture notably simpler than that of relational database management systems. Like SQLite and LMDB, it is not based on a server/client model and does not provide support for network access; programs access the database using in-process API calls. Oracle added support for SQL in the 11g R2 release, based on the popular SQLite API, by including a version of SQLite in Berkeley DB (it uses Berkeley DB for storage). [13]

A program accessing the database is free to decide how the data is to be stored in a record. Berkeley DB puts no constraints on the record's data. The record and its key can both be up to four gigabytes long.
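The record model described above can be illustrated with a minimal sketch. It does not use Berkeley DB itself: it assumes Python's standard-library `dbm.dumb` module as a stand-in for an in-process key/value store that holds opaque byte strings. The record layout (`RECORD`), the helper names, and the sample account data are hypothetical and chosen only for illustration.

```python
import dbm.dumb
import os
import struct
import tempfile

# Hypothetical record layout chosen by the application, not by the store:
# a 4-byte unsigned id followed by an 8-byte float, both little-endian.
RECORD = struct.Struct("<Id")


def put_account(db, name, account_id, balance):
    """Encode the record ourselves and store it under a byte-string key."""
    db[name.encode("utf-8")] = RECORD.pack(account_id, balance)


def get_account(db, name):
    """Fetch the raw bytes and decode them with the same layout."""
    return RECORD.unpack(db[name.encode("utf-8")])


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "accounts")
        # "c" opens the database for reading and writing, creating it if needed.
        with dbm.dumb.open(path, "c") as db:
            put_account(db, "alice", 7, 12.5)
        # Reopen the files to show that the in-process library persisted the data.
        with dbm.dumb.open(path, "r") as db:
            return get_account(db, "alice")
```

The store never interprets the value; only the application knows that the twelve bytes hold an id and a balance, which is the same division of responsibility the paragraph above describes.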

Berkeley DB supports database features such as ACID transactions, fine-grained locking, hot backups and replication.
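The idea behind log-based transaction recovery can be sketched in a few lines. The following is a toy, redo-only log written for illustration; it is not Berkeley DB's log format, API, or recovery algorithm. The JSON record shape and the function names `append`, `recover`, and `demo` are assumptions made for this example.

```python
import json
import os
import tempfile


def append(log_path, entry):
    """Append one log record and force it to disk before returning."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
        log.flush()
        os.fsync(log.fileno())


def recover(log_path):
    """Rebuild the key/value state by replaying only committed transactions."""
    pending = {}  # transaction id -> writes seen so far without a commit
    state = {}
    if not os.path.exists(log_path):
        return state
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            entry = json.loads(line)
            if entry["op"] == "put":
                writes = pending.setdefault(entry["txn"], [])
                writes.append((entry["key"], entry["value"]))
            elif entry["op"] == "commit":
                # A commit record makes the transaction's writes visible.
                for key, value in pending.pop(entry["txn"], []):
                    state[key] = value
    # Writes still in `pending` belong to uncommitted transactions and are dropped.
    return state


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "toy.log")
        append(log_path, {"op": "put", "txn": 1, "key": "a", "value": "1"})
        append(log_path, {"op": "commit", "txn": 1})
        # Transaction 2 "crashes" before its commit record is written.
        append(log_path, {"op": "put", "txn": 2, "key": "b", "value": "2"})
        return recover(log_path)
```

Because each record is flushed before the call returns, a crash can lose at most the transaction in progress; replaying the log then restores every committed write and discards the rest.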

Oracle Corporation's use of the name "Berkeley DB"

The name "Berkeley DB" is used by Oracle Corporation for three different products, only one of which is BDB: [14]

  1. Berkeley DB, the C database library that is the subject of this article
  2. Berkeley DB Java Edition, [15] a pure Java library whose design is modelled after the C library but is otherwise unrelated
  3. Berkeley DB XML, [16] a C++ program that supports XQuery, and which includes a legacy version of the C database library

Open-source programs still using Berkeley DB

BDB was once very widespread, but usage dropped steeply from 2013 (see the Licensing section). Notable software that still uses Berkeley DB for data storage includes bogofilter, a Bayesian spam filter. [17]

Open-source operating systems and languages such as Perl and Python still support old Berkeley DB interfaces. The FreeBSD and OpenBSD operating systems ship Berkeley DB 1.8x to support the dbopen() [18] [19] library call used by password programs such as pwd_mkdb. [20] Linux operating systems, including those based on Debian [21] and Fedora, [22] ship Berkeley DB 5.3 libraries.

Licensing

Berkeley DB V2.0 and higher is available under a dual license:

  1. Oracle commercial license [23]
  2. The GNU AGPL v3. [24]

Switching the open-source license in 2013 from the Sleepycat license to the AGPL had a major effect on open-source software. Since BDB is a library, any application linking to it must be under an AGPL-compatible license. Many open-source applications and all closed-source applications would need to be relicensed to become AGPL-compatible, which was not acceptable to many developers and open-source operating systems. By 2013 there were many alternatives to BDB, and Debian Linux was typical in its decision to completely phase out Berkeley DB, with a preference for the Lightning Memory-Mapped Database (LMDB). [25]

References

  1. "Oracle Berkeley DB Downloads". Retrieved 27 September 2020.
  2. Berkeley DB Reference Guide: What is Berkeley DB not?. Doc.gnu-darwin.org (2001-05-31). Retrieved on 2013-09-18.
  3. "Major Release: Berkeley DB 12gR1 (12.1.6.0)". Open Source Projects at Oracle. 2013-06-10. Archived from the original on 2013-12-05. Retrieved 2021-04-11.
  4. Willis, Nathan (2013-07-10). "Debian, Berkeley DB, and AGPLv3". Linux Weekly News. Archived from the original on 2013-07-22.
  5. Olson, Michael A.; Bostic, Keith; Seltzer, Margo (1999). "Berkeley DB" (PDF). Proc. FREENIX Track, USENIX Annual Tech. Conf. Archived (PDF) from the original on 2022-10-09. Retrieved October 20, 2009.
  6. Seltzer, Margo; Yigit, Ozan (1991). "A New Hashing Package for UNIX". Proc. USENIX Winter Tech. Conf. Retrieved October 20, 2009.
  7. Brunelli, Mark (March 28, 2005). "A Berkeley DB primer". Enterprise Linux News. Archived from the original on September 6, 2008. Retrieved December 28, 2008.
  8. [Berkeley DB Announce] Major Release: Berkeley DB 12gR1 (12.1.6.0). Retrieved July 5, 2013. (Despite AGPL mentions there, the source archive still declares BSD-4-Clause terms in 6.0.19.)
  9. "Snapshot of the 6.0.19 source at the time". 13 June 2013.
  10. "The Sleepycat License". Open Source Initiative. October 31, 2006. Retrieved December 28, 2008.
  11. "Licenses". Free Software Foundation. December 10, 2008. Archived from the original on December 16, 2008. Retrieved December 28, 2008.
  12. "Compatibility with historic UNIX interfaces". docs.oracle.com. Retrieved 2019-11-20.
  13. "Twitter / Gregory Burd: @humanications We didn't r ..."
  14. "Oracle Berkeley DB Downloads: Latest Production Releases".
  15. "Oracle Berkeley DB Java Edition". Archived from the original on 2017-07-11.
  16. "Berkeley DB XML". Archived from the original on 2016-07-18.
  17. "bogofilter -- Fast Bayesian Spam Filter Code (Git)". sourceforge.net. Retrieved 2024-05-17.
  18. "dbopen(3)". FreeBSD Manual Pages. Retrieved 2023-04-18.
  19. "OpenBSD manual pages". dbopen(3). Retrieved 2023-04-18.
  20. "pwd_mkdb(8)". OpenBSD manual pages. Retrieved 2023-04-18.
  21. Webmaster, Debian. "Debian -- Details of package libdb5.3 in sid". Debian -- Packages. Retrieved 2023-04-18.
  22. "Overview - rpms/libdb". src.fedoraproject.org. Retrieved 2023-04-18.
  23. "Download, license and sales information". Nov 30, 2017.
  24. "Major Release: Berkeley DB 12gR1 (12.1.6.0)". June 10, 2013. Retrieved July 15, 2013.
  25. Ondřej Surý (June 19, 2014). "New project goal: Get rid of Berkeley DB (post jessie)". debian-devel (Mailing list). Debian.