Lightning Memory-Mapped Database

OpenLDAP Lightning Memory-Mapped Database
Original author(s) Howard Chu
Developer(s) Symas
Initial release November 24, 2011
Stable release 0.9.31 / 10 July 2023
Written in C
Operating system Unix, Linux, Windows, AIX, Sun Solaris, SCO Unix, macOS, iOS
Size 64 KB
Type Embedded database
License OpenLDAP Public License (permissive software license)
Website symas.com/lmdb

Lightning Memory-Mapped Database (LMDB) is an embedded transactional database in the form of a key-value store. LMDB is written in C with API bindings for several programming languages. LMDB stores arbitrary key/data pairs as byte arrays, has a range-based search capability, supports multiple data items for a single key, and has a special mode for appending records (MDB_APPEND) without checking for consistency. [1] LMDB is not a relational database; it is strictly a key-value store like Berkeley DB and DBM.
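The ordered key-value semantics described above (byte-string keys, range-based search, several data items under one key) can be illustrated with a small toy model. This is only a sketch in plain Python: the `SortedStore` class and its method names are hypothetical and are not part of LMDB or any of its bindings, and the data lives in memory rather than in a memory-mapped file.

```python
import bisect


class SortedStore:
    """Toy in-memory ordered key-value store (illustrative only, not LMDB)."""

    def __init__(self):
        self._keys = []    # sorted list of distinct byte-string keys
        self._values = {}  # key -> sorted list of byte-string values

    def put(self, key: bytes, value: bytes) -> None:
        """Store value under key; a key may hold several distinct values."""
        if key not in self._values:
            bisect.insort(self._keys, key)
            self._values[key] = []
        vals = self._values[key]
        i = bisect.bisect_left(vals, value)
        if i == len(vals) or vals[i] != value:
            vals.insert(i, value)

    def get(self, key: bytes):
        """Return the first value stored under key, or None if absent."""
        vals = self._values.get(key)
        return vals[0] if vals else None

    def range(self, start: bytes, stop: bytes):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            for value in self._values[key]:
                yield key, value
            i += 1


store = SortedStore()
store.put(b"user:001", b"alice")
store.put(b"user:002", b"bob")
store.put(b"user:002", b"bobby")   # second data item under the same key
store.put(b"zone:eu", b"frankfurt")

# Range search: every key that starts with b"user:" (b";" sorts just after b":").
user_rows = list(store.range(b"user:", b"user;"))
print(user_rows)
```

Because keys are kept in sorted order, a range query only needs one binary search to find its starting point and then a sequential walk, which is the same property that makes ordered stores suitable for prefix and range scans.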

LMDB may also be used concurrently in a multi-threaded or multi-processing environment, with read performance scaling linearly by design. LMDB databases may have only one writer at a time; however, unlike many similar key-value databases, write transactions do not block readers, nor do readers block writers. LMDB is also unusual in that multiple applications on the same system may simultaneously open and use the same LMDB store, as a means to scale up performance. In addition, LMDB does not require a transaction log (thereby increasing write performance by not needing to write data twice) because it maintains data integrity inherently by design.

History

LMDB's design was first discussed in a 2009 post to the OpenLDAP developer mailing list, [2] in the context of exploring solutions to the cache management difficulty caused by the project's dependence on Berkeley DB. A specific goal was to replace the multiple layers of configuration and caching inherent to Berkeley DB's design with a single, automatically managed cache under the control of the host operating system.

Development subsequently began, initially as a fork of a similar implementation from the OpenBSD ldapd project. [3] The first publicly available version appeared in the OpenLDAP source repository in June 2011. [4]

The project was known as MDB until November 2012, after which it was renamed in order to avoid conflicts with existing software. [5]

Technical description

Internally, LMDB uses B+ tree data structures. The efficiency of its design and small footprint had the unintended side-effect of providing good write performance as well. LMDB has an API similar to Berkeley DB and dbm. LMDB treats the computer's memory as a single address space, shared across multiple processes or threads using shared memory with copy-on-write semantics (known historically as a single-level store). Most earlier computing architectures had a 32-bit memory address space, imposing a hard limit of 4 GB on the size of any database that was directly mapped into a single-level store. However, today's 64-bit processors mostly implement 48-bit address spaces, giving access to 47-bit addresses or 128 TB of database size, [6] making databases using shared memory useful once again in real-world applications.
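The size limits quoted above follow directly from the number of usable address bits. The short calculation below is only a worked check of that arithmetic; the helper name `addressable_bytes` is made up for illustration, and the sizes are computed in binary units (GiB and TiB), which is what the "4 GB" and "128 TB" figures correspond to. The use of 47 rather than 48 bits assumes, as is typical, that half of the 48-bit virtual address space is reserved for the operating system kernel.

```python
GIB = 2 ** 30  # bytes in one gibibyte
TIB = 2 ** 40  # bytes in one tebibyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given bit width."""
    return 2 ** address_bits


limit_32 = addressable_bytes(32)  # classic 32-bit address space
limit_47 = addressable_bytes(47)  # user-space half of a 48-bit address space

print(limit_32 // GIB, "GiB")
print(limit_47 // TIB, "TiB")
```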

Specific noteworthy technical features of LMDB are:

The file format of LMDB is, unlike that of Berkeley DB, architecture-dependent. This means that a conversion must be done before moving a database from a 32-bit machine to a 64-bit machine, [8] or between computers of differing endianness. [9]

Concurrency

LMDB employs multiversion concurrency control (MVCC) and allows multiple threads within multiple processes to coordinate simultaneous access to a database. Readers scale linearly by design. [10] [11] While write transactions are globally serialized via a mutex, read-only transactions operate in parallel, including in the presence of a write transaction. They are entirely wait-free except for the first read-only transaction on a thread. Each thread reading from a database gains ownership of an element in a shared memory array, which it may update to indicate when it is within a transaction. Writers scan the array to determine the oldest database version that must be preserved, without requiring direct synchronization with active readers.
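The reader-slot scheme described above can be sketched as a toy model. The following is an illustration under stated assumptions, not LMDB's implementation: the `ReaderTable` class and its method names are hypothetical, a plain Python list stands in for the shared memory array, and version numbers are simple integers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReaderTable:
    """Toy reader table: one slot per reading thread (illustrative only)."""

    slots: List[Optional[int]] = field(default_factory=list)

    def register(self) -> int:
        """Claim a slot; done once per thread. Returns the slot index."""
        self.slots.append(None)
        return len(self.slots) - 1

    def begin_read(self, slot: int, current_version: int) -> None:
        """Mark the slot as reading the given database version."""
        self.slots[slot] = current_version

    def end_read(self, slot: int) -> None:
        """Mark the slot as idle; the reader keeps owning it for reuse."""
        self.slots[slot] = None

    def oldest_in_use(self, current_version: int) -> int:
        """Writer-side scan: the oldest version any active reader still needs."""
        active = [v for v in self.slots if v is not None]
        return min(active, default=current_version)


table = ReaderTable()
a = table.register()
b = table.register()

table.begin_read(a, current_version=7)
table.begin_read(b, current_version=9)
oldest_with_both = table.oldest_in_use(current_version=10)

table.end_read(a)
oldest_after_a_ends = table.oldest_in_use(current_version=10)

table.end_read(b)
oldest_when_idle = table.oldest_in_use(current_version=10)

print(oldest_with_both, oldest_after_a_ends, oldest_when_idle)
```

In this sketch a reader only ever writes to its own slot, and the writer only reads the slots, which is why neither side has to wait for the other. Pages belonging to versions older than the value returned by the scan are the ones a writer could safely reuse.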

Performance

In 2011, Google published software that allowed users to generate micro-benchmarks comparing LevelDB's performance to SQLite and Kyoto Cabinet in different scenarios. [12] In 2012, Symas added support for LMDB and Berkeley DB and made the updated benchmarking software publicly available. [13] The resulting benchmarks showed that LMDB outperformed all other databases in read and batch write operations. SQLite with LMDB excelled in write operations, and particularly so on synchronous/transactional writes.

The benchmarks showed that the underlying filesystem has a big influence on performance. JFS with an external journal performs well, especially compared to other modern systems like Btrfs and ZFS. [14] [15] Zimbra has tested back-mdb vs back-hdb performance in OpenLDAP, with LMDB clearly outperforming the BDB-based back-hdb. [16] Many other OpenLDAP users have observed similar benefits. [17]

Since the initial benchmarking work done in 2012, multiple follow-on tests have been conducted with additional database engines for both in-memory [18] and on-disk [19] workloads characterizing the performance across multiple CPUs and record sizes. These tests show that LMDB performance is unmatched on all in-memory workloads and excels in all disk-bound read workloads and disk-bound write workloads using large record sizes. The benchmark driver code was subsequently published on GitHub [20] and further expanded in database coverage.

Reliability

LMDB was designed to resist data loss in the face of system and application crashes. Its copy-on-write approach never overwrites currently-in-use data. Avoiding overwrites means the structure on disk/storage is always valid, so application or system crashes can never leave the database in a corrupted state. In its default mode, at worst, a crash can lose data from the last not-yet-committed write transaction. Even with all asynchronous modes enabled, only a catastrophic OS failure or hardware power-loss [21] event, rather than merely an application crash, could potentially result in any data corruption.
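The copy-on-write idea described above can be illustrated with a deliberately simplified model. This sketch is an assumption-laden toy, not LMDB's on-disk format: the `CowStore` class is hypothetical, each "page" is a whole Python dictionary rather than a B+ tree page, and a crash is simulated by returning before the root pointer is updated.

```python
class CowStore:
    """Toy copy-on-write store: pages are appended, never overwritten."""

    def __init__(self):
        self.pages = [{}]  # page 0 holds the initial empty snapshot
        self.root = 0      # index of the last committed page

    def snapshot(self):
        """Return the committed page that a reader starting now would see."""
        return self.pages[self.root]

    def write(self, updates, crash_before_commit=False):
        """Copy the committed page, modify the copy, then publish it."""
        new_page = dict(self.pages[self.root])  # copy; the original is untouched
        new_page.update(updates)
        self.pages.append(new_page)
        if crash_before_commit:
            return  # simulated crash: the root still points at the old page
        self.root = len(self.pages) - 1  # commit is a single pointer switch


store = CowStore()
store.write({"a": 1})
reader_view = store.snapshot()  # a reader holding the version with a == 1

store.write({"a": 2}, crash_before_commit=True)
after_crash = dict(store.snapshot())  # still the last committed state

store.write({"a": 3})
after_commit = dict(store.snapshot())

print(reader_view, after_crash, after_commit)
```

The uncommitted write is lost, which matches the worst case stated above, but the committed state remains valid at every point, and the reader's earlier snapshot is never modified underneath it.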

Two academic papers from the USENIX OSDI Symposium [22] covered failure modes of DB engines (including LMDB) under a sudden power loss or system crash. [23] [24] The paper by Pillai et al. did not find any failure in LMDB that would occur in the real-world file systems considered; the single failure the study identified in LMDB relates only to hypothetical file systems. [25] The paper by Mai Zheng et al. claims to point out failures in LMDB, but the conclusion depends on whether fsync or fdatasync is used; using fsync ameliorates the problem. The choice between fsync and fdatasync is a compile-time switch: fsync is not the default in current Linux builds of LMDB but is the default on macOS, *BSD, Android, and Windows. Default Linux builds of LMDB are therefore the only ones vulnerable to the problem discovered by Zheng et al.; however, Linux users may simply rebuild LMDB to use fsync instead. [26]
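The difference between the two system calls can be shown with a short sketch. This is not LMDB code, and LMDB makes the choice at compile time rather than at run time; the `durable_write` helper and its `full_sync` parameter are hypothetical names used only for illustration. `fsync` flushes a file's data and its metadata, whereas `fdatasync` may skip metadata that is not needed to read the data back, and is not available on every platform.

```python
import os
import tempfile


def durable_write(path: str, data: bytes, full_sync: bool = True) -> str:
    """Write data and flush it to storage; returns the name of the sync call used."""
    # Fall back to fsync where fdatasync does not exist (e.g. macOS, Windows).
    use_fdatasync = (not full_sync) and hasattr(os, "fdatasync")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)  # small payload; a real writer would loop on short writes
        if use_fdatasync:
            os.fdatasync(fd)  # flush data, and only the metadata needed to read it
            return "fdatasync"
        os.fsync(fd)  # flush data and all file metadata
        return "fsync"
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.bin")
    used = durable_write(target, b"committed", full_sync=True)
    with open(target, "rb") as fh:
        stored = fh.read()

print(used, stored)
```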

When provided with a corrupt database, such as one produced by fuzzing, LMDB may crash. LMDB's author considers the case unlikely to be concerning but has produced a partial fix in a separate branch. [27]

Open source license

In June 2013, Oracle changed the license of Berkeley DB (a related project) from the Sleepycat license to the Affero General Public License, [28] thus restricting its use in a wide variety of applications. This caused the Debian project to exclude the library from version 6.0 onwards. The license was also criticized as unfriendly to commercial redistributors. This sparked discussion over whether the same licensing change could happen to LMDB. Author Howard Chu clarified that LMDB is part of the OpenLDAP project, which already had its BSD-style license before he joined, and that the license will stay as it is. No copyright is transferred to anybody when code is checked in, which makes a move like Oracle's impossible. [29] [30] [31] [32] [33] [34] [35] [36] [37]

The Berkeley DB license issue has caused major Linux distributions such as Debian to completely phase out their use of Berkeley DB, with a preference for LMDB. [38]

API and uses

There are wrappers for several programming languages, such as C++, [39] [40] Java, [41] Python, [42] [43] Lua, [44] Rust, [45] [46] Go, [47] Ruby, [48] Objective-C, [49] JavaScript, [50] C#, [51] Perl, [52] PHP, [53] Tcl [54] and Common Lisp. [55] A complete list of wrappers may be found on the main web site. [56]

Howard Chu ported SQLite 3.7.7.1 to use LMDB instead of its original B-tree code, calling the end result SQLightning. [57] In one cited insert test of 1000 records, it was 20 times faster than the original SQLite with its B-tree implementation. [58] LMDB is available as a backing store for other open source projects including Cyrus SASL, [59] Heimdal Kerberos, [60] and OpenDKIM. [61] It is also available in some other NoSQL projects like MemcacheDB [62] and Mapkeeper. [63] LMDB was used to make the in-memory store Redis persist data on disk. The existing back-end in Redis showed pathological behaviour in rare cases, and a replacement was sought. The baroque API of LMDB was criticized, though, for forcing a lot of coding to get simple things done. However, its performance and reliability during testing were considerably better than those of the alternative back-end stores that were tried. [64]

An independent third-party software developer utilised the Python bindings to LMDB [65] in a high-performance environment and published, on the technology news site Slashdot, how the system managed to successfully sustain 200,000 simultaneous read, write and delete operations per second (a total of 600,000 database operations per second). [66] [67]

An up-to-date list of applications using LMDB is maintained on the main web site. [68]

Application support

Many popular free software projects distribute or include support for LMDB, often as the primary or sole storage mechanism.

Technical reviews of LMDB

LMDB makes novel use of well-known computer science techniques such as copy-on-write semantics and B+ trees to provide atomicity and reliability guarantees, as well as performance, that can be hard to accept given the library's relative simplicity and given that no other similar key-value store offers the same guarantees or overall performance; the authors nevertheless state explicitly in presentations that LMDB is read-optimised, not write-optimised. Additionally, as LMDB was primarily developed for use in OpenLDAP, its developers are focused mainly on the development and maintenance of OpenLDAP, not on LMDB per se. The developers' limited time spent presenting the first benchmark results was therefore criticized for not stating limitations and for giving a "silver bullet impression" ill-suited to an engineer's attitude. [79] The concerns raised were later addressed to the reviewer's satisfaction by the key developer behind LMDB. [80]

The presentation did spark other database developers to dissect the code in depth to understand how and why it works. Reviews range from brief [81] to in-depth. Database developer Oren Eini wrote a 12-part series of articles on his analysis of LMDB, beginning July 9, 2013. The conclusion was along the lines of "impressive codebase ... dearly needs some love", mainly because of overly long methods and code duplication. [82] This review, conducted by a .NET developer with no former experience of C, concluded on August 22, 2013 with "beyond my issues with the code, the implementation is really quite brilliant. The way LMDB manages to pack so much functionality by not doing things is quite impressive... I learned quite a lot from the project, and it has been frustrating, annoying and fascinating experience". [83]

Multiple other reviews cover LMDB [84] [85] in various languages including Chinese. [86] [87]

References

  1. LMDB Reference Guide. Retrieved on 2023-03-21
  2. back-mdb - futures. Retrieved on 2014-10-19
  3. MDB: A Memory-Mapped Database and Backend for OpenLDAP. Retrieved 2018-10-22
  4. First public version of MDB source code. Retrieved 2020-03-16
  5. MDB renamed to LMDB. Retrieved 2020-03-16
  6. Chu, Howard (2011). MDB: A Memory-Mapped Database and Backend for OpenLDAP (PDF). LDAPCon.
  7. B+ tree#Implementation
  8. "The LMDB file format". Separate Concern. Retrieved 27 February 2020.
  9. Chu, Howard. "lmdb - Is the Monero blockchain database portable between 32 and 64 bit architectures, and little/big endian architectures?". Monero Stack Exchange.
  10. scaling benchmarks for LMDB
  11. in-memory benchmark scaling for LMDB
  12. "LevelDB Benchmarks". Google, Inc. Archived from the original on 20 August 2011. Retrieved 8 August 2014.
  13. Chu, Howard. "Database Microbenchmarks". Symas Corp. Archived from the original on 9 August 2014. Retrieved 8 August 2014.
  14. "MDB Microbenchmarks". Symas Corp., 2012-09
  15. Database Microbenchmarks, Symas Corp., 2012-07.
  16. "OpenLDAP MDB vs HDB performance". Zimbra, Inc.
  17. "OpenLDAP: A comparison of back-mdb and back-hdb performance". 16 May 2013. Retrieved 8 May 2017.
  18. Chu, Howard. "In-Memory Microbenchmark". Symas Corp. Archived from the original on 2014-12-09. Retrieved 2014-12-06.
  19. Chu, Howard. "On-Disk Microbenchmark". Symas Corp. Archived from the original on 2014-12-09. Retrieved 2014-12-06.
  20. "Benchmark Drivers". GitHub.
  21. "LMDB Corruption detection".
  22. "OSDI 2014". 2013-02-08.
  23. Langston, Mark C.; Skelly, Hal (2014). OSDI 2014, All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications. pp. 433–448. ISBN 9781931971164.
  24. Langston, Mark C.; Skelly, Hal (2014). OSDI 2014, Torturing Databases for Fun and Profit. pp. 449–464. ISBN 9781931971164.
  25. "Archive of discussion regarding the Usenix 2014 pillai paper".
  26. "LMDB Crash consistency discussion".
  27. Debroux, Lionel (16 Jun 2018). "oss-security - Fun with DBM-type databases..." openwall.com.
  28. "Berkeley DB Release Announcement". Oracle Corporation. 11 June 2013. Starting with the 6.0 / 12c releases, all Berkeley DB products are licensed under the GNU AFFERO GENERAL PUBLIC LICENSE (AGPL), version 3. This license is published by the Free Software Foundation (FSF) (1) and approved by the Open Source Initiative (2). Please review the terms of the license to ensure compliance before upgrading to the 12c release. Previous releases of Berkeley DB software will continue to be distributed under the Sleepycat license.
  29. Ondřej Surý (July 2, 2013). "Berkeley DB 6.0 license change to AGPLv3". debian-devel (Mailing list). Debian.
  30. Simon Phipps (July 5, 2013). "Oracle switches Berkeley DB license". InfoWorld.
  31. "Oracle Quietly Switches BerkeleyDB to AGPL". Slashdot. 5 July 2013.
  32. "Oracle меняет лицензию Berkeley DB" [Oracle Berkeley DB license changes]. Programmers in Ukraine (in Russian). Blogspot. July 22, 2013.
  33. Jean Elyan (July 8, 2013). "Oracle passe Berkeley DB sous licence GNU AGPL" [Oracle Berkeley DB passes under GNU AGPL] (in French). Le Monde Informatique.
  34. Ondřej Surý (July 2, 2013). "Berkeley DB 6.0 vydána pod licencí AGPLv3" [Berkeley DB 6.0 is released under the GPLv3 license] (in Czech). Abclinuxu.
  35. Nathan Willis (July 10, 2013). "Debian, Berkeley DB, and AGPLv3". LWN.net.
  36. Dan Shearer (July 2, 2013). "Berkeley DB 6.0 license change to AGPLv3". debian-devel (Mailing list). Debian.
  37. Howard Chu (July 2, 2013). "Berkeley DB 6.0 license change to AGPLv3". debian-devel (Mailing list). Debian.
  38. Ondřej Surý (June 19, 2014). "New project goal: Get rid of Berkeley DB (post jessie)". debian-devel (Mailing list). Debian.
  39. LMDB C++11 wrapper, 2015-04
  40. LMDB C++ wrapper, 2012-11.
  41. LmdbJava, 2019-04
  42. LMDB Python wrapper, 2013-02
  43. py-lmdb. Retrieved on 2014-10-20.
  44. LMDB Lua wrapper, 2013-04.
  45. typed LMDB Rust wrapper, 2023-01
  46. high-level Rust wrapper, 2022-12
  47. LMDB Go wrapper, 2013-03
  48. LMDB Ruby wrapper, 2013-02
  49. LMDB Objective-C wrapper, 2013-04
  50. LMDB Node.js wrapper, 2013-05
  51. LMDB .Net wrapper, 2013-06
  52. LMDB Perl wrapper, 2013-08
  53. LMDB PHP wrapper, 2015-04
  54. tcl-lmdb, 2015-11
  55. Using LMDB from Common Lisp, 2016-04
  56. "Symas LMDB Tech Info".
  57. "gitorious.org Git - mdb:sqlightning.git/summary". gitorious.org. Archived from the original on 9 August 2013. Retrieved 8 May 2017.
  58. SQLightning tests.
  59. "Cyrus IMAP — Cyrus IMAP 3.0.1 (stable) documentation". cyrusimap.web.cmu.edu. Archived from the original on 30 April 2017. Retrieved 8 May 2017.
  60. "Heimdal". h5l.org. Retrieved 8 May 2017.
  61. "OpenDKIM". www.opendkim.org. Retrieved 8 May 2017.
  62. "gitorious.org Git - mdb:memcachedb.git/summary". gitorious.org. Retrieved 8 May 2017.
  63. "GitHub - m1ch1/mapkeeper: Thrift based key-value store with various storage backends, including MySQL, Berkeley DB, and LevelDB". github.com. Archived from the original on 9 February 2016.
  64. "Second Strike With Lightning". Anchor. 2013-05-09.
  65. "Python bindings to LMDB".
  66. "Python-LMDB in a high-performance environment on Slashdot". 17 October 2014.
  67. "Open letter to Howard Chu and David Wilson regarding Python-LMDB".
  68. "List of projects using LMDB".
  69. liblmdb0 in Debian. Retrieved 2014-10-20.
  70. D'Vine, Rhonda. "Ubuntu – Package Search Results -- lmdb-utils". packages.ubuntu.com. Retrieved 2 Jan 2018.
  71. LMDB in Fedora 20. Retrieved 2014-10-20.
  72. lmdb in OpenSUSE. Retrieved 2014-10-20.
  73. OpenLDAP back-mdb. Retrieved 2014-10-20
  74. Postfix lmdb_table(5). Retrieved 2014-10-20
  75. "CFEngine 3.6 Documentation - New in CFEngine". docs.cfengine.com. Retrieved 8 May 2017.
  76. "Google Groups". groups.google.com. Retrieved 8 May 2017.
  77. "Storage | Meilisearch Documentation v1.0". Retrieved 21 Mar 2023.
  78. "LMDB-IndexedDB on GitHub". GitHub. Retrieved 2 Apr 2023.
  79. "LMDB: The Leveldb Killer?".
  80. "Response to LMDB review". symas.com. Archived from the original on 11 November 2020.
  81. "Lightning Memory-Mapped Database". Archived from the original on 14 March 2016.
  82. "Reviewing Lightning memory-mapped database library: Partial".
  83. "Some final notes about LMDB review".
  84. "Design Review: Key-Value Storage". mozilla.github.io. We propose the standardization of a simple key-value storage capability, based on LMDB, that is fast, compact, multi-process-capable, and equally usable from JS, Java, Rust, Swift, and C++.
  85. "LMDB". Sampath Herga. Archived from the original on 2013-08-29. Retrieved 2013-08-30.
  86. "lmdb简介 - 简书".
  87. "lmdb". Archived from the original on 5 March 2016. Retrieved 8 May 2017.