Database engine

Last updated November 26, 2024

A database engine (or storage engine) is the underlying software component that a database management system (DBMS) uses to create, read, update and delete (CRUD) data from a database. Most database management systems include their own application programming interface (API) that allows the user to interact with their underlying engine without going through the user interface of the DBMS.

Storage engines

Many of the modern DBMS support multiple storage engines within the same database. For example, MySQL supports InnoDB as well as MyISAM.

Some storage engines are transactional.

Name	License	Transactional	Compatibility	Notes
Aria	GPL	No	MariaDB and MySQL
Falcon	GPL	Yes	MySQL	Discontinued
InnoDB	GPL	Yes	MySQL and MariaDB	The default for MySQL and MariaDB
Memory	GPL	No	MySQL and MariaDB
MyISAM	GPL	No	MySQL	Was the default for MySQL
InfiniDB	GPL	No
TokuDB	GPL	Yes	MySQL and MariaDB	Uses fractal tree index
WiredTiger	GPL	Yes	MongoDB	Default for MongoDB
XtraDB	GPL	Yes	MariaDB and Percona Server for MySQL
RocksDB / MyRocks	GPL v2 or Apache 2.0	Yes	ArangoDB, Cassandra, MariaDB, MongoDB, MySQL, SurrealDB

Additional engine types include:

Embedded database engines
In-memory database engines

Design considerations

Information in a database is stored in the form of bits, laid out into data structures on storage hardware. These data structures are designed for efficient reads and writes to and from the storage hardware. Typically the storage hardware itself is designed to meet the requirements of various systems, including databases, that extensively utilize storage. An operating DBMS always utilizes several storage types simultaneously. These different storage types, such as flash memory and external disk storage, each require different data layout methods.

In principle, database storage can be viewed as a linear address space where every bit of data has a unique address. In practice, only a very small percentage of addresses are kept as initial reference points, which also require storage. Most data is accessed instead by indirection using displacement calculations (distance in bits from the reference points) and data structures which define access paths (using pointers) to all needed data in an effective manner, optimized for the needed data access operations.

Database storage hierarchy

A database, while in operation, resides simultaneously in several types of storage, forming a storage hierarchy. Inside of a contemporary computer hosting a DBMS, most of the "database" part resides, partially replicated, in volatile storage. Data that are actively being processed and manipulated reside inside the processor, possibly in processor's caches. These data are read from and written to memory, typically through a computer bus, which is usually a volatile storage component. Computer memory communicates data to and from external storage, typically through standard storage interfaces or networks (e.g., fibre channel, iSCSI).

A storage array, a common external storage unit, typically has storage hierarchy of its own. A fast cache, typically consisting of volatile and fast DRAM, is connected (via standard interfaces) to drives. These drives may have different speeds, like flash drives and non-volatile magnetic disk drives. Speed and price are generally correlated. The drives may be connected to magnetic tapes, on which the least active parts of a large database may reside. This may also be where backups are located.

Data structures

A data structure is an abstract construct that embeds data in a well defined manner. An efficient data structure allows manipulation of the data in efficient ways. The data manipulation may include data insertion, deletion, updating and retrieval in various modes. A certain data structure type may be very effective in certain operations, and very ineffective in others. A data structure type is selected upon DBMS development to best meet the operations needed for the types of data it contains. Type of data structure selected for a certain task typically also takes into consideration the type of storage it resides in (e.g., speed of access, minimal size of storage chunk accessed, etc.). In some DBMSs database administrators have the flexibility to select among options of data structures to contain user data for performance reasons. Sometimes the data structures have selectable parameters to tune the database performance.

Databases may store data in many data structure types.^[1] Common examples are the following:

Data orientation and clustering

In contrast to conventional row-orientation, relational databases can also be column-oriented or correlational in the way they store data in any particular structure.

In general, substantial performance improvement is gained if different types of database objects that are usually utilized together are laid in storage in proximity, being "clustered". This usually allows to retrieve needed related objects from storage in minimum number of input operations (each sometimes substantially time-consuming). Even for in-memory databases clustering provides performance advantage due to common utilization of large caches for input-output operations in memory, with similar resulting behavior.

For example, it may be beneficial to cluster a record of an "item" in stock with all its respective "order" records. The decision of whether to cluster certain objects or not depends on the objects' utilization statistics, object sizes, caches sizes, storage types, etc.

Database indexing

Indexing is a technique some storage engines use for improving database performance. The many types of indexes share the common property that they reduce the need to examine every entry when running a query. In large databases, this can reduce query time/cost by orders of magnitude. The simplest form of index is a sorted list of values that can be searched using a binary search with an adjacent reference to the location of the entry, analogous to the index in the back of a book. The same data can have multiple indexes (an employee database could be indexed by last name and hire date).

Indexes affect performance, but not results. Database designers can add or remove indexes without changing application logic, reducing maintenance costs as the database grows and database usage evolves. Indexes can speed up data access, but they consume space in the database, and must be updated each time the data is altered. Indexes therefore can speed data access but slow data maintenance. These two properties determine whether a given index is worth the cost.

Related Research Articles

Computer data storage or digital data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers.

<span class="mw-page-title-main">Cache (computing)</span> Additional storage that enables faster access to main storage

In computing, a cache is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference locality – temporal and spatial locality. Temporal locality refers to the reuse of specific data and/or resources within a relatively small time duration. Spatial locality refers to the use of data elements within relatively close storage locations. Sequential locality, a special case of spatial locality, occurs when data elements are arranged and accessed linearly, such as traversing the elements in a one-dimensional array.

Indexed Sequential Access Method (ISAM) is a method for creating, maintaining, and manipulating computer files of data so that records can be retrieved sequentially or randomly by one or more keys. Indexes of key fields are maintained to achieve fast retrieval of required file records in indexed files. IBM originally developed ISAM for mainframe computers, but implementations are available for most computer systems.

Extensible Storage Engine (ESE), also known as JET Blue, is an ISAM data storage technology from Microsoft. ESE is the core of Microsoft Exchange Server, Active Directory, and Windows Search. It is also used by a number of Windows components including Windows Update client and Help and Support Center. Its purpose is to allow applications to store and retrieve data via indexed and sequential access.

The Access Database Engine is a database engine on which several Microsoft products have been built. The first version of Jet was developed in 1992, consisting of three modules which could be used to manipulate a database.

A NetApp FAS is a computer storage product by NetApp running the ONTAP operating system; the terms ONTAP, AFF, ASA, FAS are often used as synonyms. "Filer" is also used as a synonym although this is not an official name. There are three types of FAS systems: Hybrid, All-Flash, and All SAN Array:

NetApp proprietary custom-build hardware appliances with HDD or SSD drives called hybrid Fabric-Attached Storage
NetApp proprietary custom-build hardware appliances with only SSD drives and optimized ONTAP for low latency called ALL-Flash FAS
All SAN Array build on top of AFF platform, and provide only SAN-based data protocol connectivity.

Database tuning describes a group of activities used to optimize and homogenize the performance of a database. It usually overlaps with query tuning, but refers to design of the database files, selection of the database management system (DBMS) application, and configuration of the database's environment.

A solid-state drive (SSD) is a type of solid-state storage device that uses integrated circuits to store data persistently. It is sometimes called semiconductor storage device, solid-state device, or solid-state disk.

SAP IQ is a column-based, petabyte scale, relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, its primary function is to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited with pioneering the commercialization of column-store technology.

Database tables and indexes may be stored on disk in one of a number of forms, including ordered/unordered flat files, ISAM, heap files, hash buckets, or B+ trees. Each form has its own particular advantages and disadvantages. The most commonly used forms are B-trees and ISAM. Such forms or structures are one aspect of the overall schema used by a database engine to store information.

In computer science, memory virtualization decouples volatile random access memory (RAM) resources from individual systems in the data center, and then aggregates those resources into a virtualized memory pool available to any computer in the cluster. The memory pool is accessed by the operating system or applications running on top of the operating system. The distributed memory pool can then be utilized as a high-speed cache, a messaging layer, or a large, shared memory resource for a CPU or a GPU application.

In computer storage, a global file system is a distributed file system that can be accessed from multiple locations, typically across a wide-area network, and provides concurrent access to a global namespace from all locations. In order for a file system to be considered global, it must allow for files to be created, modified, and deleted from any location. This access is typically provided by a cloud storage gateway at each edge location, which provides access using the NFS or SMB network file sharing protocols.

This glossary of computer hardware terms is a list of definitions of terms and concepts related to computer hardware, i.e. the physical and structural components of computers, architectural issues, and peripheral devices.

InfinityDB is an all-Java embedded database engine and client/server DBMS with an extended java.util.concurrent.ConcurrentNavigableMap interface that is deployed in handheld devices, on servers, on workstations, and in distributed settings. The design is based on a proprietary lockless, concurrent, B-tree architecture that enables client programmers to reach high levels of performance without risk of failures.

eXtremeDB is a high-performance, low-latency, ACID-compliant embedded database management system using an in-memory database system (IMDS) architecture and designed to be linked into C/C++ based programs. It runs on Windows, Linux, and other real-time and embedded operating systems.

<span class="mw-page-title-main">Oracle NoSQL Database</span> Distributed database

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

In an enterprise server, a Caching SAN Adapter is a host bus adapter (HBA) for storage area network (SAN) connectivity which accelerates performance by transparently storing duplicate data such that future requests for that data can be serviced faster compared to retrieving the data from the source. A caching SAN adapter is used to accelerate the performance of applications across multiple clustered or virtualized servers and uses DRAM, NAND Flash or other memory technologies as the cache. The key requirement for the memory technology is that it is faster than the media storing the original copy of the data to ensure performance acceleration is achieved.

The following outline is provided as an overview of and topical guide to MySQL:

References

↑ Lightstone, S.; Teorey, T.; Nadeau, T. (2007). Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more. Morgan Kaufmann Press. ISBN 978-0-12-369389-1.

External links

https://web.archive.org/web/20100330045149/http://dev.mysql.com/tech-resources/articles/storage-engine/part_3.html
MySQL Administrator's Bible Chapter 11 "Storage Engines"

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Physical_Database_Design-1] Lightstone, S.; Teorey, T.; Nadeau, T. (2007). Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more. Morgan Kaufmann Press. ISBN 978-0-12-369389-1.

[1]

v t e Database
Main	Requirements Theory Database object Models Database management system Machine Engine Server Application Connection datasource DSN Administrator Synonym Lock Types Tools
Languages	Data definition Data manipulation Query information retrieval
Security	Activity monitoring Audit Forensics Negative database
Design	Entities and relationships (and Enhanced notation) Normalization Schema Refactoring Cardinality
Programming	Abstraction layer Object–relational mapping
Management	Virtualization Tuning caching Migration Preservation Integrity
Lists of	Academic Biological Biodiversity Facial expression Online Online music Online real estate
See also	Database-centric architecture Intelligent database Two-phase locking Locks with ordered sharing Load file Publishing Halloween Problem Log shipping
WikiProject Category