Database engine

Last updated

A database engine (or storage engine) is the underlying software component that a database management system (DBMS) uses to create, read, update and delete (CRUD) data from a database. Most database management systems include their own application programming interface (API) that allows the user to interact with their underlying engine without going through the user interface of the DBMS.

Contents

The term "database engine" is frequently used interchangeably with "database server" or "database management system". A "database instance" refers to the processes and memory structures of the running database engine.

Storage engines

Many of the modern DBMS support multiple storage engines within the same database. For example, MySQL supports InnoDB as well as MyISAM.

Some storage engines are transactional.

NameLicenseTransactionalCompatibilityNotes
Aria GPL No MariaDB and MySQL
Falcon GPLYesMySQLDiscontinued
InnoDB GPLYesMySQL and MariaDBThe default for MySQL and MariaDB
Memory GPLNoMySQL and MariaDB
MyISAM GPLNoMySQLWas the default for MySQL
InfiniDB GPLNo
TokuDB GPLYesMySQL and MariaDBUses fractal tree index
WiredTiger GPLYes MongoDB | Default for MongoDB
XtraDB GPLYesMariaDB and Percona Server for MySQL
RocksDB / MyRocks GPL v2 or Apache 2.0 Yes ArangoDB, Cassandra, MariaDB, MongoDB, MySQL, SurrealDB

Additional engine types include:

Design considerations

Information in a database is stored as bits laid out as data structures in storage that can be efficiently read from and written to given the properties of hardware. Typically the storage itself is designed to meet requirements of various areas that extensively utilize storage, including databases. A DBMS in operation always simultaneously utilizes several storage types (e.g., memory, and external storage), with respective layout methods.

In principle the database storage can be viewed as a linear address space, where every bit of data has its unique address in this address space. In practice, only a very small percentage of addresses are kept as initial reference points (which also requires storage); most data is accessed by indirection using displacement calculations (distance in bits from the reference points) and data structures which define access paths (using pointers) to all needed data in an effective manner, optimized for the needed data access operations.

Database storage hierarchy

A database, while in operation, resides simultaneously in several types of storage, forming a storage hierarchy. By the nature of contemporary computers most of the database part inside a computer that hosts the DBMS resides (partially replicated) in volatile storage. Data (pieces of the database) that are being processed/manipulated reside inside a processor, possibly in processor's caches. These data are being read from/written to memory, typically through a computer bus (so far typically volatile storage components). Computer memory is communicating data (transferred to/from) external storage, typically through standard storage interfaces or networks (e.g., fibre channel, iSCSI). A storage array, a common external storage unit, typically has storage hierarchy of its own, from a fast cache, typically consisting of (volatile and fast) DRAM, which is connected (again via standard interfaces) to drives, possibly with different speeds, like flash drives and magnetic disk drives (non-volatile). The drives may be connected to magnetic tapes, on which typically the least active parts of a large database may reside, or database backup generations.

Typically a correlation exists currently between storage speed and price, while the faster storage is typically volatile.

Data structures

A data structure is an abstract construct that embeds data in a well defined manner. An efficient data structure allows manipulation of the data in efficient ways. The data manipulation may include data insertion, deletion, updating and retrieval in various modes. A certain data structure type may be very effective in certain operations, and very ineffective in others. A data structure type is selected upon DBMS development to best meet the operations needed for the types of data it contains. Type of data structure selected for a certain task typically also takes into consideration the type of storage it resides in (e.g., speed of access, minimal size of storage chunk accessed, etc.). In some DBMSs database administrators have the flexibility to select among options of data structures to contain user data for performance reasons. Sometimes the data structures have selectable parameters to tune the database performance.

Databases may store data in many data structure types. [1] Common examples are the following:

Data orientation and clustering

In contrast to conventional row-orientation, relational databases can also be column-oriented or correlational in the way they store data in any particular structure.

In general, substantial performance improvement is gained if different types of database objects that are usually utilized together are laid in storage in proximity, being "clustered". This usually allows to retrieve needed related objects from storage in minimum number of input operations (each sometimes substantially time-consuming). Even for in-memory databases clustering provides performance advantage due to common utilization of large caches for input-output operations in memory, with similar resulting behavior.

For example, it may be beneficial to cluster a record of an "item" in stock with all its respective "order" records. The decision of whether to cluster certain objects or not depends on the objects' utilization statistics, object sizes, caches sizes, storage types, etc.

Database indexing

Indexing is a technique some storage engines use for improving database performance. The many types of indexes share the common property that they reduce the need to examine every entry when running a query. In large databases, this can reduce query time/cost by orders of magnitude. The simplest form of index is a sorted list of values that can be searched using a binary search with an adjacent reference to the location of the entry, analogous to the index in the back of a book. The same data can have multiple indexes (an employee database could be indexed by last name and hire date).

Indexes affect performance, but not results. Database designers can add or remove indexes without changing application logic, reducing maintenance costs as the database grows and database usage evolves. Indexes can speed up data access, but they consume space in the database, and must be updated each time the data is altered. Indexes therefore can speed data access but slow data maintenance. These two properties determine whether a given index is worth the cost.

Related Research Articles

<span class="mw-page-title-main">Computer data storage</span> Storage of digital data readable by computers

Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers.

<span class="mw-page-title-main">Computer memory</span> Computer component that stores information for immediate use

Computer memory stores information, such as data and programs for immediate use in the computer. The term memory is often synonymous with the term primary storage or main memory. An archaic synonym for memory is store.

<span class="mw-page-title-main">Database</span> Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

ISAM, an acronym for Indexed Sequential Access Method, is a method for creating, maintaining, and manipulating computer files of data so that records can be retrieved sequentially or randomly by one or more keys. Indexes of key fields are maintained to achieve fast retrieval of required file records in indexed files. IBM originally developed ISAM for mainframe computers, but implementations are available for most computer systems.

Extensible Storage Engine (ESE), also known as JET Blue, is an ISAM data storage technology from Microsoft. ESE is the core of Microsoft Exchange Server, Active Directory, and Windows Search. It is also used by a number of Windows components including Windows Update client and Help and Support Center. Its purpose is to allow applications to store and retrieve data via indexed and sequential access.

An in-memory database is a database management system that primarily relies on main memory for computer data storage. It is contrasted with database management systems that employ a disk storage mechanism. In-memory databases are faster than disk-optimized databases because disk access is slower than memory access and the internal optimization algorithms are simpler and execute fewer CPU instructions. Accessing data in memory eliminates seek time when querying the data, which provides faster and more predictable performance than disk.

The Access Database Engine is a database engine on which several Microsoft products have been built. The first version of Jet was developed in 1992, consisting of three modules which could be used to manipulate a database.

Database tuning describes a group of activities used to optimize and homogenize the performance of a database. It usually overlaps with query tuning, but refers to design of the database files, selection of the database management system (DBMS) application, and configuration of the database's environment.

A column-oriented DBMS or columnar DBMS is a database management system (DBMS) that stores data tables by column rather than by row. Benefits include more efficient access to data when only querying a subset of columns, and more options for data compression. However, they are typically less efficient for inserting new data.

SAP IQ is a column-based, petabyte scale, relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, its primary function is to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited with pioneering the commercialization of column-store technology.

Database tables and indexes may be stored on disk in one of a number of forms, including ordered/unordered flat files, ISAM, heap files, hash buckets, or B+ trees. Each form has its own particular advantages and disadvantages. The most commonly used forms are B-trees and ISAM. Such forms or structures are one aspect of the overall schema used by a database engine to store information.

Database caching is a process included in the design of computer applications which generate web pages on-demand (dynamically) by accessing backend databases.

In computer science, memory virtualization decouples volatile random access memory (RAM) resources from individual systems in the data centre, and then aggregates those resources into a virtualized memory pool available to any computer in the cluster. The memory pool is accessed by the operating system or applications running on top of the operating system. The distributed memory pool can then be utilized as a high-speed cache, a messaging layer, or a large, shared memory resource for a CPU or a GPU application.

In computer storage, a global file system is a distributed file system that can be accessed from multiple locations, typically across a wide-area network, and provides concurrent access to a global namespace from all locations. In order for a file system to be considered global, it must allow for files to be created, modified, and deleted from any location. This access is typically provided by a cloud storage gateway at each edge location, which provides access using the NFS or SMB network file sharing protocols.

This glossary of computer hardware terms is a list of definitions of terms and concepts related to computer hardware, i.e. the physical and structural components of computers, architectural issues, and peripheral devices.

InfinityDB is an all-Java embedded database engine and client/server DBMS with an extended java.util.concurrent.ConcurrentNavigableMap interface that is deployed in handheld devices, on servers, on workstations, and in distributed settings. The design is based on a proprietary lockless, concurrent, B-tree architecture that enables client programmers to reach high levels of performance without risk of failures.

The following is provided as an overview of and topical guide to databases:

Solid-state storage (SSS) is a type of non-volatile computer storage that stores and retrieves digital information using only electronic circuits, without any involvement of moving mechanical parts. This differs fundamentally from the traditional electromechanical storage, which records data using rotating or linearly moving media coated with magnetic material.

In an enterprise server, a Caching SAN Adapter is a host bus adapter (HBA) for storage area network (SAN) connectivity which accelerates performance by transparently storing duplicate data such that future requests for that data can be serviced faster compared to retrieving the data from the source. A caching SAN adapter is used to accelerate the performance of applications across multiple clustered or virtualized servers and uses DRAM, NAND Flash or other memory technologies as the cache. The key requirement for the memory technology is that it is faster than the media storing the original copy of the data to ensure performance acceleration is achieved.

The following outline is provided as an overview of and topical guide to MySQL:

References

  1. Lightstone, S.; Teorey, T.; Nadeau, T. (2007). Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more. Morgan Kaufmann Press. ISBN   978-0-12-369389-1.