GiST

Last updated

In computing, GiST or Generalized Search Tree, is a data structure and API that can be used to build a variety of disk-based search trees. GiST is a generalization of the B+ tree, providing a concurrent and recoverable height-balanced search tree infrastructure without making any assumptions about the type of data being stored, or the queries being serviced. GiST can be used to easily implement a range of well-known indexes, including B+ trees, R-trees, hB-trees, RD-trees, and many others; it also allows for easy development of specialized indexes for new data types. It cannot be used directly to implement non-height-balanced trees such as quad trees or prefix trees (tries), though like prefix trees it does support compression, including lossy compression. GiST can be used for any data type that can be naturally ordered into a hierarchy of supersets. Not only is it extensible in terms of data type support and tree layout, it allows the extension writer to support any query predicates that they choose.

Contents

GiST is an example of software extensibility in the context of database systems: it allows the easy evolution of a database system to support new tree-based indexes. It achieves this by factoring out its core system infrastructure from a narrow API that is sufficient to capture the application-specific aspects of a wide variety of index designs. The GiST infrastructure code manages the layout of the index pages on disk, the algorithms for searching indexes and deleting from indexes, and complex transactional details such as page-level locking for high concurrency and write-ahead logging for crash recovery. This allows authors of new tree-based indexes to focus on implementing the novel features of the new index type for example, the way in which subsets of the data should be described for search without becoming experts in database system internals.

Although originally designed for answering Boolean selection queries, GiST can also support nearest-neighbor search, and various forms of statistical approximation over large data sets.

Implementations

The most widely used GiST implementation is in the PostgreSQL relational database; it was also implemented in the Informix Universal Server, and as a standalone library, libgist.

PostgreSQL

The PostgreSQL GiST implementation includes support for variable length keys, composite keys, concurrency control and recovery; these features are inherited by all GiST extensions. There are several contributed modules developed using GiST and distributed with PostgreSQL. For example:

The PostgreSQL GiST implementation provides the indexing support for the PostGIS (geographic information system) and the BioPostgres bioinformatics system.

Related Research Articles

Database Organized collection of data

In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spans formal techniques and practical considerations including data modeling, efficient data representation and storage, query languages, security and privacy of sensitive data, and distributed computing issues including supporting concurrent access and fault tolerance.

PostgreSQL Free and open-source relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the Ingres database developed at the University of California, Berkeley. In 1996, the project was renamed to PostgreSQL to reflect its support for SQL. After a review in 2007, the development team decided to keep the name PostgreSQL and the alias Postgres.

SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e. data incorporating relations among entities and variables. SQL offers two main advantages over older read–write APIs such as ISAM or VSAM. Firstly, it introduced the concept of accessing many records with one single command. Secondly, it eliminates the need to specify how to reach a record, e.g. with or without an index.

PostGIS Geospatial extension for the PostgreSQL Database

PostGIS is an open source software program that adds support for geographic objects to the PostgreSQL object-relational database. PostGIS follows the Simple Features for SQL specification from the Open Geospatial Consortium (OGC).

SQLite Serverless relational database management system (RDBMS)

SQLite is a relational database management system (RDBMS) contained in a C library. In contrast to many other database management systems, SQLite is not a client–server database engine. Rather, it is embedded into the end program.

In the context of SQL, data definition or data description language (DDL) is a syntax for creating and modifying database objects such as tables, indices, and users. DDL statements are similar to a computer programming language for defining data structures, especially database schemas. Common examples of DDL statements include CREATE, ALTER, and DROP.

The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

Null (SQL) Marker used in SQL databases to indicate a value does not exist

Null or NULL is a special marker used in Structured Query Language to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL Null serves to fulfil the requirement that all true relational database management systems (RDMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent Null in database theory. In SQL, NULL is a reserved word used to identify this marker.

Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group and resolving any conflicts that might arise between concurrent changes made by different members.

H2 (DBMS)

H2 is a relational database management system written in Java. It can be embedded in Java applications or run in client-server mode.

In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

A spatial database is a general-purpose database that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data. Most spatial databases allow the representation of simple geometric objects such as points, lines and polygons. Some spatial databases handle more complex structures such as 3D objects, topological coverages, linear networks, and TINs. While typical databases have developed to manage various numeric and character types of data, such databases require additional functionality to process spatial data types efficiently, and developers have often added geometry or feature data types. The Open Geospatial Consortium (OGC) developed the Simple Features specification and sets standards for adding spatial functionality to database systems. The SQL/MM Spatial ISO/IEC standard is a part the SQL/MM multimedia standard and extends the Simple Features standard with data types that support circular interpolations.

Language Integrated Query is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages, originally released as a major part of .NET Framework 3.5 in 2007.

Microsoft SQL Server is a relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

The nested set model is a technique for representing nested sets in relational databases.

Michael Stonebraker American computer scientist

Michael Ralph Stonebraker is a computer scientist specializing in database systems. Through a series of academic prototypes and commercial startups, Stonebraker's research and products are central to many relational databases. He is also the founder of many database companies, including Ingres Corporation, Illustra, Paradigm4, StreamBase Systems, Tamr, Vertica and VoltDB, and served as chief technical officer of Informix. For his contributions to database research, Stonebraker received the 2014 Turing Award, often described as "the Nobel Prize for computing."

In database management systems (DBMS), a prepared statement or parameterized statement is a feature used to pre-compile SQL code, separating it from data. Benefits of prepared statements are:

Transbase is a relational database management system, developed and maintained by Transaction Software GmbH, Munich. The development of Transbase was started in the 1980s by Rudolf Bayer under the name „Merkur“ at the department of Computer Science of the Technical University of Munich (TUM).

YugabyteDB Transactional distributed SQL database

YugabyteDB is a high-performance transactional distributed SQL database for cloud-native applications, developed by Yugabyte.

References