Grid file

Last updated

In computer science, a grid file or bucket grid is a point access method which splits a space into a non-periodic grid where one or more cells of the grid refer to a small set of points. Grid files (a symmetric data structure) provide an efficient method of storing these indexes on disk to perform complex data lookups.

Contents

It provides a grid of n-dimensions where n represents how many keys can be used to reference a single point.

Grid files do not contain any data themselves but instead contain references to the correct bucket.

Uses

A grid file is usually used in cases where a single value can be referenced by multiple keys.

A grid file began being used because "traditional file structures that provide multikey access to records, for example, inverted files, are extensions of file structures originally designed for single-key access. They manifest various deficiencies in particular for multikey access to highly dynamic files." [1]

In a traditional single dimensional data structure (e.g. hash), a search on a single criterion is usually very simple but searching for a second criterion can be much more complex.

Grid files represent a special kind of hashing, where the traditional hash is replaced by a grid directory.

Examples

Census Database

Sources: [2] [3]

Consider a database containing data from a census. A single record represents a single household, and all records are grouped into buckets. All records in a bucket can be indexed by either their city (which is the same for all records in the bucket), and the streets in that city whose names begin with the same letter.

A grid file can be used to provide an efficient index for this structure, where records come in groupings of 26, each of them relating to street names in a city starting with one of the letters of the alphabet. This structure can be thought of as an array, table, or grid with two dimensions which we will call the x and y axes.

One may consider the x-axis to be the city and the y-axis to be each of the letters in the alphabet, or alternatively, the first letter of each street.

Each record in this structure is known as a cell. Each cell will contain a pointer to the appropriate bucket in the database where the actual data is stored. An extra cell, or record header, may be required to store the name of the city. Other cells grouped with it will only need to contain the pointer to their respective bucket, since the first cell corresponds to street names beginning with "A", the second to "B", and so on.

The database can be further extended to contain a continent field to expand the census to other continents. This would cause records in the same bucket to correspond to households on a street beginning with the same letter, in the same city, in the same continent.

The cells in the grid file would then consist of a city header, and six (one for each continent, not including Antarctica) groupings of 26 cells relating to the streets with the same starting letter, in the same city, on the same continent and could now be thought of as a three-dimensional array.

Advantages

Since a single entry in the grid file contains pointers to all records indexed by the specified keys: [4]

Disadvantages

However, because of the nature of the grid file, which gives it its advantages, there are also some disadvantages: [4]

See also

Related Research Articles

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children. Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as databases and file systems.

<span class="mw-page-title-main">Hash function</span> Mapping arbitrary data to fixed-size values

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output. The values returned by a hash function are called hash values, hash codes, hash digests, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing.

<span class="mw-page-title-main">Hash table</span> Associative array for storing key-value pairs

In computing, a hash table, also known as a hash map or a hash set, is a data structure that implements an associative array, also called a dictionary, which is an abstract data type that maps keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.

<span class="mw-page-title-main">Trie</span> K-ary search tree data structure

In computer science, a trie, also called digital tree or prefix tree, is a type of k-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters. In order to access a key, the trie is traversed depth-first, following the links between nodes, which represent each character in the key.

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of pairs, such that each possible key appears at most once in the collection. In mathematical terms, an associative array is a function with finite domain. It supports 'lookup', 'remove', and 'insert' operations.

ISAM, an acronym for Indexed Sequential Access Method, is a method for creating, maintaining, and manipulating computer files of data so that records can be retrieved sequentially or randomly by one or more keys. Indexes of key fields are maintained to achieve fast retrieval of required file records in indexed files. IBM originally developed ISAM for mainframe computers, but implementations are available for most computer systems.

Kademlia is a distributed hash table for decentralized peer-to-peer computer networks designed by Petar Maymounkov and David Mazières in 2002. It specifies the structure of the network and the exchange of information through node lookups. Kademlia nodes communicate among themselves using UDP. A virtual or overlay network is formed by the participant nodes. Each node is identified by a number or node ID. The node ID serves not only as identification, but the Kademlia algorithm uses the node ID to locate values.

In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article.

Btrieve is a database developed by Pervasive Software. The architecture of Btrieve has been designed with record management in mind. This means that Btrieve only deals with the underlying record creation, data retrieval, record updating and data deletion primitives. Together with the MicroKernel Database Engine it uses ISAM, Indexed Sequential Access Method, as its underlying storage mechanism.

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.

<span class="mw-page-title-main">Z-order curve</span> Mapping function that preserves data point locality

In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space-filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It is named in France after Henri Lebesgue, who studied it in 1904, and named in the United States after Guy Macdonald Morton, who first applied the order to file sequencing in 1966. The z-value of a point in multidimensions is simply calculated by interleaving the binary representations of its coordinate values. Once the data are sorted into this ordering, any one-dimensional data structure can be used, such as simple one dimensional arrays, binary search trees, B-trees, skip lists or hash tables. The resulting ordering can equivalently be described as the order one would get from a depth-first traversal of a quadtree or octree.

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time said table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

Linear hashing (LH) is a dynamic data structure which implements a hash table and grows or shrinks one bucket at a time. It was invented by Witold Litwin in 1980. It has been analyzed by Baeza-Yates and Soza-Pollman. It is the first in a number of schemes known as dynamic hashing such as Larson's Linear Hashing with Partial Extensions, Linear Hashing with Priority Splitting, Linear Hashing with Partial Expansions and Priority Splitting, or Recursive Linear Hashing.

In computer science, a ternary search tree is a type of trie where nodes are arranged in a manner similar to a binary search tree, but with up to three children rather than the binary tree's limit of two. Like other prefix trees, a ternary search tree can be used as an associative map structure with the ability for incremental string search. However, ternary search trees are more space efficient compared to standard prefix trees, at the cost of speed. Common applications for ternary search trees include spell-checking and auto-completion.

Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values.

<span class="mw-page-title-main">Database model</span> Type of data model

A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

Database tables and indexes may be stored on disk in one of a number of forms, including ordered/unordered flat files, ISAM, heap files, hash buckets, or B+ trees. Each form has its own particular advantages and disadvantages. The most commonly used forms are B-trees and ISAM. Such forms or structures are one aspect of the overall schema used by a database engine to store information.

In computer science, the Bx tree is a query that is used to update efficient B+ tree-based index structures for moving objects.

Rocket U2 is a suite of database management (DBMS) and supporting software now owned by Rocket Software. It includes two MultiValue database platforms: UniData and UniVerse. Both of these products are operating environments which run on current Unix, Linux and Windows operating systems. They are both derivatives of the Pick operating system. The family also includes developer and web-enabling technologies including SB/XA, U2 Web Development Environment (WebDE), UniObjects connectivity API and wIntegrate terminal emulation software.

References

  1. 1 2 J. Nievergelt, H. Hinterberger The Grid File: An Adaptable, Symmetric Multikey File Structure. Institut fur Informatik, ETH and K. C. Sevcik, 1984. Abstract, p. 1.
  2. Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Second Edition. Addison-Wesley, 1998. ISBN   0-201-89685-0. Section 6.5: Searching, pp. 564–566.
  3. Elmasri & Navathe Fundamentals of Database Systems, Third Edition. Addison-Wesley, 2000. ISBN   0-201-54263-3. Section 6.4.3: Grid Files, p. 185.
  4. 1 2 "Grid File". cs.sfu.ca. Retrieved 2016-11-27.