Navigational database

A navigational database is a type of database in which records or objects are found primarily by following references from other objects. The term was popularized by the title of Charles Bachman's 1973 Turing Award paper, The Programmer as Navigator [1]. The paper emphasized that the new disk-based database systems allowed the programmer to choose arbitrary navigational routes, following relationships from record to record. It contrasted this with earlier magnetic-tape and punched-card systems, where data access was strictly sequential.

One of the earliest navigational databases was Integrated Data Store (IDS), which was developed by Bachman for General Electric in the 1960s. IDS became the basis for the CODASYL database model in 1969.

Although Bachman described the concept of navigation in abstract terms, the idea of navigational access came to be associated strongly with the procedural design of the CODASYL Data Manipulation Language. Writing in 1982, for example, Tsichritzis and Lochovsky [2] state that "The notion of currency is central to the concept of navigation." By the notion of currency, they refer to the idea that a program maintains (explicitly or implicitly) a current position in any sequence of records that it is processing, and that operations such as GET NEXT and GET PRIOR retrieve records relative to this current position, while also changing the current position to the record that is retrieved.
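The notion of currency can be illustrated with a small Python sketch. The class `RecordCursor` and its method names are hypothetical and do not reproduce the CODASYL DML; they only show how each retrieval is relative to a current position and also moves that position.

```python
class RecordCursor:
    """Hypothetical cursor that keeps a currency indicator over a record sequence."""

    def __init__(self, records):
        self._records = list(records)
        self._current = None  # currency indicator: index of the current record

    def get_first(self):
        """Return the first record and make it current."""
        if not self._records:
            return None
        self._current = 0
        return self._records[0]

    def get_next(self):
        """Return the record after the current one and make it current."""
        nxt = 0 if self._current is None else self._current + 1
        if nxt >= len(self._records):
            return None  # end of sequence; currency is left unchanged
        self._current = nxt
        return self._records[nxt]

    def get_prior(self):
        """Return the record before the current one and make it current."""
        if self._current is None or self._current == 0:
            return None  # start of sequence; currency is left unchanged
        self._current -= 1
        return self._records[self._current]


cursor = RecordCursor(["Adams", "Baker", "Clark"])
visited = [cursor.get_first(), cursor.get_next(), cursor.get_next(), cursor.get_prior()]
```

Here the result of each call depends on the calls made before it, which is the hidden state that critics of currency indicators objected to.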

Navigational database programming thus came to be seen as intrinsically procedural and, moreover, as dependent on an implicit set of global variables (currency indicators) holding the current state. As such, the approach was seen as diametrically opposed to the declarative programming style used by the relational model. The declarative nature of relational languages such as SQL offered better programmer productivity and a higher level of data independence (that is, the ability of programs to continue working as the database structure evolves). Navigational interfaces, as a result, were gradually eclipsed during the 1980s by declarative query languages.

During the 1990s it became clear that for certain applications handling complex data (for example, spatial databases and engineering databases), the relational calculus had limitations. A reappraisal of the entire database market followed, and several companies later described the new systems using the marketing term NoSQL. Many of these systems introduced data manipulation languages which, while far removed from the CODASYL DML with its currency indicators, could be understood as implementing Bachman's "navigational" vision. Some of these languages are procedural; others (such as XPath) are entirely declarative. Offshoots of the navigational concept, such as the graph database, found new uses in modern transaction processing workloads.

Description

Navigational access is traditionally associated with the network model and hierarchical model of database, and conventionally describes data manipulation APIs in which records (or objects) are processed one at a time, iteratively. The essential characteristic as described by Bachman, however, is finding records by virtue of their relationship to other records: so an interface can still be navigational if it has set-oriented features [3]. From this viewpoint, the key difference between navigational data manipulation languages and relational languages is the use of explicit named relationships rather than value-based joins. A navigational request reads:

    for department with name="Sales", find all employees in set department-employees

whereas the relational equivalent reads:

    find employees, departments where employee.department-code = department.code and department.name="Sales"

In practice, however, most navigational APIs have been procedural: the above query would be executed using procedural logic along the lines of the following pseudo-code:

    get department with name='Sales'
    get first employee in set department-employees
    until end-of-set do {
      process employee
      get next employee in set department-employees
    }

From this viewpoint, the key difference between navigational APIs and the relational model (implemented in relational databases) is that relational APIs use "declarative" or logic programming techniques that tell the system what to fetch, while navigational APIs instruct the system, in a sequence of steps, how to reach the required records.
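The Sales example can be sketched in Python to contrast the two styles. The classes, field names and sample data below are assumptions made for illustration and model neither CODASYL nor SQL exactly: `navigational` follows a named set one record at a time, while `value_based` describes the result through matching key values.

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    name: str
    department_code: str


@dataclass
class Department:
    code: str
    name: str
    # Named relationship standing in for the "department-employees" set.
    employees: list = field(default_factory=list)


departments = [Department("D1", "Sales"), Department("D2", "Research")]
all_employees = [Employee("Ann", "D1"), Employee("Bob", "D2"), Employee("Cy", "D1")]

# Populate the named sets once, as a database would when records are stored.
by_code = {d.code: d for d in departments}
for e in all_employees:
    by_code[e.department_code].employees.append(e)


def navigational(depts):
    """Find the department, then walk its named set record by record."""
    names = []
    dept = next(d for d in depts if d.name == "Sales")
    members = iter(dept.employees)
    emp = next(members, None)        # get first employee in set
    while emp is not None:           # until end-of-set
        names.append(emp.name)       # process employee
        emp = next(members, None)    # get next employee in set
    return names


def value_based(depts, emps):
    """State the result as a join on equal key values."""
    return [e.name for e in emps for d in depts
            if e.department_code == d.code and d.name == "Sales"]
```

Both functions return the same employees; they differ in whether the program spells out the route or only the condition.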

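Most criticisms of navigational APIs fall into one of two categories: usability (procedural navigation code tends to become hard to read and debug) and data independence (application code may need to change whenever the data structure changes).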

For many years the primary defence of navigational APIs was performance. Database systems that support navigational APIs often use internal storage structures that contain physical links or pointers from one record to another. While such structures may allow very efficient navigation, they have disadvantages because it becomes difficult to reorganize the physical placement of data. It is quite possible to implement navigational APIs without low-level pointer chasing (Bachman's paper envisaged logical relationships being implemented just as in relational systems, using primary keys and foreign keys), so the two ideas should not be conflated. But without the performance benefits of low-level pointers, navigational APIs become harder to justify.
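A navigational interface without physical pointers might look like the following Python sketch. The class `KeyBackedSet` and the row layout are hypothetical: the caller still asks for the members of a named relationship, but membership is resolved by comparing foreign-key values, so the rows can be physically rearranged without breaking navigation.

```python
class KeyBackedSet:
    """Hypothetical named set resolved through foreign keys, not pointers."""

    def __init__(self, member_rows, foreign_key):
        self._rows = member_rows          # list of dicts, in any physical order
        self._foreign_key = foreign_key   # column that refers to the owner

    def members(self, owner_key):
        # The request is phrased as a relationship, but answered by key comparison.
        return [row for row in self._rows if row[self._foreign_key] == owner_key]


employee_rows = [
    {"id": 1, "name": "Ann", "dept": "D1"},
    {"id": 2, "name": "Bob", "dept": "D2"},
    {"id": 3, "name": "Cy", "dept": "D1"},
]
department_employees = KeyBackedSet(employee_rows, "dept")
before = [row["name"] for row in department_employees.members("D1")]

# Reorganizing physical placement leaves the logical relationship intact.
employee_rows.reverse()
after = sorted(row["name"] for row in department_employees.members("D1"))
```

The linear scan in `members` stands in for an index lookup; it also hints at the performance cost that pointer-based storage was meant to avoid.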

Hierarchical models often construct primary keys for records by concatenating the keys that appear at each level in the hierarchy. Such composite identifiers are found in computer file names (/usr/david/docs/index.txt), in URIs, in the Dewey decimal system, and for that matter in postal addresses. Such a composite key can be considered as representing a navigational path to a record; but equally, it can be considered as a simple primary key allowing associative access.
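The two readings of a composite key can be sketched in Python. The nested dictionary and the helper names `navigate` and `flatten` are illustrative assumptions: the first function treats the key as a path to be followed level by level, while the flat index treats the same string as an opaque primary key.

```python
tree = {"usr": {"david": {"docs": {"index.txt": "contents of index"}}}}


def navigate(root, composite_key):
    """Treat the key as a path: descend one level per component."""
    node = root
    for part in composite_key.strip("/").split("/"):
        node = node[part]
    return node


def flatten(node, prefix=""):
    """Build a flat mapping from each full composite key to its leaf value."""
    flat = {}
    for name, child in node.items():
        key = f"{prefix}/{name}"
        if isinstance(child, dict):
            flat.update(flatten(child, key))
        else:
            flat[key] = child
    return flat


flat_index = flatten(tree)
path = "/usr/david/docs/index.txt"
via_navigation = navigate(tree, path)   # four steps down the hierarchy
via_primary_key = flat_index[path]      # one associative lookup
```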

As relational systems came to prominence in the 1980s, navigational APIs (and in particular, procedural APIs) were criticized and fell out of favour. The 1990s, however, brought a new wave of object-oriented databases that often provided both declarative and procedural interfaces. One explanation for this is that they were often used to represent graph-structured information (for example spatial data and engineering data) where access is inherently recursive: the mathematics underpinning SQL (specifically, first-order predicate calculus) does not have sufficient power to support recursive queries, even those as simple as a transitive closure.
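A transitive closure is straightforward to compute by navigation, as the following Python sketch suggests. The parts data and the function name are assumptions chosen for illustration; the traversal simply keeps following "contains" links until no new records are reached.

```python
from collections import deque


def transitive_closure(contains, start):
    """Return every part reachable from start by following 'contains' links."""
    reached = set()
    queue = deque([start])
    while queue:
        part = queue.popleft()
        for child in contains.get(part, ()):
            if child not in reached:   # also guards against cycles
                reached.add(child)
                queue.append(child)
    return reached


contains = {
    "car": ["engine", "wheel"],
    "engine": ["piston", "valve"],
    "wheel": ["tyre"],
}
parts_of_car = transitive_closure(contains, "car")
```

The number of steps depends on the depth of the data, which is why a fixed, non-recursive join expression cannot express the same query.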

A current example of a popular navigational API can be found in the Document Object Model (DOM) often used in web browsers and closely associated with JavaScript. The DOM is essentially an in-memory hierarchical database with an API that is both procedural and navigational. By contrast, the same data (XML or HTML) can be accessed using XPath, which can be categorized as declarative and navigational: data is accessed by following relationships, but the calling program does not issue a sequence of instructions to be followed in order. Languages such as SPARQL used to retrieve Linked Data from the Semantic Web are also simultaneously declarative and navigational.
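The difference between the two styles can be sketched with Python's standard library. The sample document is invented for illustration. `dom_names` walks the tree through the DOM interface in `xml.dom.minidom`, one node at a time; `path_names` uses the limited XPath subset supported by `xml.etree.ElementTree`, which is not a full XPath implementation.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

XML = (
    "<company>"
    "<department name='Sales'><employee>Ann</employee><employee>Cy</employee></department>"
    "<department name='Research'><employee>Bob</employee></department>"
    "</company>"
)


def dom_names(xml_text):
    """Procedural and navigational: step through children and siblings."""
    doc = minidom.parseString(xml_text)
    names = []
    dept = doc.documentElement.firstChild
    while dept is not None:
        if dept.nodeType == dept.ELEMENT_NODE and dept.getAttribute("name") == "Sales":
            emp = dept.firstChild
            while emp is not None:
                if emp.nodeType == emp.ELEMENT_NODE:
                    names.append(emp.firstChild.data)  # text node of the employee
                emp = emp.nextSibling
        dept = dept.nextSibling
    return names


def path_names(xml_text):
    """Declarative and navigational: name the route, not the steps."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall("./department[@name='Sales']/employee")]
```

Both functions follow the same parent-child relationships; only the first dictates the order in which nodes are visited.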

References

  1. Bachman, Charles W. (1973). "The programmer as navigator". Communications of the ACM. 16 (11): 653–658. doi:10.1145/355611.362534. S2CID 18635540.
  2. Tsichritzis, Dionysios C.; Lochovsky, Frederick H. (1982). Data Models. Prentice-Hall. p. 67. ISBN 0-13-196428-3.
  3. Błażewicz, Jacek; Królikowski, Zbyszko; Morzy, Tadeusz (2003). Handbook on Data Management in Information Systems. Springer. p. 18. ISBN 3-540-43893-9.