Data dictionary

Last updated
A simple layout of a data dictionary Data dictionary.png
A simple layout of a data dictionary

A data dictionary, or metadata repository, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format". [1] Oracle defines it as a collection of tables with metadata. The term can have one of several closely related meanings pertaining to databases and database management systems (DBMS):

Contents

Documentation

The terms data dictionary and data repository indicate a more general software utility than a catalogue. A catalogue is closely coupled with the DBMS software. It provides the information stored in it to the user and the DBA, but it is mainly accessed by the various software modules of the DBMS itself, such as DDL and DML compilers, the query optimiser, the transaction processor, report generators, and the constraint enforcer. On the other hand, a data dictionary is a data structure that stores metadata, i.e., (structured) data about information. The software package for a stand-alone data dictionary or data repository may interact with the software modules of the DBMS, but it is mainly used by the designers, users and administrators of a computer system for information resource management. These systems maintain information on system hardware and software configuration, documentation, application and users as well as other information relevant to system administration. [2]

If a data dictionary system is used only by the designers, users, and administrators and not by the DBMS Software, it is called a passive data dictionary. Otherwise, it is called an active data dictionary or data dictionary. When a passive data dictionary is updated, it is done so manually and independently from any changes to a DBMS (database) structure. With an active data dictionary, the dictionary is updated first and changes occur in the DBMS automatically as a result.

Database users and application developers can benefit from an authoritative data dictionary document that catalogs the organization, contents, and conventions of one or more databases. [3] This typically includes the names and descriptions of various tables (records or entities) and their contents (fields) plus additional details, like the type and length of each data element. Another important piece of information that a data dictionary can provide is the relationship between tables. This is sometimes referred to in entity-relationship diagrams (ERDs), or if using set descriptors, identifying which sets database tables participate in.

In an active data dictionary constraints may be placed upon the underlying data. For instance, a range may be imposed on the value of numeric data in a data element (field), or a record in a table may be forced to participate in a set relationship with another record-type. Additionally, a distributed DBMS may have certain location specifics described within its active data dictionary (e.g. where tables are physically located).

The data dictionary consists of record types (tables) created in the database by systems generated command files, tailored for each supported back-end DBMS. Oracle has a list of specific views for the "sys" user. This allows users to look up the exact information that is needed. Command files contain SQL Statements for CREATE TABLE, CREATE UNIQUE INDEX, ALTER TABLE (for referential integrity), etc., using the specific statement required by that type of database.

There is no universal standard as to the level of detail in such a document.

Middleware

In the construction of database applications, it can be useful to introduce an additional layer of data dictionary software, i.e. middleware, which communicates with the underlying DBMS data dictionary. Such a "high-level" data dictionary may offer additional features and a degree of flexibility that goes beyond the limitations of the native "low-level" data dictionary, whose primary purpose is to support the basic functions of the DBMS, not the requirements of a typical application. For example, a high-level data dictionary can provide alternative entity-relationship models tailored to suit different applications that share a common database. [4] Extensions to the data dictionary also can assist in query optimization against distributed databases. [5] Additionally, DBA functions are often automated using restructuring tools that are tightly coupled to an active data dictionary.

Software frameworks aimed at rapid application development sometimes include high-level data dictionary facilities, which can substantially reduce the amount of programming required to build menus, forms, reports, and other components of a database application, including the database itself. For example, PHPLens includes a PHP class library to automate the creation of tables, indexes, and foreign key constraints portably for multiple databases. [6] Another PHP-based data dictionary, part of the RADICORE toolkit, automatically generates program objects, scripts, and SQL code for menus and forms with data validation and complex joins. [7] For the ASP.NET environment, Base One's data dictionary provides cross-DBMS facilities for automated database creation, data validation, performance enhancement (caching and index utilization), application security, and extended data types. [8] Visual DataFlex features [9] provides the ability to use DataDictionaries as class files to form middle layer between the user interface and the underlying database. The intent is to create standardized rules to maintain data integrity and enforce business rules throughout one or more related applications.

Some industries use generalized data dictionaries as technical standards to ensure interoperability between systems. The real estate industry, for example, abides by a RESO's Data Dictionary to which the National Association of REALTORS mandates [10] its MLSs comply with through its policy handbook. [11] This intermediate mapping layer for MLSs' native databases is supported by software companies which provide API services to MLS organizations.

Platform-specific examples

Developers use a data description specification (DDS) to describe data attributes in file descriptions that are external to the application program that processes the data, in the context of an IBM i. [12] The sys.ts$ table in Oracle stores information about every table in the database. It is part of the data dictionary that is created when the Oracle Database is created. [13] Developers may also use DDS context from free and open-source software (FOSS) for structured and transactional queries in open environments.

Typical attributes

Here is a non-exhaustive list of typical items found in a data dictionary for columns or fields:

See also

Related Research Articles

<span class="mw-page-title-main">Database</span> Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

A relational database is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

<span class="mw-page-title-main">Object–relational database</span> Database management system

An object–relational database (ORD), or object–relational database management system (ORDBMS), is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, just as with pure relational systems, it supports extension of the data model with custom data types and methods.

<span class="mw-page-title-main">IBM Db2</span> Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB/2, then DB2 until 2017 and finally changed to its present form.

In computing, Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. An application written using ODBC can be ported to other platforms, both on the client and server side, with few changes to the data access code.

<span class="mw-page-title-main">Extract, transform, load</span> Procedure in computing

In computing, extract, transform, load (ETL) is a three-phase process where data is extracted, transformed and loaded into an output data container. The data can be collated from one or more sources and it can also be output to one or more destinations. ETL processing is typically executed using software applications but it can also be done manually by system operators. ETL software typically automates the entire process and can be run manually or on reccurring schedules either as single jobs or aggregated into a batch of jobs.

Dataphor is an open-source truly-relational database management system (RDBMS) and its accompanying user interface technologies, which together are designed to provide highly declarative software application development. The Dataphor Server has its own storage engine or it can be a virtual, or federated, DBMS, meaning that it can utilize other database engines for storage.

An XML database is a data persistence software system that allows data to be specified, and sometimes stored, in XML format. This data can be queried, transformed, exported and returned to a calling system. XML databases are a flavor of document-oriented databases which are in turn a category of NoSQL database.

Object–relational impedance mismatch creates difficulties going from data in relational data stores to usage in domain-driven object models. Object-orientation (OO) is the default method for business-centric design in programming languages. The problem lies in neither relational nor OO, but in the conceptual difficulty mapping between the two logic models. Both are logical models implementable differently on database servers, programming languages, design patterns, or other technologies. Issues range from application to enterprise scale, whenever stored relational data is used in domain-driven object models, and vice versa. Object-oriented data stores can trade this problem for other implementation difficulties.

Online transaction processing (OLTP) is a type of database system used in transaction-oriented applications, such as many operational systems. "Online" refers to that such systems are expected to respond to user requests and process them in real-time. The term is contrasted with online analytical processing (OLAP) which instead focuses on data analysis.

<span class="mw-page-title-main">Uniface (programming language)</span> Low-code development platform

Uniface is a low-code development and deployment platform for enterprise applications that can run in a large range of runtime environments, including mobile, mainframe, web, Service-oriented architecture (SOA), Windows, Java EE, and .NET. Uniface is used to create mission-critical applications.

Oracle Database provides information about all of the tables, views, columns, and procedures in a database. This information about information is known as metadata. It is stored in two locations: data dictionary tables and a metadata registry.

An entity–attribute–value model (EAV) is a data model optimized for the space-efficient storage of sparse—or ad-hoc—property or data values, intended for situations where runtime usage patterns are arbitrary, subject to user variation, or otherwise unforeseeable using a fixed design. The use-case targets applications which offer a large or rich system of defined property types, which are in turn appropriate to a wide set of entities, but where typically only a small, specific selection of these are instantiated for a given entity. Therefore, this type of data model relates to the mathematical notion of a sparse matrix.

<span class="mw-page-title-main">Oracle Designer</span>

Oracle Designer was Oracle's CASE tool for designing an information system and generating it. After generating the information system one is able to edit the generated code with Oracle Developer Suite.

Database administration is the function of managing and maintaining database management systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM Db2 and Microsoft SQL Server need ongoing management. As such, corporations that use DBMS software often hire specialized information technology personnel called database administrators or DBAs.

Apache Empire-db is a Java library that provides a high level object-oriented API for accessing relational database management systems (RDBMS) through JDBC. Apache Empire-db is open source and provided under the Apache License 2.0 from the Apache Software Foundation.

Innovative Routines International (IRI), Inc. is an American software company first known for bringing mainframe sort merge functionality into open systems. IRI was the first vendor to develop a commercial replacement for the Unix sort command, and combine data transformation and reporting in Unix batch processing environments. In 2007, IRI's coroutine sort ("CoSort") became the first product to collate and convert multi-gigabyte XML and LDIF files, join and lookup across multiple files, and apply role-based data privacy functions for fields within sensitive files.

A metadata repository is a database created to store metadata. Metadata is information about the structures that contain the actual data. Metadata is often said to be "data about data", but this is misleading. Data profiles are an example of actual "data about data". Metadata adds one layer of abstraction to this definition– it is data about the structures that contain data. Metadata may describe the structure of any data, of any subject, stored in any format.

The following is provided as an overview of and topical guide to databases:

Rocket U2 is a suite of database management (DBMS) and supporting software now owned by Rocket Software. It includes two MultiValue database platforms: UniData and UniVerse. Both of these products are operating environments which run on current Unix, Linux and Windows operating systems. They are both derivatives of the Pick operating system. The family also includes developer and web-enabling technologies including SB/XA, U2 Web Development Environment (WebDE), UniObjects connectivity API and wIntegrate terminal emulation software.

References

  1. ACM, IBM Dictionary of Computing, 10th edition, 1993
  2. Ramez Elmasri, Shamkant B. Navathe: Fundamentals of Database Systems, 3rd. ed. sect. 17.5, p. 582
  3. TechTarget, SearchSOA, What is a data dictionary? Archived 12 February 2009 at the Wayback Machine
  4. U.S. Patent 4774661, Database management system with active data dictionary, 19 November 1985, AT&T
  5. U.S. Patent 4769772, Automated query optimization method using both global and parallel local optimizations for materialization access planning for distributed databases, 28 February 1985, Honeywell Bull
  6. PHPLens, ADOdb Data Dictionary Library for PHP Archived 7 November 2007 at the Wayback Machine
  7. RADICORE, What is a Data Dictionary?
  8. Base One International Corp., Base One Data Dictionary
  9. VISUAL DATAFLEX,features Archived 5 April 2018 at the Wayback Machine
  10. "Real Estate Transaction Standards (RETS) Web API". nar.realtor. Retrieved 11 October 2020.
  11. "Handbook on Multiple Listing Policy". nar.realtor. Retrieved 11 October 2020.
  12. "DDS documentation for IBM System i V5R3".
  13. "Oracle Concepts - Data Dictionary". dba-oracle.com. Retrieved 13 February 2017.