Schema evolution

In computer science, schema versioning and schema evolution deal with the need to retain current data and software system functionality in the face of changing database structure. [1] The problem is not limited to the modification of the schema itself: it also affects the data stored under the given schema and the queries (and thus the applications) posed against that schema.

A database design is sometimes created as an "as of now" snapshot, and schema evolution is therefore not considered. (This is related to, but distinct from, designing a database as "one size fits all", which does not account for attribute volatility.) This assumption, already unrealistic in the context of traditional information systems, becomes unacceptable in the context of systems that retain large volumes of historical information, or of web information systems, which, due to the distributed and cooperative nature of their development, are subject to an even stronger pressure toward change (from 39% to over 500% more intense than in traditional settings). [2] Because of this historical heritage, the process of schema evolution was, as of 2008, a particularly taxing one. It is, in fact, widely acknowledged that the data management core of an application is one of the most difficult and critical components to evolve. The key problem is the impact of schema evolution on queries and applications. As shown in the article Schema Evolution in Wikipedia: Toward a Web Information System Benchmark (2008) [3] (which analyzes the evolution of MediaWiki), each evolution step may affect up to 70% of the queries operating on the schema, which must then be manually reworked.
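To make the impact concrete, the following minimal sketch (using Python's built-in sqlite3 module and a hypothetical page table, not taken from the MediaWiki benchmark) shows how a single schema change, renaming a column, breaks an existing query, which must then be reworked against the new schema:

```python
import sqlite3

# Hypothetical example: a tiny "page" table loosely inspired by a wiki schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
con.execute("INSERT INTO page (page_title) VALUES ('Schema evolution')")

# An application query written against schema version 1.
old_query = "SELECT page_title FROM page WHERE page_id = ?"
print(con.execute(old_query, (1,)).fetchone())  # ('Schema evolution',)

# Schema evolution step: the column is renamed in version 2.
# (RENAME COLUMN requires SQLite 3.25+, bundled with recent Python releases.)
con.execute("ALTER TABLE page RENAME COLUMN page_title TO title")

# The unchanged query now fails and has to be manually reworked.
try:
    con.execute(old_query, (1,))
except sqlite3.OperationalError as e:
    print("query broken by schema change:", e)

new_query = "SELECT title FROM page WHERE page_id = ?"  # reworked query
print(con.execute(new_query, (1,)).fetchone())
```

In a real application, every query and every piece of code that references the old column name has to be located and updated in the same way.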

As of 2008, the problem had been recognized as a pressing one by the database community for more than 12 years. [4] [5] Supporting schema evolution is a difficult problem involving complex mappings among schema versions, and tool support has so far been very limited. The recent theoretical advances on mapping composition [6] and mapping invertibility, [7] which represent the core problems underlying schema evolution, remain largely inaccessible to practitioners. The issue is particularly felt in temporal databases. [8]

Related Research Articles

Database: Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

PostgreSQL: Free and open-source object–relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Linux, FreeBSD, OpenBSD, macOS, and Windows, and handles a range of workloads from single machines to data warehouses or web services with many concurrent users.

Object–relational database: Database management system

An object–relational database (ORD), or object–relational database management system (ORDBMS), is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, just as with pure relational systems, it supports extension of the data model with custom data types and methods.

A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data, unlike a natural key.
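As a brief illustration (a hypothetical book table, using Python's sqlite3 module), the surrogate key below is a system-generated integer with no meaning outside the database, whereas the ISBN is a natural key derived from application data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE book (
        book_id INTEGER PRIMARY KEY,   -- surrogate key, assigned by the database
        isbn    TEXT UNIQUE NOT NULL,  -- natural key, derived from application data
        title   TEXT NOT NULL
    )
""")
con.execute(
    "INSERT INTO book (isbn, title) VALUES (?, ?)",
    ("978-0131103627", "The C Programming Language"),
)
# Other tables would reference the stable surrogate key rather than the natural key,
# so the row stays identifiable even if the ISBN is later corrected.
print(con.execute("SELECT book_id, isbn FROM book").fetchone())  # (1, '978-0131103627')
```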

A temporal database stores data relating to time instances. It offers temporal data types and stores information relating to past, present and future time. Temporal databases can be uni-temporal, bi-temporal or tri-temporal.
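A minimal sketch of a uni-temporal table (hypothetical table and column names, using Python's sqlite3 module), in which every fact carries the period of valid time during which it holds:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Uni-temporal table: each row carries the period during which the fact is valid.
con.execute("""
    CREATE TABLE employee_salary (
        emp_id      INTEGER,
        salary      INTEGER,
        valid_from  TEXT,   -- inclusive start of the validity period
        valid_to    TEXT    -- exclusive end; '9999-12-31' means "until further notice"
    )
""")
con.executemany(
    "INSERT INTO employee_salary VALUES (?, ?, ?, ?)",
    [(1, 50000, "2020-01-01", "2022-07-01"),
     (1, 56000, "2022-07-01", "9999-12-31")],
)
# "Time-travel" query: what was employee 1's salary on 2021-05-15?
row = con.execute(
    "SELECT salary FROM employee_salary "
    "WHERE emp_id = ? AND valid_from <= ? AND ? < valid_to",
    (1, "2021-05-15", "2021-05-15"),
).fetchone()
print(row)  # (50000,)
```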

A federated database system (FDBS) is a type of meta-database management system (DBMS) which transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized. Since the constituent database systems remain autonomous, a federated database system is an alternative to merging several disparate databases into one. A federated database, or virtual database, is a composite of all constituent databases in a federated database system. There is no actual data integration among the constituent disparate databases as a result of data federation.

Hibernate ORM is an object–relational mapping tool for the Java programming language. It provides a framework for mapping an object-oriented domain model to a relational database. Hibernate handles object–relational impedance mismatch problems by replacing direct, persistent database accesses with high-level object handling functions.

ADO.NET is a data access technology from the Microsoft .NET Framework that provides communication between relational and non-relational systems through a common set of components. ADO.NET is a set of computer software components that programmers can use to access data and data services from a database. It is a part of the base class library that is included with the Microsoft .NET Framework. It is commonly used by programmers to access and modify data stored in relational database systems, though it can also access data in non-relational data sources. ADO.NET is sometimes considered an evolution of ActiveX Data Objects (ADO) technology, but was changed so extensively that it can be considered an entirely new product.

Metamodeling: Concept of software engineering

A metamodel is a model of a model, and metamodeling is the process of generating such metamodels. Thus metamodeling or meta-modeling is the analysis, construction and development of the frames, rules, constraints, models and theories applicable and useful for modeling a predefined class of problems. As its name implies, this concept applies the notions of meta- and modeling in software engineering and systems engineering. Metamodels are of many types and have diverse applications.

Data integration involves combining data residing in different sources and providing users with a unified view of them. This process becomes significant in a variety of situations, in both commercial and scientific domains. Data integration appears with increasing frequency as the volume and complexity of data, and the need to share existing data, grow. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users. The data being integrated must be received from heterogeneous database systems and transformed into a single coherent data store that provides synchronous data across a network of files for clients. A common use of data integration is in data mining, when analyzing and extracting information from existing databases that can be useful for business information.
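As a toy illustration only (the sources, field names, and mapping rules are invented), data integration boils down to mapping records from heterogeneous sources onto one unified schema:

```python
# Hypothetical example: two sources describe customers with different schemas;
# a mediated (unified) schema exposes one consistent view over both.
source_crm = [{"cust_id": 7, "full_name": "Ada Lovelace", "country": "UK"}]
source_shop = [{"id": "A-7", "name": "Ada Lovelace", "nation": "United Kingdom"}]

def from_crm(rec):
    # Map CRM fields onto the unified schema.
    return {"id": str(rec["cust_id"]), "name": rec["full_name"], "country": rec["country"]}

def from_shop(rec):
    # Map shop fields onto the unified schema, normalizing country names.
    country = {"United Kingdom": "UK"}.get(rec["nation"], rec["nation"])
    return {"id": rec["id"].split("-", 1)[1], "name": rec["name"], "country": country}

unified = [from_crm(r) for r in source_crm] + [from_shop(r) for r in source_shop]
print(unified)  # both records now share the same field names and value conventions
```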

Oracle Spatial and Graph, formerly Oracle Spatial, is a free option component of the Oracle Database. The spatial features in Oracle Spatial and Graph aid users in managing geographic and location-data in a native type within an Oracle database, potentially supporting a wide range of applications — from automated mapping, facilities management, and geographic information systems (AM/FM/GIS), to wireless location services and location-enabled e-business. The graph features in Oracle Spatial and Graph include Oracle Network Data Model (NDM) graphs used in traditional network applications in major transportation, telcos, utilities and energy organizations and RDF semantic graphs used in social networks and social interactions and in linking disparate data sets to address requirements from the research, health sciences, finance, media and intelligence communities.

An entity–attribute–value model (EAV) is a data model optimized for the space-efficient storage of sparse—or ad-hoc—property or data values, intended for situations where runtime usage patterns are arbitrary, subject to user variation, or otherwise unforeseeable using a fixed design. The use-case targets applications which offer a large or rich system of defined property types, which are in turn appropriate to a wide set of entities, but where typically only a small, specific selection of these are instantiated for a given entity. Therefore, this type of data model relates to the mathematical notion of a sparse matrix. EAV is also known as object–attribute–value model, vertical database model, and open schema.
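A minimal sketch of the EAV layout (hypothetical entities and attributes, using Python's sqlite3 module): each row stores a single (entity, attribute, value) triple, and attributes that do not apply to an entity are simply absent rather than stored as NULL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# EAV table: one row per (entity, attribute, value) triple.
con.execute("""
    CREATE TABLE observation (
        entity_id TEXT,   -- e.g. a patient or product identifier
        attribute TEXT,   -- property name, drawn from a large catalogue
        value     TEXT    -- property value, stored sparsely
    )
""")
con.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [("patient-1", "blood_pressure", "120/80"),
     ("patient-1", "allergy", "penicillin"),
     ("patient-2", "heart_rate", "72")],   # patient-2 has no allergy row at all
)
# Reassemble the sparse attributes of one entity on demand.
rows = con.execute(
    "SELECT attribute, value FROM observation WHERE entity_id = ?", ("patient-1",)
).fetchall()
print(dict(rows))  # {'blood_pressure': '120/80', 'allergy': 'penicillin'}
```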

Perst is an open source, dual license, object-oriented embedded database management system (ODBMS). Both the Java and the C# versions are compact, and Perst has been implemented on smart phones running the Android and Windows Phone (WP7) operating systems.

Entity Framework (EF) is an open source object–relational mapping (ORM) framework for ADO.NET. It was originally shipped as an integral part of .NET Framework; however, starting with version 6.0, it has been delivered separately from the .NET Framework.

The terms schema matching and mapping are often used interchangeably for a database process. For this article, we differentiate the two as follows: schema matching is the process of identifying that two objects are semantically related, while mapping refers to the transformations between the objects. For example, in the two schemas DB1.Student and DB2.Grad-Student, possible matches would be: DB1.Student ≈ DB2.Grad-Student, DB1.SSN = DB2.ID, etc., and possible transformations or mappings would be: DB1.Marks to DB2.Grades.
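The distinction can be sketched in code (a hypothetical example; the grade thresholds are invented): the matches merely record which elements correspond, while the mapping actually transforms DB1 values into DB2 values:

```python
# Hypothetical sketch: schema matching vs. schema mapping for the example above.

# Matching: pairs of semantically related elements in the two schemas.
matches = {
    "DB1.Student": "DB2.Grad-Student",
    "DB1.SSN": "DB2.ID",
    "DB1.Marks": "DB2.Grades",
}

# Mapping: a transformation that converts DB1 values into DB2 values.
def marks_to_grades(marks: int) -> str:
    """Convert a numeric mark (DB1.Marks) into a letter grade (DB2.Grades).
    The thresholds are invented for illustration."""
    if marks >= 90:
        return "A"
    if marks >= 75:
        return "B"
    if marks >= 60:
        return "C"
    return "F"

db1_student = {"SSN": "123-45-6789", "Name": "Ada", "Marks": 87}
db2_grad_student = {
    "ID": db1_student["SSN"],                         # follows the SSN = ID match
    "Name": db1_student["Name"],
    "Grades": marks_to_grades(db1_student["Marks"]),  # applies the mapping
}
print(db2_grad_student)
```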

Dataspaces are an abstraction in data management that aims to overcome some of the problems encountered in data integration systems. The aim is to reduce the effort required to set up a data integration system by relying on existing matching and mapping generation techniques, and to improve the system in "pay-as-you-go" fashion as it is used. Labor-intensive aspects of data integration are postponed until they are absolutely needed.

Richard Thomas Snodgrass is an American computer scientist and writer and is professor emeritus at the University of Arizona. He is best known for his work on temporal databases, query language design, query optimization and evaluation, storage structures, database design, and ergalics.

In software engineering, a schema migration refers to the management of version-controlled, incremental and sometimes reversible changes to relational database schemas. A schema migration is performed on a database whenever it is necessary to update or revert that database's schema to some newer or older version.
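A minimal sketch of the idea (not modeled on any particular migration tool): migrations are an ordered list of incremental changes, and the version already applied is recorded in the database itself, here via SQLite's user_version pragma. Table names and version numbers are hypothetical:

```python
import sqlite3

# Ordered, incremental migrations; each entry upgrades the schema by one version.
MIGRATIONS = [
    (1, "CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT)"),
    (2, "ALTER TABLE account ADD COLUMN created_at TEXT"),
]

def migrate(con: sqlite3.Connection) -> None:
    """Apply any migrations newer than the version recorded in the database."""
    current = con.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in MIGRATIONS:
        if version > current:
            con.execute(statement)
            con.execute(f"PRAGMA user_version = {version}")  # record the new version
    con.commit()

con = sqlite3.connect(":memory:")
migrate(con)
print(con.execute("PRAGMA user_version").fetchone()[0])  # 2
```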

The outline of databases provides an overview of and topical guide to databases.

Schema-agnostic databases or vocabulary-independent databases aim to abstract users from the representation of the data by supporting automatic semantic matching between queries and databases. Schema-agnosticism is the property of a database of accepting queries expressed in the user's own terminology and structure and automatically mapping them to the dataset vocabulary.

References

  1. Roddick, John F (1995). "A survey of schema versioning issues for database systems". Information and Software Technology. 37 (7): 383–393. CiteSeerX 10.1.1.54.8474. doi:10.1016/0950-5849(95)91494-K.
  2. "Schema Evolution Benchmark - Schema Evolution". yellowstone.cs.ucla.edu. Retrieved 2010-07-29.
  3. Curino CA, Moon HJ, Tanca L, Zaniolo C (2008). Schema Evolution in Wikipedia: Toward a Web Information System Benchmark (PDF). ICEIS.
  4. Rahm E, Bernstein PA. "An Online Bibliography on Schema Evolution" (PDF). Retrieved 2 May 2017.
  5. Topor, Rodney; Salem, Kenneth; Gupta, Amarnath; Goda, Kazuo; Gehrke, Johannes; Palmer, Nathaniel; Sharaf, Mohamed; Labrinidis, Alexandros; Roddick, John F.; Fuxman, Ariel; Miller, Renée J.; Tan, Wang-Chiew; Kementsietsidis, Anastasios; Bonnet, Philippe; Shasha, Dennis; Roddick, John F.; Gupta, Amarnath; Peikert, Ronald; Ludäscher, Bertram; Bowers, Shawn; McPhillips, Timothy; Naumann, Harald; Voruganti, Kaladhar; Domingo-Ferrer, Josep; Carterette, Ben; Ipeirotis, Panagiotis G.; Arenas, Marcelo; Manolopoulos, Yannis; Theodoridis, Yannis; et al. (2009). "Schema Versioning". Encyclopedia of Database Systems. Springer, Boston, MA. pp. 2499–2502. doi:10.1007/978-0-387-39940-9_323. ISBN   978-0-387-35544-3.
  6. Nash, Alan; Bernstein, Philip A.; Melnik, Sergey (2007). "Composition of mappings given by embedded dependencies". ACM Transactions on Database Systems. 32: 4–es. CiteSeerX 10.1.1.534.3957. doi:10.1145/1206049.1206053.
  7. Fagin R, Kolaitis PG, Popa L, Tan WC. "Quasi-inverses of Schema Mappings" (PDF).
  8. Roddick, John F.; Snodgrass, Richard T. (1995). "Schema Versioning". The TSQL2 Temporal Query Language. The Springer International Series in Engineering and Computer Science. Springer, Boston, MA. pp. 427–449. doi:10.1007/978-1-4615-2289-8_22. ISBN 9781461359661.
  9. "deltasql, Database Evolution Under Control". Deltasql Development Team. 2013-04-20. Retrieved 2019-02-08.