Object-relational impedance mismatch

Last updated

The object-relational impedance mismatch is a set of conceptual and technical difficulties that are often encountered when a relational database management system (RDBMS) is being served by an application program (or multiple application programs) written in an object-oriented programming language or style, particularly because objects or class definitions must be mapped to database tables defined by a relational schema.

Contents

The term object-relational impedance mismatch is derived from the electrical engineering term impedance matching .

Mismatches

Objects (instances) reference one another and therefore form a graph in the mathematical sense (a network including loops and cycles). Relational schemas are, in contrast, tabular and based on the relational algebra, which defines linked heterogeneous tuples (groupings of data fields into a "row" with different types for each field).

Object-oriented concepts

Encapsulation

Object-oriented programs are well designed with techniques that result in encapsulated objects whose internal representation can be hidden. In an object-oriented framework, the underlying properties of a given object are expected not to be exposed to any interface outside of the one implemented alongside the object. However, object-relational mapping necessarily exposes the underlying content of an object to interact with an interface that the object implementation cannot specify. Hence, object-relational mapping violates the encapsulation of the object, since many object-relational mappers automatically generate public fields corresponding to database columns.

Accessibility

In relational thinking, "private" versus "public" access is relative to need. In the object-oriented (OO) model it is an absolute characteristic of the data's state. The relational and OO models often have conflicts over relativity versus absolutism of classifications and characteristics.

Interface, class, inheritance and polymorphism

Under an object-oriented paradigm, objects have interfaces that together provide the only access to the internals of that object. The relational model, on the other hand, utilizes derived relation variables (views) to provide varying perspectives and constraints to ensure integrity. Similarly, essential OOP concepts for classes of objects, inheritance and polymorphism, are not supported by relational database systems.

Mapping to relational concepts

A proper mapping between relational concepts and object-oriented concepts can be made if relational database tables are linked[ further explanation needed ] to associations found in object-oriented analysis.

Data type differences

A major mismatch between existing relational and OO languages is the type system differences. The relational model strictly prohibits by-reference attributes (or pointers), whereas OO languages embrace and expect by-reference behavior. Scalar types and their operator semantics can be vastly different between the models, causing problems in mapping.

For example, most SQL systems support string types with varying collations and constrained maximum lengths (open-ended text types tend to hinder performance), while most OO languages consider collation only as an argument to sort routines and strings are intrinsically sized to available memory. A more subtle, but related example is that SQL systems often ignore trailing white space in a string for the purposes of comparison, whereas OO string libraries do not. It is typically not possible to construct new data types as a matter of constraining the possible values of other primitive types in an OO language.

Structural and integrity differences

Another mismatch has to do with the differences in the structural and integrity aspects of the contrasted models. In OO languages, objects can be composed of other objects—often to a high degree—or specialize from a more general definition. This may make the mapping to relational schemas less straightforward. This is because relational data tends to be represented in a named set of global, unnested relation variables. Relations themselves, being sets of tuples all conforming to the same header (see tuple relational calculus ) do not have an ideal counterpart in OO languages. Constraints in OO languages are generally not declared as such, but are manifested as exception raising protection logic surrounding code that operates on encapsulated internal data. The relational model, on the other hand, calls for declarative constraints on scalar types, attributes, relation variables, and the database as a whole.

Manipulative differences

The semantic differences are especially apparent in the manipulative aspects of the contrasted models, however. The relational model has an intrinsic, relatively small and well-defined set of primitive operators for usage in the query and manipulation of data, whereas OO languages generally handle query and manipulation through custom-built or lower-level, case- and physical-access-path-specific imperative operations. Some OO languages do have support for declarative query sublanguages, but because OO languages typically deal with lists and perhaps hash tables, the manipulative primitives are necessarily distinct from the set-based operations of the relational model.[ citation needed ]

Transactional differences

The concurrency and transaction aspects are significantly different also. In particular, transactions, the smallest unit of work performed by databases, are much larger in relational databases than are any operations performed by classes in OO languages. Transactions in relational databases are dynamically bounded sets of arbitrary data manipulations, whereas the granularity of transactions in an OO language is typically on the level of individual assignments to primitive-typed fields. In general, OO languages have no analogue of isolation or durability, so atomicity and consistency are only ensured when writing to fields of those primitive types.

Solving impedance mismatch

Working around the impedance mismatch problem for object-oriented programs starts with recognition of the differences in the specific logic systems being employed. The mismatch is then either minimized or compensated for. [1]

Alternative architectures

The Object-relational impedance mismatch problem is not a universal problem between OO and databases. As the name suggests, this impedance problem only occurs with relational databases. The most common solution to this problem is to use an alternative database, such as NoSQL or XML database.

Minimization

There have been some attempts at building object-oriented database management systems (OODBMS) that would avoid the impedance mismatch problem. They have been less successful in practice than relational databases however, partly due to the limitations of OO principles as a basis for a data model. [2] There has been research performed in extending the database-like capabilities of OO languages through such notions as transactional memory.

One common solution to the impedance mismatch problem is to layer the domain and framework logic. In this scheme, the OO language is used to model certain relational aspects at runtime rather than attempt the more static mapping. Frameworks which employ this method will typically have an analogue for a tuple, usually as a "row" in a "dataset" component or as a generic "entity instance" class, as well as an analogue for a relation. Advantages of this approach may include:

Disadvantages may include:

Compensation

The mixing of levels of discourse within OO application code presents problems, but there are some common mechanisms used to compensate. The biggest challenge is to provide framework support, automation of data manipulation and presentation patterns, within the level of discourse in which the domain data is being modelled. To address this, reflection and/or code generation are utilized. Reflection allows code (classes) to be addressed as data and thus provide automation of the transport, presentation, integrity, etc. of the data. Generation addresses the problem through addressing the entity structures as data inputs for code generation tools or meta-programming languages, which produce the classes and supporting infrastructure en masse. Both of these schemes may still be subject to certain anomalies where these levels of discourse merge. For instance, generated entity classes will typically have properties which map to the domain (e. g. Name, Address) as well as properties which provide state management and other framework infrastructure (e. g. IsModified).

Contention

It has been argued, by Christopher J. Date and others, that a truly relational DBMS would pose no such problem, [3] [4] [5] as domains and classes are essentially one and the same thing. A native mapping between classes and relational schemata is a fundamental design mistake [6] ; and that individual tuples within a database table (relation) ought to be viewed as establishing relationships between entities; not as representations for complex entities themselves. However, this view tends to diminish the influence and role of object-oriented programming, using it as little more than a field type management system.

The impedance mismatch is in programming between the domain objects and the user interface. Sophisticated user interfaces, to allow operators, managers, and other non-programmers to access and manipulate the records in the database, often require intimate knowledge about the nature of the various database attributes (beyond name and type). In particular, it's considered a good practice (from an end-user productivity point of view) to design user interfaces such that the UI prevents illegal transactions (those which cause a database constraint to be violated) from being entered; to do so requires much of the logic present in the relational schemata to be duplicated in the code.

Certain code-development frameworks can leverage certain forms of logic that are represented in the database's schema (such as referential integrity constraints), so that such issues are handled in a generic and standard fashion through library routines rather than ad hoc code written on a case-by-case basis.

It has been argued that SQL, due to a very limited set of domain types (and other alleged flaws) makes proper object and domain-modelling difficult; and that SQL constitutes a very lossy and inefficient interface between a DBMS and an application program (whether written in an object-oriented style or not). However, SQL is currently the only widely accepted common database language in the marketplace[ citation needed ]; use of vendor-specific query languages is seen as a bad practice when avoidable. Other database languages such as Business System 12 and Tutorial D have been proposed; but none of these has been widely adopted by DBMS vendors.

In current versions of mainstream "object-relational" DBMSs like Oracle and Microsoft SQL Server, the above point may be a non-issue. With these engines, the functionality of a given database can be arbitrarily extended through stored code (functions and procedures) written in a modern OO language (Java for Oracle, and a Microsoft .NET language for SQL Server), and these functions can be invoked in-turn in SQL statements in a transparent fashion: that is, the user neither knows nor cares that these functions/procedures were not originally part of the database engine. Modern software-development paradigms are fully supported: thus, one can create a set of library routines that can be re-used across multiple database schemas.

These vendors decided to support OO-language integration at the DBMS back-end because they realized that, despite the attempts of the ISO SQL-99 committee to add procedural constructs to SQL, SQL will never have the rich set of libraries and data structures that today's application programmers take for granted, and it is reasonable to leverage these as directly as possible rather than attempting to extend the core SQL language. Consequently, the difference between "application programming" and "database administration" is now blurred: robust implementation of features such as constraints and triggers may often require an individual with dual DBA/OO-programming skills, or a partnership between individuals who combine these skills. This fact also bears on the "division of responsibility" issue below.

Some, however, would point out that this contention is moot due to the fact that: (1) RDBMSes were never intended to facilitate object modelling, and (2) SQL generally should only be seen as a "lossy" or "inefficient" interface language when one is trying to achieve a solution for which RDBMSes were not designed. SQL is very efficient at doing what it was designed to do, namely, to query, sort, filter, and store large sets of data. Some would additionally point out that the inclusion of OO language functionality in the back-end simply facilitates bad architectural practice, as it admits high-level application logic into the data tier, antithetical to the RDBMS.

Here the "canonical" copy of state is located. The database model generally assumes that the database management system is the only authoritative repository of state concerning the enterprise; any copies of such state held by an application program are just that — temporary copies (which may be out of date, if the underlying database record was subsequently modified by a transaction). Many object-oriented programmers prefer to view the in-memory representations of objects themselves as the canonical data, and view the database as a backing store and persistence mechanism.

Another point of contention is the proper division of responsibility between application programmers and database administrators (DBA). It is often the case that needed changes to application code (in order to implement a requested new feature or functionality) require corresponding changes in the database definition; in most organizations, the database definition is the responsibility of the DBA. Due to the need to maintain a production database system 24 hours a day many DBAs are reluctant to make changes to database schemata that they deem gratuitous or superfluous and in some cases outright refuse to do so. Use of developmental databases (apart from production systems) can help somewhat; but when the newly developed application "goes live" the DBA will need to approve any changes. Some programmers view this as intransigence; however the DBA is frequently held responsible if any changes to the database definition cause a loss of service in a production system—as a result, many DBAs prefer to contain design changes to application code, where design defects are far less likely to have catastrophic consequences.

In organizations with a non-dysfunctional relationship between DBAs and developers, though, the above issue should not present itself, as the decision to change a database schema or not would only be driven by business needs: a new requirement to persist additional data or a performance boost of a critical application would both trigger a schema modification, for example.

Philosophical differences

Key philosophical differences between the OO and relational models can be summarized as follows:

As a result of the object-relational impedance mismatch, it is often argued by partisans on both sides of the debate that the other technology ought to be abandoned or reduced in scope. [7] Some database advocates view traditional "procedural" languages as more compatible with an RDBMS than many OO languages; or suggest that a less OO style ought to be used. (In particular, it is argued that long-lived domain objects in application code ought not to exist; any such objects that do exist should be created when a query is made and disposed of when a transaction or task is complete). Conversely, some OO advocates argue that more OO-friendly persistence mechanisms, such as OODBMS, ought to be developed and used, and that relational technology ought to be phased out. Many (if not most) programmers and DBAs hold neither of these viewpoints; and view the object-relational impedance mismatch as a mere fact of life that information technology has to deal with.

It is also argued that the O/R mapping is paying off in some situations, but is probably oversold: it has advantages besides drawbacks. Skeptics point out that it is worth to think carefully before using it, as it will add little value in some cases. [8]

See also

Related Research Articles

Database organized collection of data

A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.

A relational database is a digital database based on the relational model of data, as proposed by E. F. Codd in 1970. A software system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems have an option of using the SQL for querying and maintaining the database.

SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e. data incorporating relations among entities and variables.

Object-relational mapping in computer science is a programming technique for converting data between incompatible type systems using object-oriented programming languages. This creates, in effect, a "virtual object database" that can be used from within the programming language. There are both free and commercial packages available that perform object-relational mapping, although some programmers opt to construct their own ORM tools.

The Enterprise Objects Framework, or more commonly simply EOF, was introduced by NeXT in 1994 as a pioneering object-relational mapping product for its NeXTSTEP and OpenStep development platforms. EOF abstracts the process of interacting with a relational database by mapping database rows to Java or Objective-C objects. This largely relieves developers from writing low-level SQL code.

Object-relational database database management system

An object-relational database (ORD), or object-relational database management system (ORDBMS), is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, just as with pure relational systems, it supports extension of the data model with custom data types and methods.

In computer software, a data access object (DAO) is an object that provides an abstract interface to some type of database or other persistence mechanism. By mapping application calls to the persistence layer, the DAO provides some specific data operations without exposing details of the database. This isolation supports the single responsibility principle. It separates what data access the application needs, in terms of domain-specific objects and data types, from how these needs can be satisfied with a specific DBMS, database schema, etc..

Hibernate ORM is an object-relational mapping tool for the Java programming language. It provides a framework for mapping an object-oriented domain model to a relational database. Hibernate handles object-relational impedance mismatch problems by replacing direct, persistent database accesses with high-level object handling functions.

ADO.NET is a data access technology from the Microsoft .NET Framework that provides communication between relational and non-relational systems through a common set of components. ADO.NET is a set of computer software components that programmers can use to access data and data services from a database. It is a part of the base class library that is included with the Microsoft .NET Framework. It is commonly used by programmers to access and modify data stored in relational database systems, though it can also access data in non-relational data sources. ADO.NET is sometimes considered an evolution of ActiveX Data Objects (ADO) technology, but was changed so extensively that it can be considered an entirely new product.

Core Data is an object graph and persistence framework provided by Apple in the macOS and iOS operating systems. It was introduced in Mac OS X 10.4 Tiger and iOS with iPhone SDK 3.0. It allows data organized by the relational entity–attribute model to be serialized into XML, binary, or SQLite stores. The data can be manipulated using higher level objects representing entities and their relationships. Core Data manages the serialized version, providing object lifecycle and object graph management, including persistence. Core Data interfaces directly with SQLite, insulating the developer from the underlying SQL.

Entity–attribute–value model (EAV) is a data model to encode, in a space-efficient manner, entities where the number of attributes that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. Such entities correspond to the mathematical notion of a sparse matrix.

The Java Persistence API (JPA), in 2019 renamed to Jakarta Persistence, is a Java application programming interface specification that describes the management of relational data in applications using Java Platform, Standard Edition and Java Platform, Enterprise Edition/Jakarta EE.

Database model generic structure of a database type, for instance relational

A database model is a type of data model that determines the logical structure of a database and fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

Entity Framework (EF) is an open source object-relational mapping (ORM) framework for ADO.NET. It was a part of .NET Framework, but since Entity framework version 6 it is separated from .NET framework.

A document-oriented database, or document store, is a computer program designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.

Semi-structured data is a form of structured data that does not obey the formal structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Therefore, it is also known as self-describing structure.

Versant Object Database (VOD) is an object database software product developed by Versant Corporation.

jOOQ Object Oriented Querying, commonly known as jOOQ, is a light database-mapping software library in Java that implements the active record pattern. Its purpose is to be both relational and object oriented by providing a domain-specific language to construct queries from classes generated from a database schema.

The following outline is provided as an overview of and topical guide to databases:

Oracle TopLink is mapping and persistence framework for Java developers. TopLink is produced by Oracle and is a part of Oracle's OracleAS, WebLogic, and OC4J servers. It is an object-persistence and object-transformation framework. TopLink provides development tools and run-time functionalities that ease the development process and help increase functionality. Persistent object-oriented data is stored in relational databases which helps build high-performance applications. Storing data in either XML or relational databases is made possible by transforming it from object-oriented data.

References

  1. A classification of object-relational impedance mismatch. Ireland, Christopher; Bowers, David; Newton, Mike and Waugh, Kevin (2009). A classification of object-relational impedance mismatch. In: First International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA), 1-6 Mar 2009, Cancun, Mexico.
  2. C. J. Date, Relational Database Writings
  3. Date, Christopher ‘Chris’ J; Pascal, Fabian (2012-08-12) [2005], "Type vs. Domain and Class", Database debunkings (World Wide Web log), Google, retrieved 12 September 2012.
  4. (2006), "4. On the notion of logical difference", Date on Database: writings 2000–2006, The expert’s voice in database; Relational database select writings, USA: Apress, p. 39, ISBN   978-1-59059-746-0, Class seems to be indistinguishable from type, as that term is classically understood.
  5. (2004), "26. Object/Relational databases", An introduction to database systems (8th ed.), Pearson Addison Wesley, p.  859, ISBN   978-0-321-19784-9, ...any such rapprochement should be firmly based on the relational model.
  6. Date, Christopher ‘Chris’ J; Darwen, Hugh, "2. Objects and Relations", The Third Manifesto, The first great blunder
  7. Neward, Ted (2006-06-26). "The Vietnam of Computer Science". Interoperability Happens. Retrieved 2010-06-02.
  8. Johnson, Rod (2002). J2EE Design and Development . Wrox Press. p.  256.