Join dependency

Last updated

In database theory, a join dependency is a constraint on the set of legal relations over a database scheme. A table is subject to a join dependency if can always be recreated by joining multiple tables each having a subset of the attributes of . If one of the tables in the join has all the attributes of the table , the join dependency is called trivial.

Contents

The join dependency plays an important role in the fifth normal form (5NF), also known as project-join normal form, because it can be proven that if a scheme is decomposed in tables to , the decomposition will be a lossless-join decomposition if the legal relations on are restricted to a join dependency on called .

Another way to describe a join dependency is to say that the relationships in the join dependency are independent of each other.

Unlike in the case of functional dependencies, there is no sound and complete axiomatization for join dependencies, [1] though axiomatization exist for more expressive dependency languages such as full typed dependencies. [2] :Chapter 8 However, implication of join dependencies is decidable. [2] :Theorem 8.4.12

Formal definition

Let be a relation schema and let be a decomposition of .

The relation satisfies the join dependency

if

A join dependency is trivial if one of the is itself. [3]

2-ary join dependencies are called multivalued dependency as a historical artifact of the fact that they were studied before the general case. More specifically if U is a set of attributes and R a relation over it, then R satisfies if and only if R satisfies

Example

Given a pizza-chain that models purchases in table Order = {order-number, customer-name, pizza-name, courier}. The following relations can be derived:

Since the relationships are independent there is a join dependency as follows: *((order-number, customer-name), (order-number, pizza-name), (order-number, courier)).

If each customer has his own courier however, there can be a join-dependency like this: *((order-number, customer-name), (order-number, pizza-name), (order-number, courier), (customer-name, courier)), but *((order-number, customer-name, courier), (order-number, pizza-name)) would be valid as well. This makes it obvious that just having a join dependency is not enough to normalize a database scheme.

See also

Related Research Articles

The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

In database theory, relational algebra is a theory that uses algebraic structures for modeling data, and defining queries on it with a well founded semantics. The theory was introduced by Edgar F. Codd.

Third normal form (3NF) is a database schema design approach for relational databases which uses normalizing principles to reduce the duplication of data, avoid data anomalies, ensure referential integrity, and simplify data management. It was defined in 1971 by Edgar F. Codd, an English computer scientist who invented the relational model for database management.

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies XY, X is a superkey—that is, X is either a candidate key or a superset thereof.

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, a functional dependency is a constraint between two attributes in a relation. Given a relation R and sets of attributes , X is said to functionally determineY if and only if each X value in R is associated with precisely one Y value in R; R is then said to satisfy the functional dependency XY. Equivalently, the projection is a function, i.e. Y is a function of X. In simple words, if the values for the X attributes are known, then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x. Customarily X is called the determinant set and Y the dependent set. A functional dependency FD: XY is called trivial if Y is a subset of X.

In algebra, a unit or invertible element of a ring is an invertible element for the multiplication of the ring. That is, an element u of a ring R is a unit if there exists v in R such that

<span class="mw-page-title-main">Referential integrity</span> Where all data references are valid

Referential integrity is a property of data stating that all its references are valid. In the context of relational databases, it requires that if a value of one attribute (column) of a relation (table) references a value of another attribute, then the referenced value must exist.

<span class="mw-page-title-main">Vieta's formulas</span> Relating coefficients and roots of a polynomial

In mathematics, Vieta's formulas relate the coefficients of a polynomial to sums and products of its roots. They are named after François Viète.

In mathematical analysis, and applications in geometry, applied mathematics, engineering, and natural sciences, a function of a real variable is a function whose domain is the real numbers , or a subset of that contains an interval of positive length. Most real functions that are considered and studied are differentiable in some interval. The most widely considered such functions are the real functions, which are the real-valued functions of a real variable, that is, the functions of a real variable whose codomain is the set of real numbers.

In the mathematical discipline of measure theory, a Banach measure is a certain way to assign a size to all subsets of the Euclidean plane, consistent with but extending the commonly used Lebesgue measure. While there are certain subsets of the plane which are not Lebesgue measurable, all subsets of the plane have a Banach measure. On the other hand, the Lebesgue measure is countably additive while a Banach measure is only finitely additive.

<span class="mw-page-title-main">Real coordinate space</span> Space formed by the n-tuples of real numbers

In mathematics, the real coordinate space or real coordinate n-space, of dimension n, denoted Rn or , is the set of all ordered n-tuples of real numbers, that is the set of all sequences of n real numbers, also known as coordinate vectors. Special cases are called the real lineR1, the real coordinate planeR2, and the real coordinate three-dimensional spaceR3. With component-wise addition and scalar multiplication, it is a real vector space.

The theory of association schemes arose in statistics, in the theory of experimental design for the analysis of variance. In mathematics, association schemes belong to both algebra and combinatorics. In algebraic combinatorics, association schemes provide a unified approach to many topics, for example combinatorial designs and the theory of error-correcting codes. In algebra, association schemes generalize groups, and the theory of association schemes generalizes the character theory of linear representations of groups.

In mathematics, the Schwartz–Zippel lemma is a tool commonly used in probabilistic polynomial identity testing, i.e. in the problem of determining whether a given multivariate polynomial is the 0-polynomial. It was discovered independently by Jack Schwartz, Richard Zippel, and Richard DeMillo and Richard J. Lipton, although DeMillo and Lipton's version was shown a year prior to Schwartz and Zippel's result. The finite field version of this bound was proved by Øystein Ore in 1922.

Armstrong's axioms are a set of references used to infer all the functional dependencies on a relational database. They were developed by William W. Armstrong in his 1974 paper. The axioms are sound in generating only functional dependencies in the closure of a set of functional dependencies when applied to that set. They are also complete in that repeated application of these rules will generate all functional dependencies in the closure .

The complexity of constraint satisfaction is the application of computational complexity theory to constraint satisfaction. It has mainly been studied for discriminating between tractable and intractable classes of constraint satisfaction problems on finite domains.

<span class="mw-page-title-main">Radial distribution function</span> Description of particle density in statistical mechanics

In statistical mechanics, the radial distribution function, in a system of particles, describes how density varies as a function of distance from a reference particle.

In mathematics, Brauer's theorem, named for Richard Brauer, is a result on the representability of 0 by forms over certain fields in sufficiently many variables.

In database theory, a multivalued dependency is a full constraint between two sets of attributes in a relation.

In multilinear algebra, the higher-order singular value decomposition (HOSVD) of a tensor is a specific orthogonal Tucker decomposition. It may be regarded as one type of generalization of the matrix singular value decomposition. It has applications in computer vision, computer graphics, machine learning, scientific computing, and signal processing. Some aspects can be traced as far back as F. L. Hitchcock in 1928, but it was L. R. Tucker who developed for third-order tensors the general Tucker decomposition in the 1960s, further advocated by L. De Lathauwer et al. in their Multilinear SVD work that employs the power method, or advocated by Vasilescu and Terzopoulos that developed M-mode SVD a parallel algorithm that employs the matrix SVD.

In relational database theory, an embedded dependency (ED) is a certain kind of constraint on a relational database. It is the most general type of constraint used in practice, including both tuple-generating dependencies and equality-generating dependencies. Embedded dependencies can express functional dependencies, join dependencies, multivalued dependencies, inclusion dependencies, foreign key dependencies, and many more besides.

References

  1. Petrov, S. V. (1989). "Finite axiomatization of languages for representation of system properties". Information Sciences. 47: 339–372. doi:10.1016/0020-0255(89)90006-6.
  2. 1 2 Abiteboul; Hull; Vianu (1995). Foundations of databases . Addison-Wesley. ISBN   9780201537710.
  3. Silberschatz, Korth. Database System Concepts (1st ed.).