Lossless join decomposition

Last updated August 17, 2024

In database design, a lossless join decomposition is a decomposition of a relation $r$ into relations $r_{1},r_{2}$ such that a natural join of the two smaller relations yields back the original relation. This property is central to removing redundancy safely from databases while preserving the original data.^[1] A lossless join is also called a non-additive join.^[2]

Definition

A relation $r$ on schema $R$ decomposes losslessly onto schemas $R_{1}$ and $R_{2}$ if $\pi _{R_{1}}(r)\bowtie \pi _{R_{2}}(r)=r$, that is, if $r$ is the natural join of its projections onto the smaller schemas. A pair $(R_{1},R_{2})$ is a lossless-join decomposition of $R$, or is said to have a lossless join with respect to a set of functional dependencies $F$, if any relation $r(R)$ that satisfies $F$ decomposes losslessly onto $R_{1}$ and $R_{2}$.^[3]
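The instance-level condition $\pi _{R_{1}}(r)\bowtie \pi _{R_{2}}(r)=r$ can be checked directly on a small relation. The following is a minimal sketch, assuming a relation is modelled as a Python set of tuples, each tuple being a frozenset of (attribute, value) pairs. The helper names `relation`, `project`, `natural_join` and `decomposes_losslessly` are illustrative and do not come from the cited sources.

```python
def relation(rows):
    """Build a relation (a set of tuples) from a list of attribute->value dicts."""
    return {frozenset(row.items()) for row in rows}


def project(rel, attrs):
    """Projection onto `attrs`; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(rel1, rel2):
    """Natural join: combine tuples that agree on every shared attribute."""
    joined = set()
    for t1 in rel1:
        d1 = dict(t1)
        for t2 in rel2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined


def decomposes_losslessly(rel, attrs1, attrs2):
    """True if joining the two projections gives back exactly `rel`."""
    return natural_join(project(rel, attrs1), project(rel, attrs2)) == rel


# A made-up relation over attributes A, B, C (illustrative data only).
r = relation([
    {"A": 1, "B": "x", "C": "p"},
    {"A": 2, "B": "x", "C": "q"},
])

print(decomposes_losslessly(r, {"A", "B"}, {"A", "C"}))  # True
print(decomposes_losslessly(r, {"A", "B"}, {"B", "C"}))  # False
```

In the second call the join over the shared attribute B produces four tuples, two of which were never in `r`. This is why the property is also called non-additive: a lossy decomposition adds spurious tuples rather than dropping stored ones.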

Decompositions into more than two schemas can be defined in the same way.^[4]

Criteria

A decomposition $R=R_{1}\cup R_{2}$ has a lossless join with respect to $F$ if and only if the closure of $R_{1}\cap R_{2}$ includes $R_{1}\setminus R_{2}$ or $R_{2}\setminus R_{1}$ . In other words, one of the following must hold:^[4]

$(R_{1}\cap R_{2})\to (R_{1}\setminus R_{2})\in F^{+}$
$(R_{1}\cap R_{2})\to (R_{2}\setminus R_{1})\in F^{+}$
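Either condition can be tested by computing the attribute closure of $R_{1}\cap R_{2}$ under $F$. The sketch below assumes each functional dependency is given as a pair of attribute sets (left-hand side, right-hand side). The function names `closure` and `is_lossless_pair`, and the sample dependency $B\to C$, are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the functional dependencies `fds`.

    `fds` is an iterable of (lhs, rhs) pairs, each side a set of attributes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the dependency if its left side is already determined.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_pair(r1, r2, fds):
    """Binary test: closure of R1 ∩ R2 must cover R1 \\ R2 or R2 \\ R1."""
    r1, r2 = set(r1), set(r2)
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


# Illustrative input: R = {A, B, C} with the single dependency B -> C.
fds = [({"B"}, {"C"})]

print(is_lossless_pair({"A", "B"}, {"B", "C"}, fds))  # True
print(is_lossless_pair({"A", "B"}, {"A", "C"}, fds))  # False
```

In the first call the shared attribute B determines C, which is all of $R_{2}\setminus R_{1}$. In the second call the shared attribute A determines nothing beyond itself, so neither condition holds.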

Criteria for multiple sub-schemas

Multiple sub-schemas $R_{1},R_{2},\ldots ,R_{n}$ have a lossless join if they can be combined, one lossless pairwise join at a time, until all of them have been joined into a single schema. Once two schemas have been combined, the merged schema replaces them: neither of its constituent sub-schemas may be joined separately with any of the remaining schemas. For example, if a pair of schemas $R_{i},R_{j}$ has a lossless join, forming a new schema $R_{i,j}$, then every later step uses $R_{i,j}$ (rather than $R_{i}$ or $R_{j}$) to form a lossless join with another schema $R_{k}$, which may itself be the result of an earlier join (e.g., $R_{k,l}$).
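The repeated pairwise procedure can be sketched as a greedy search: find any two schemas that pass the two-schema criterion, replace them with their union, and repeat. This is only an illustration under stated assumptions. The names `closure`, `joins_losslessly` and `merges_to_single_schema` are hypothetical, and the search commits to the first mergeable pair it finds, so a `False` result from this sketch should not be read as proof that the decomposition is lossy.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under (lhs, rhs) functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def joins_losslessly(s1, s2, fds):
    """Two-schema criterion applied to s1 and s2."""
    common = closure(s1 & s2, fds)
    return (s1 - s2) <= common or (s2 - s1) <= common


def merges_to_single_schema(schemas, fds):
    """Greedily merge lossless pairs until one schema is left (or we get stuck)."""
    pending = [frozenset(s) for s in schemas]
    while len(pending) > 1:
        for i, j in combinations(range(len(pending)), 2):
            if joins_losslessly(pending[i], pending[j], fds):
                merged = pending[i] | pending[j]
                # The merged schema replaces both of its parts.
                pending = [s for k, s in enumerate(pending) if k not in (i, j)]
                pending.append(merged)
                break
        else:
            return False  # no remaining pair passes the criterion
    return True


# Illustrative input: R = {A, B, C, D} with A -> B, B -> C, C -> D.
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"})]

print(merges_to_single_schema([{"A", "B"}, {"B", "C"}, {"C", "D"}], fds))  # True
print(merges_to_single_schema([{"A", "B"}, {"C", "D"}], fds))              # False
```

Testing a merged schema against the full set $F$ is sufficient here because the criterion only asks whether attributes inside the two schemas are determined by their intersection.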

Example

Let $R=\{A,B,C,D\}$ be the relation schema, with attributes $A$ , $B$ , $C$ and $D$ .
Let $F=\{A\rightarrow BC\}$ be the set of functional dependencies.
Decomposition into $R_{1}=\{A,B,C\}$ and $R_{2}=\{A,D\}$ is lossless under $F$ because $R_{1}\cap R_{2}=\{A\}$, $R_{1}\setminus R_{2}=\{B,C\}$, and $F$ contains the functional dependency $A\rightarrow BC$. In other words, $(R_{1}\cap R_{2}\rightarrow R_{1}\setminus R_{2})\in F^{+}$, which is the first of the two criteria.^[5]^[6]

References

[1] Pohler, K (2015). "Lossless-Join Decomposition: applications in quantitative computing metrics". International Journal of Applied Computer Science. 21 (4): 190–212.
[2] Elmasri, Ramez (2016). Fundamentals of database systems (Seventh ed.). Hoboken, NJ: Pearson. p. 461. ISBN 978-0133970777.
[3] Maier, David (1983). The theory of relational databases (PDF). Computer Science Press. p. 101. ISBN 0-914894-42-0. Retrieved 16 August 2024.
[4] Ullman, Jeffrey D. (1988). Principles of Database and Knowledge-base Systems (PDF) (1 ed.). Computer Science Press. p. 397. ISBN 0-88175188-X. Retrieved 16 August 2024.
[5] "Lossless-Join Decomposition". Cs.sfu.ca. Retrieved 2016-02-07.
[6] "www.data-e-education.com - Lossless Join Decomposition". Archived from the original on 2014-02-21. Retrieved 2014-02-12.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
