Lossless join decomposition

Last updated

In database design, a lossless join decomposition is a decomposition of a relation into relations such that a natural join of the two smaller relations yields back the original relation. This is central in removing redundancy safely from databases while preserving the original data. [1]

Contents

Criteria

Lossless join can also be called nonadditive. [2]

If is split into and , for this decomposition to be lossless (i.e., ) then at least one of the two following criteria should be met.

Check 1: Verify join explicitly

Projecting on and , and joining them back, results in the relation you started with. [3] [ unreliable source? ]

Check 2: Via functional dependencies

Let be a relation schema.

Let F be a set of functional dependencies on .

Let and form a decomposition of .

The decomposition is lossless if one of the sub-relations (i.e. or ) is a subset of the closure of their intersection. In other words, the decomposition is a lossless-join decomposition of if at least one of the following functional dependencies are in F+ (where F+ stands for the closure for every attribute or attribute sets in F): [4]

Criteria for multiple sub-relations

Multiple sub-relations have a lossless join if there is some way in which we can repeatedly perform lossless joins until all the relations have been joined into a single relation. Once we have a new sub-relation made from a lossless join, we are not allowed to use any of its isolated sub-relations to join with any of the other relations. For example, if we can do a lossless join on a pair of relations to form a new relation , we use this new relation (rather than or ) to form a lossless join with another relation (which may already be joined (e.g., )).

Examples

[5] [6]

Related Research Articles

The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

In mathematics, specifically abstract algebra, the isomorphism theorems are theorems that describe the relationship between quotients, homomorphisms, and subobjects. Versions of the theorems exist for groups, rings, vector spaces, modules, Lie algebras, and various other algebraic structures. In universal algebra, the isomorphism theorems can be generalized to the context of algebras and congruences.

<span class="mw-page-title-main">Exact sequence</span> Sequence of homomorphisms such that each kernel equals the preceding image

An exact sequence is a sequence of morphisms between objects such that the image of one morphism equals the kernel of the next.

In formal logic and related branches of mathematics, a functional predicate, or function symbol, is a logical symbol that may be applied to an object term to produce another object term. Functional predicates are also sometimes called mappings, but that term has additional meanings in mathematics. In a model, a function symbol will be modelled by a function.

In database theory, relational algebra is a theory that uses algebraic structures for modeling data, and defining queries on it with a well founded semantics. The theory was introduced by Edgar F. Codd.

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, a functional dependency is a constraint between two attributes in a relation. Given a relation R and sets of attributes , X is said to functionally determineY if and only if each X value in R is associated with precisely one Y value in R; R is then said to satisfy the functional dependency XY. Equivalently, the projection is a function, i.e. Y is a function of X. In simple words, if the values for the X attributes are known, then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x. Customarily X is called the determinant set and Y the dependent set. A functional dependency FD: XY is called trivial if Y is a subset of X.

In mathematics, a well-defined expression or unambiguous expression is an expression whose definition assigns it a unique interpretation or value. Otherwise, the expression is said to be not well defined, ill defined or ambiguous. A function is well defined if it gives the same result when the representation of the input is changed without changing the value of the input. For instance, if takes real numbers as input, and if does not equal then is not well defined. The term well defined can also be used to indicate that a logical expression is unambiguous or uncontradictory.

A candidate key, or simply a key, of a relational database is any set of columns that have a unique combination of values in each row, with the additional constraint that removing any column could produce duplicate combinations of values.

In mathematics, the jet is an operation that takes a differentiable function f and produces a polynomial, the truncated Taylor polynomial of f, at each point of its domain. Although this is the definition of a jet, the theory of jets regards these polynomials as being abstract polynomials rather than polynomial functions.

In category theory, a branch of mathematics, an antiisomorphism between structured sets A and B is an isomorphism from A to the opposite of B. If there exists an antiisomorphism between two structures, they are said to be antiisomorphic.

In mathematics, the (linear) Peetre theorem, named after Jaak Peetre, is a result of functional analysis that gives a characterisation of differential operators in terms of their effect on generalized function spaces, and without mentioning differentiation in explicit terms. The Peetre theorem is an example of a finite order theorem in which a function or a functor, defined in a very general way, can in fact be shown to be a polynomial because of some extraneous condition or symmetry imposed upon it.

Boyce–Codd normal form is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomalies not dealt with by 3NF as originally defined.

In database theory, a multivalued dependency is a full constraint between two sets of attributes in a relation.

In mathematical logic, Skolem arithmetic is the first-order theory of the natural numbers with multiplication, named in honor of Thoralf Skolem. The signature of Skolem arithmetic contains only the multiplication operation and equality, omitting the addition operation entirely.

In the mathematical subject of group theory, the Grushko theorem or the Grushko–Neumann theorem is a theorem stating that the rank of a free product of two groups is equal to the sum of the ranks of the two free factors. The theorem was first obtained in a 1940 article of Grushko and then, independently, in a 1943 article of Neumann.

The chase is a simple fixed-point algorithm testing and enforcing implication of data dependencies in database systems. It plays important roles in database theory as well as in practice. It is used, directly or indirectly, on an everyday basis by people who design databases, and it is used in commercial systems to reason about the consistency and correctness of a data design. New applications of the chase in meta-data management and data exchange are still being discovered.

In database theory, a join dependency is a constraint on the set of legal relations over a database scheme. A table is subject to a join dependency if can always be recreated by joining multiple tables each having a subset of the attributes of . If one of the tables in the join has all the attributes of the table , the join dependency is called trivial.

<span class="mw-page-title-main">Dependency network</span>

The dependency network approach provides a system level analysis of the activity and topology of directed networks. The approach extracts causal topological relations between the network's nodes, and provides an important step towards inference of causal activity relations between the network nodes. This methodology has originally been introduced for the study of financial data, it has been extended and applied to other systems, such as the immune system, and semantic networks.

A canonical cover for F is a set of dependencies such that F logically implies all dependencies in , and logically implies all dependencies in F.

<span class="mw-page-title-main">Complexification (Lie group)</span> Universal construction of a complex Lie group from a real Lie group

In mathematics, the complexification or universal complexification of a real Lie group is given by a continuous homomorphism of the group into a complex Lie group with the universal property that every continuous homomorphism of the original group into another complex Lie group extends compatibly to a complex analytic homomorphism between the complex Lie groups. The complexification, which always exists, is unique up to unique isomorphism. Its Lie algebra is a quotient of the complexification of the Lie algebra of the original group. They are isomorphic if the original group has a quotient by a discrete normal subgroup which is linear.

References

  1. Pohler, K (2015). "Lossless-Join Decomposition: applications in quantitative computing metrics". International Journal of Applied Computer Science. 21 (4): 190–212.
  2. Elmasri, Ramez (2016). Fundamentals of database systems (Seventh ed.). Hoboken, NJ: Pearson. p. 461. ISBN   978-0133970777.
  3. "Lossless Join Property". Stackoverflow.com. Retrieved 2016-02-07.
  4. "Lossless Join Decomposition" (PDF). University at Buffalo . Jan Chomicki. Retrieved 2012-02-08.
  5. "Lossless-Join Decomposition". Cs.sfu.ca. Retrieved 2016-02-07.
  6. "www.data-e-education.com - Lossless Join Decomposition". Archived from the original on 2014-02-21. Retrieved 2014-02-12.