Merge (SQL)

Last updated

A relational database management system uses SQL MERGE (also called upsert) statements to INSERT new records or UPDATE or DELETE existing records depending on whether condition matches. It was officially introduced in the SQL:2003 standard, and expanded[ citation needed ] in the SQL:2008 standard.

Contents

Usage

MERGEINTOtablenameUSINGtable_referenceON(condition)WHENMATCHEDTHENUPDATESETcolumn1=value1[, column2 = value2 ...]WHENNOTMATCHEDTHENINSERT(column1[, column2 ...])VALUES(value1[, value2 ...]);

A right join is employed over the Target (the INTO table) and the Source (the USING table / view / sub-query)--where Target is the left table and Source is the right one. The four possible combinations yield these rules:

If multiple Source rows match a given Target row, an error is mandated by SQL:2003 standards. You cannot update a Target row multiple times with a MERGE statement

Implementations

Database management systems PostgreSQL, [1] Oracle Database, IBM Db2, Teradata, EXASOL, Firebird, CUBRID, H2, HSQLDB, MS SQL, Vectorwise and Apache Derby support the standard syntax. Some also add non-standard SQL extensions.

Synonymous

Some database implementations adopted the term upsert (a portmanteau of update and insert) to a database statement, or combination of statements, that inserts a record to a table in a database if the record does not exist or, if the record already exists, updates the existing record. This synonym is used in PostgreSQL (v9.5+) [2] and SQLite (v3.24+). [3] It is also used to abbreviate the "MERGE" equivalent pseudo-code.

It is used in Microsoft Azure SQL Database. [4]

Other non-standard implementations

Some other database management systems support this, or very similar behavior, through their own, non-standard SQL extensions.

MySQL, for example, supports the use of INSERT...ONDUPLICATEKEYUPDATE syntax [5] which can be used to achieve a similar effect with the limitation that the join between target and source has to be made only on PRIMARY KEY or UNIQUE constraints, which is not required in the ANSI/ISO standard. It also supports >REPLACE INTO syntax, [6] which first attempts an insert, and if that fails, deletes the row, if exists, and then inserts the new one. There is also an IGNORE clause for the INSERT statement, [7] which tells the server to ignore "duplicate key" errors and go on (existing rows will not be inserted or updated, but all new rows will be inserted).

SQLite's INSERTORREPLACEINTO works similarly. It also supports REPLACE INTO as an alias for compatibility with MySQL. [8]

Firebird supports MERGE INTO though fails to throw an error when there are multiple Source data rows. Additionally there is a single-row version, UPDATEORINSERTINTOtablename(columns)VALUES(values)[MATCHING(columns)], but the latter does not give you the option to take different actions on insert versus update (e.g. setting a new sequence value only for new rows, not for existing ones.)

IBM Db2 extends the syntax with multiple WHEN MATCHED and WHEN NOT MATCHED clauses, distinguishing them with ... AND some-condition guards.

Microsoft SQL Server extends with supporting guards and also with supporting Left Join via WHENNOTMATCHEDBYSOURCE clauses.

PostgreSQL supports merge since version 15 but previously supported merging via INSERTINTO...ONCONFLICT[conflict_target]conflict_action. [9]

CUBRID supports MERGE INTO [10] statement. And supports the use of INSERT...ONDUPLICATEKEYUPDATE syntax. [11] It also supports REPLACE INTO for compatibility with MySQL. [12]

Apache Phoenix supports UPSERT VALUES [13] and UPSERT SELECT [14] syntax.

Spark SQL supports UPDATE SET * and INSERT * clauses in actions. [15]

Apache Impala supports UPSERT INTO ... SELECT. [16]

Usage in NoSQL

A similar concept is applied in some NoSQL databases.

In MongoDB the fields in a value associated with a key can be updated with an update operation. The update raises an error if the key is not found. In the update operation it is possible to set the upsert flag: in this case a new value is stored associated to the given key if it does not exist, otherwise the whole value is replaced.

In Redis the SET operations sets the value associated with a given key. Redis does not know any detail of the internal structure of the value, so an update would have no meaning. So the SET operation has always a set or replace semantics.

See also

Related Research Articles

Structured Query Language (SQL) is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.

A foreign key is a set of attributes in a table that refers to the primary key of another table, linking these two tables. In the context of relational databases, a foreign key is subject to an inclusion dependency constraint that the tuples consisting of the foreign key attributes in one relation, R, must also exist in some other relation, S; furthermore that those attributes must also be a candidate key in S.

A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data, unlike a natural key.

<span class="mw-page-title-main">Join (SQL)</span> SQL clause

A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables and puts on the same row records with matching fields : INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS.

The SQL SELECT statement returns a result set of rows, from one or more tables.

An SQL INSERT statement adds one or more records to any single table in a relational database.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.

The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfill the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group and resolving any conflicts that might arise between concurrent changes made by different members.

In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

A spatial database is a general-purpose database that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data.

In relational databases, the information schema is an ANSI-standard set of read-only views that provide information about all of the tables, views, columns, and procedures in a database. It can be used as a source of the information that some databases make available through non-standard commands, such as:

 => SELECT count(table_name) FROM information_schema.tables;  count   -------  99    => SELECT column_name, data_type, column_default, is_nullable  FROM information_schema.columns WHERE table_name='alpha';  column_name | data_type | column_default | is_nullable   -------------+-----------+----------------+-------------  foo | integer | | YES  bar | character | | YES    => SELECT * FROM information_schema.information_schema_catalog_name;  catalog_name   --------------  johnd  

The DUAL table is a special one-row, one-column table present by default in Oracle and other database installations. In Oracle, the table has a single VARCHAR2(1) column called DUMMY that has a value of 'X'. It is suitable for use in selecting a pseudo column such as SYSDATE or USER.

In SQL, a window function or analytic function is a function which uses values from one or multiple rows to return a value for each row. Window functions have an OVER clause; any function without an OVER clause is not a window function, but rather an aggregate or single-row (scalar) function.

A hierarchical query is a type of SQL query that handles hierarchical model data. They are special cases of more general recursive fixpoint queries, which compute transitive closures.

In database management systems (DBMS), a prepared statement, parameterized statement, or parameterized query is a feature where the database pre-compiles SQL code and stores the results, separating it from data. Benefits of prepared statements are:

In relational databases a virtual column is a table column whose value(s) is automatically computed using other columns values, or another deterministic expression. Virtual columns are defined of SQL:2003 as Generated Column, and are only implemented by some DBMSs, like MariaDB, SQL Server, Oracle, PostgreSQL, SQLite and Firebird.

ECPG is the standard, in the PostgreSQL database built-in, client programming interface for embedding SQL in programs written in the C programming language. It provides the option for accessing the PostgreSQL database directly from the C code in the application, using SQL commands.

The syntax of the SQL programming language is defined and maintained by ISO/IEC SC 32 as part of ISO/IEC 9075. This standard is not freely available. Despite the existence of the standard, SQL code is not completely portable among different database systems without adjustments.

References

  1. "E.1. Release 15". PostgreSQL Documentation. 13 October 2022. Archived from the original on 13 October 2022. Retrieved 13 October 2022.
  2. "PostgreSQL Upsert Using INSERT ON CONFLICT statement". PostgreSQL Tutorial. Archived from the original on Nov 28, 2022.
  3. "upsert", SQLite, visited 6-6-2018.
  4. "MERGE (Transact-SQL)". Transact-SQL Reference (Database Engine). Microsoft Learn. Archived from the original on Jun 24, 2016.
  5. MySQL :: MySQL 5.1 Reference Manual :: 12.2.4.3 INSERT ... ON DUPLICATE KEY UPDATE Syntax
  6. MySQL 5.1 Reference Manual: 11.2.6 REPLACE Syntax
  7. "MySQL 5.5 Reference Manual :: 13.2.5 INSERT Syntax" . Retrieved 29 October 2013.
  8. "SQL As Understood By SQLite: INSERT" . Retrieved 2012-09-27.
  9. PostgreSQL INSERT page
  10. "New CUBRID 9.0.0". CUBRID Official Blog. 2012-10-30. Retrieved 2012-11-08.
  11. CUBRID :: Data Manipulation Statements :: Insert :: ON DUPLICATE KEY UPDATE Clause
  12. CUBRID :: Data Manipulation Statements :: Replace
  13. "UPSERT VALUES".
  14. "UPSERT SELECT".
  15. "MERGE INTO (Delta Lake on Databricks)".
  16. "UPSERT Statement (Apache Impala Documentation)".