Sargable

Last updated

In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query. The term is derived from a contraction of Search ARGument ABLE. It was first used by IBM researchers as a contraction of Search ARGument, and has come to mean simply "can be looked up by an index." 1 [1] [2]

Contents

For database query optimizers, sargable is an important property in OLTP workloads because it suggests a good query plan can be obtained by a simple heuristic 2 matching query to indexes instead of a complex, time-consuming cost-based search, [1] thus it is often desired to write sargable queries. A query failing to be sargable is known as a non-sargable query and typically has a negative effect on query time, so one of the steps in query optimization is to convert them to be sargable. The effect is similar to searching for a specific term in a book that has no index, beginning at page one each time, instead of jumping to a list of specific pages identified in an index.

The typical situation that will make a SQL query non-sargable is to include in the WHERE clause a function operating on a column value. The WHERE clause is not the only clause where sargability can matter; it can also have an effect on ORDER BY, GROUP BY and HAVING clauses. The SELECT clause, on the other hand, can contain non-sargable expressions without adversely affecting the performance.

Some database management systems, for instance PostgreSQL, support functional indices. Conceptually, an index is simply a mapping between a value and one or more locations. With a functional index, the value stored in the index is the output of the function specified when the index is created. This capability expands what is sargable beyond base column expressions.

Simple example

WHERE clauses that are sargable typically have field values on the left of the operator, and scalar values or expressions on the right side of the operator.

Not sargable:

SELECT*FROMmyTableWHERESQRT(myIntField)>11.7

This is not sargable because myIntField is embedded in a function. If any indexes were available on myIntField, they could not be used. In addition, SQRT() would be called on every record in myTable.

Sargable version:

SELECT*FROMmyTableWHEREmyIntField>11.7*11.7

This is sargable because myIntField is NOT contained in a function, making any available indexes on myIntField potentially usable. Furthermore, the expression is evaluated only once, rather than for each record in the table.

Text example

WHERE ... LIKE clauses that are sargable have field values on the left of the operator, and LIKE text strings that do not begin with the % on the right.

Not sargable:

SELECT*FROMmyTableWHEREmyNameFieldLIKE'%Wales%'-- Begins with %, not sargable

This is not sargable. It must examine every row to find the fields containing the substring 'Wales' in any position.

Sargable version:

SELECT*FROMmyTableWHEREmyNameFieldLIKE'Jimmy%'-- Does not begin with %, sargable

This is sargable. It can use an index to find all the myNameField values that start with the substring 'Jimmy'.

See also

Notes

^1 Gulutzan and Pelzer, (Chapter 2, Simple "Searches")
^2 [3] gives an example of such simple heuristic.

Related Research Articles

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source object relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Linux, FreeBSD, OpenBSD, macOS, and Windows, and handles a range of workloads from single machines to data warehouses or web services with many concurrent users.

A relational database is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A database management system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if no attribute domain has relations as elements. Or more informally, that no table column can have tables as values. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement. SQL-92 does not support creating or using table-valued columns, which means that using only the "traditional relational database features" most relational databases will be in first normal form by necessity. Database systems which do not require first normal form are often called NoSQL systems. Newer SQL standards like SQL:1999 have started to allow so called non-atomic types, which include composite types. Even newer versions like SQL:2016 allow JSON.

<span class="mw-page-title-main">Join (SQL)</span> SQL clause

A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables and puts on the same row records with matching fields : INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS.

The SQL SELECT statement returns a result set of records, from one or more tables.

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time said table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

A user-defined function (UDF) is a function provided by the user of a program or environment, in a context where the usual assumption is that functions are built into the program or environment. UDFs are usually written for the requirement of its creator.

In a database, a view is the result set of a stored query, which can be queried in the same manner as a persistent database collection object. This pre-established query command is kept in the data dictionary. Unlike ordinary base tables in a relational database, a view does not form part of the physical schema: as a result set, it is a virtual table computed or collated dynamically from data in the database when access to that view is requested. Changes applied to the data in a relevant underlying table are reflected in the data shown in subsequent invocations of the view.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

Query optimization is a feature of many relational database management systems and other databases such as NoSQL and graph databases. The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans.

A WHERE clause in SQL specifies that a SQL Data Manipulation Language (DML) statement should only affect rows that meet specified criteria. The criteria are expressed in the form of predicates. WHERE clauses are not mandatory clauses of SQL DML statements, but can be used to limit the number of rows affected by a SQL DML statement or returned by a query. In brief SQL WHERE clause is used to extract only those results from a SQL statement, such as: SELECT, INSERT, UPDATE, or DELETE statement.

A spatial database is a general-purpose database that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data.

Language Integrated Query is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages, originally released as a major part of .NET Framework 3.5 in 2007.

The nested set model is a technique for representing nested set collections in relational databases.

In SQL, a window function or analytic function is a function which uses values from one or multiple rows to return a value for each row. Window functions have an OVER clause; any function without an OVER clause is not a window function, but rather an aggregate or single-row (scalar) function.

XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) in 1999, and can be used to compute values from the content of an XML document. Support for XPath exists in applications that support XML, such as web browsers, and many programming languages.

In a SQL database query, a correlated subquery is a subquery that uses values from the outer query. Because the subquery may be evaluated once for each row processed by the outer query, it can be slow.

CUBRID ( "cube-rid") is an open-source SQL-based relational database management system (RDBMS) with object extensions developed by CUBRID Corp. for OLTP. The name CUBRID is a combination of the two words cube and bridge, cube standing for a space for data and bridge standing for data bridge.

<span class="mw-page-title-main">Array DBMS</span> System that provides database services specifically for arrays

An array database management system or array DBMS provides database services specifically for arrays, that is: homogeneous collections of data items, sitting on a regular grid of one, two, or more dimensions. Often arrays are used to represent sensor, simulation, image, or statistics data. Such arrays tend to be Big Data, with single objects frequently ranging into Terabyte and soon Petabyte sizes; for example, today's earth and space observation archives typically grow by Terabytes a day. Array databases aim at offering flexible, scalable storage and retrieval on this information category.

The syntax of the SQL programming language is defined and maintained by ISO/IEC SC 32 as part of ISO/IEC 9075. This standard is not freely available. Despite the existence of the standard, SQL code is not completely portable among different database systems without adjustments.

References

  1. 1 2 Andy, Pavlo (Spring 2023). "CMU 15-721 :: Advanced Database Systems (Spring 2023) :: Lecture #16 Optimizer Implementation (Part 1) - Slide" (PDF). Archived (PDF) from the original on 2023-06-01. Retrieved 2024-01-25.
  2. Selinger, P. Griffiths; Astrahan, M. M.; Chamberlin, D. D.; Lorie, R. A.; Price, T. G. (1979). "Access Path Selection in a Relational Database Management System". SIGMOD '79: Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data. ACM Press: 23. doi:10.1145/582095.582099. ISBN   978-0-89791-001-9.
  3. Silberschatz, Abraham; Korth, Henry F.; Sudarshan, S. (2020). Database system concepts (7th ed.). New York, NY: McGraw-Hill Education. p. 773. ISBN   978-1-260-08450-4.