Full table scan

Last updated April 25, 2024

A full table scan (also known as a sequential scan) is a scan made on a database where each row of the table is read in a sequential (serial) order and the columns encountered are checked for the validity of a condition.^[1] Full table scans ^[2] are usually the slowest method of scanning a table due to the heavy amount of I/O reads required from the disk which consists of multiple seeks as well as costly disk to memory transfers.

Overview

In a database, a query that is not indexed results in a full table scan, where the database processes each record of the table to find all records meeting the given requirements. Even if the query selects just a few rows from the table, all rows in the entire table will be examined. This usually results in suboptimal performance but may be acceptable with very small tables or when the overhead of keeping indexes up to date is high.

When the optimizer considers a full table scan

The most important factor in choosing depends on speed. This means that a full table scan should be used when it is the fastest and cannot use a different access path. Several full table scan examples are as follows.^[3]

No index The optimizer must use a full table scan since no index exists.
Small number of rows The cost of full table scan is less than index range scan due to small table.
When query processed SELECT COUNT(*), nulls existed in the column The query is counting the number of null columns in a typical index. However, SELECT COUNT(*) can't count the number of null columns.
The query is unselective The number of return rows is too large and takes nearly 100% in the whole table. These rows are unselective.
The table statistics does not update The number of rows in the table is higher than before, but table statistics haven't been updated yet. The optimizer can't correctly estimate that using the index is faster.
The table has a high degree of parallelism The high degree of parallelism table distorts the optimizer from a true way, because optimizer would use full table scan.
A full table scan hint The hint lets optimizer to use full table scan.

Examples

The first example shows a SQL statement that returns the name of every fruit in the fruits table whose color is red. If the fruits table does not have an index for the color column, then the database engine must load and examine every row within fruits in order to compare each row's color to 'red':

SELECTnameFROMfruitsWHEREcolor='red';

The second example shows a SQL statement which returns the name of all fruits in the fruits table. Because this statement has no condition - no WHERE clause - the database engine will use a table scan to load and return the data for this query even if the fruits table has an index on the name column because accessing - i.e. scanning - the table directly is faster than accessing the table through the extra abstraction layer of an index:

SELECTnameFROMfruits

The third example is a counter-example that will almost certainly cause the SQL engine to use an index instead of a table scan. This example uses almost the same query as the previous one, but adds an ORDER BY clause so that the returned names will be in alphabetical order. Assuming that the fruits table has an index on the name column, the database engine will now use that index to return the names in order because accessing the table through the extra abstraction layer of the index provides the benefit of returning the rows in the requested order. Had the engine loaded the rows using a table scan, it would then have to perform the additional work of sorting the returned rows. In some extreme cases - e.g. the statistics maintained by the database engine indicate that the table contains a very small number of rows - the optimizer may still decide to use a table scan for this type of query:

SELECTnameFROMfruitsORDERBYname

Pros and cons

Pros:

The cost is predictable, as every time database system needs to scan full table row by row.
When table is less than 2 percent of database block buffer, the full scan table is quicker.

Cons:

Full table scan occurs when there is no index or index is not being used by SQL. And the result of full scan table is usually slower that index table scan. The situation is that: the larger the table, the slower of the data returns.
Unnecessary full-table scan will lead to a huge amount of unnecessary I/O with a process burden on the entire database.

Related Research Articles

A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A database management system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data, unlike a natural key.

A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables and puts on the same row records with matching fields : INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS.

The SQL SELECT statement returns a result set of rows, from one or more tables.

An SQL INSERT statement adds one or more records to any single table in a relational database.

A query plan is a sequence of steps used to access data in a SQL relational database management system. This is a specific case of the relational model concept of access plans.

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time said table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

In a database, a view is the result set of a stored query, which can be queried in the same manner as a persistent database collection object. This pre-established query command is kept in the data dictionary. Unlike ordinary base tables in a relational database, a view does not form part of the physical schema: as a result set, it is a virtual table computed or collated dynamically from data in the database when access to that view is requested. Changes applied to the data in a relevant underlying table are reflected in the data shown in subsequent invocations of the view.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

Set operations in SQL is a type of operations which allow the results of multiple queries to be combined into a single result set.

Query optimization is a feature of many relational database management systems and other databases such as NoSQL and graph databases. The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans.

A WHERE clause in SQL specifies that a SQL Data Manipulation Language (DML) statement should only affect rows that meet specified criteria. The criteria are expressed in the form of predicates. WHERE clauses are not mandatory clauses of SQL DML statements, but can be used to limit the number of rows affected by a SQL DML statement or returned by a query. In brief SQL WHERE clause is used to extract only those results from a SQL statement, such as: SELECT, INSERT, UPDATE, or DELETE statement.

Gadfly is a relational database management system written in Python. Gadfly is a collection of Python modules that provides relational database functionality entirely implemented in Python. It supports a subset of the standard RDBMS Structured Query Language (SQL).

An entity–attribute–value model (EAV) is a data model optimized for the space-efficient storage of sparse—or ad-hoc—property or data values, intended for situations where runtime usage patterns are arbitrary, subject to user variation, or otherwise unforeseeable using a fixed design. The use-case targets applications which offer a large or rich system of defined property types, which are in turn appropriate to a wide set of entities, but where typically only a small, specific selection of these are instantiated for a given entity. Therefore, this type of data model relates to the mathematical notion of a sparse matrix. EAV is also known as object–attribute–value model, vertical database model, and open schema.

In relational databases, a condition in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query. The term is derived from a contraction of Search ARGument ABLE. It was first used by IBM researchers as a contraction of Search ARGument, and has come to mean simply "can be looked up by an index."

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

In a SQL database query, a correlated subquery is a subquery that uses values from the outer query. Because the subquery may be evaluated once for each row processed by the outer query, it can be slow.

PL/SQL is Oracle Corporation's procedural extension for SQL and the Oracle relational database. PL/SQL is available in Oracle Database, Times Ten in-memory database, and IBM Db2. Oracle Corporation usually extends PL/SQL functionality with each successive release of the Oracle Database.

A Block Range Index or BRIN is a database indexing technique. They are intended to improve performance with extremely large tables.

The syntax of the SQL programming language is defined and maintained by ISO/IEC SC 32 as part of ISO/IEC 9075. This standard is not freely available. Despite the existence of the standard, SQL code is not completely portable among different database systems without adjustments.

References

↑ "Avoiding Table Scans". Oracle. 2011.
↑ "Which is Faster: Index Access or Table Scan?". Microsoft TechNet. 2002.
↑ "Optimizer Access Paths". Oracle. 2013.

External links

Oracle full table scan tips

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] "Avoiding Table Scans". Oracle. 2011.

[2] "Which is Faster: Index Access or Table Scan?". Microsoft TechNet. 2002.

[3] "Optimizer Access Paths". Oracle. 2013.

[1]

[2]

[3]