Spanner (database)

Last updated
Cloud Spanner Booth at Google Cloud Summit GoogleCloudSpannerBooth.jpg
Cloud Spanner Booth at Google Cloud Summit

Spanner is a distributed SQL database management and storage service developed by Google. [1] It provides features such as global transactions, strongly consistent reads, and automatic multi-site replication and failover. Spanner is used in Google F1, the database for its advertising business Google Ads, as well as Gmail and Google Photos. [2] [3]

Contents

Features

Spanner stores large amounts of mutable structured data. Spanner allows users to perform arbitrary queries using SQL with relational data while maintaining strong consistency and high availability for that data with synchronous replication.

Key features of Spanner:

History

Spanner was first described in 2012 for internal Google data centers. [4]

Spanner's SQL capability was added in 2017 and documented in a SIGMOD 2017 paper. [5] It became available as part of Google Cloud Platform in 2017, under the name "Cloud Spanner". [6]

Architecture

Spanner uses the Paxos algorithm as part of its operation to shard (partition) data across up to hundreds of servers. [1] It makes heavy use of hardware-assisted clock synchronization using GPS clocks and atomic clocks to ensure global consistency. [1] TrueTime is the brand name for Google's distributed cloud infrastructure, which provides Spanner with the ability to generate monotonically increasing timestamps in data centers around the world. [7]

Google's F1 SQL database management system (DBMS) is built on top of Spanner, [2] replacing Google's custom MySQL variant. [8]

Related Research Articles

<span class="mw-page-title-main">MySQL</span> SQL database engine software

MySQL is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the initialism for Structured Query Language. A relational database organizes data into one or more data tables in which data may be related to each other; these relations help structure the data. SQL is a language that programmers use to create, modify and extract data from the relational database, as well as control user access to the database. In addition to relational databases and SQL, an RDBMS like MySQL works with an operating system to implement a relational database in a computer's storage system, manages users, allows for network access and facilitates testing database integrity and creation of backups.

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source object relational database management system

PostgreSQL also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Windows, Linux, macOS, FreeBSD, and OpenBSD, and handles a range of workloads from single machines to data warehouses, data lakes, or web services with many concurrent users.

A distributed database is a database in which data is stored across different physical locations. It may be stored in multiple computers located in the same physical location ; or maybe dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.

<span class="mw-page-title-main">Google data centers</span> Facilities containing Google servers

Google data centers are the large data center facilities Google uses to provide their services, which combine large drives, computer nodes organized in aisles of racks, internal and external networking, environmental controls, and operations software.

Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group and resolving any conflicts that might arise between concurrent changes made by different members.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

In databases, and transaction processing, snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database, and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot.

Bigtable is a fully managed wide-column and key-value NoSQL database service for large analytical and operational workloads as part of the Google Cloud portfolio.

<span class="mw-page-title-main">Apache Cassandra</span> Free and open-source database management system

Apache Cassandra is a free and open-source, distributed, wide-column store, NoSQL, database management system intended to handle large amounts of data across multiple commodity servers, providing availability with no single point of failure. Cassandra supports clusters and spanning of multiple data centers with asynchronous and master-less replication. It allows low latency operations for all clients. Cassandra implements Amazon's Dynamo distributed storage and replication techniques combined with Google's Bigtable data and storage engine model.

<span class="mw-page-title-main">Drizzle (database server)</span>

Drizzle is a discontinued free software/open-source relational database management system (DBMS) that was forked from the now-defunct 6.0 development branch of the MySQL DBMS.

A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard is held on a separate database server instance, to spread load.

A cloud database is a database that typically runs on a cloud computing platform and access to the database is provided as-a-service. There are two common deployment models: users can run databases on the cloud independently, using a virtual machine image, or they can purchase access to a database service, maintained by a cloud database provider. Of the databases available on the cloud, some are SQL-based and some use a NoSQL data model.

<span class="mw-page-title-main">Oracle NoSQL Database</span> Distributed database

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.

<span class="mw-page-title-main">CockroachDB</span> Distributed database management system

CockroachDB is a source-available distributed SQL database management system developed by Cockroach Labs. The relational functionality is built on top of a distributed, transactional, consistent key-value store that can survive a variety of different underlying infrastructure failures, and is wire-compatible with PostgreSQL which means users can take advantage of a wide range of drivers and tools from the extensive PostgreSQL ecosystem. A CockroachDB cluster consists of a number of nodes that can be spread across failure domains such as data centres or public cloud regions. A cluster can be scaled both horizontally and vertically. It can provide high levels of resilience and availability and can be run in a variety of environments such as bare metal, VMs, containers and Kubernetes, both in private data centers and in the cloud. CockroachDB gets its name from cockroaches, as they are known for being disaster-resistant.

<span class="mw-page-title-main">PACELC theorem</span> Theorem in theoretical computer science

In database theory, the PACELC theorem is an extension to the CAP theorem. It states that in case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and loss of consistency (C).

Sanjay Ghemawat is an Indian American computer scientist and software engineer. He is currently a Senior Fellow at Google in the Systems Infrastructure Group. Ghemawat's work at Google, much of it in close collaboration with Jeff Dean, has included big data processing model MapReduce, the Google File System, and databases Bigtable and Spanner. Wired have described him as one of the "most important software engineers of the internet age".

TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. Designed to be MySQL compatible, it is developed and supported primarily by PingCAP and licensed under Apache 2.0. It is also available as a paid product. TiDB drew its initial design inspiration from Google's Spanner and F1 papers.

A distributed SQL database is a single relational database which replicates data across multiple servers. Distributed SQL databases are strongly consistent and most support consistency across racks, data centers, and wide area networks including cloud availability zones and cloud geographic zones. Distributed SQL databases typically use the Paxos or Raft algorithms to achieve consensus across multiple nodes.

<span class="mw-page-title-main">YugabyteDB</span> Transactional distributed SQL database

YugabyteDB is a high-performance transactional distributed SQL database for cloud-native applications, developed by Yugabyte.

References

  1. 1 2 3 Corbett et al. 2012.
  2. 1 2 Shute et al. 2012.
  3. "Announcing Cloud Spanner Price Performance Updates".
  4. Clark, Jack (September 18, 2012). "Google reveals Spanner, the database tech that can span the planet". ZDNet. Retrieved August 4, 2021.
  5. Spanner: Becoming a SQL System. 9 May 2017. pp. 331–343. doi:10.1145/3035918.3056103. ISBN   9781450341974. S2CID   3055672.
  6. Srivastava, Deepti (February 14, 2017). "Introducing Cloud Spanner: a global database service for mission-critical applications". Google Cloud Blog. Retrieved August 4, 2021.
  7. "Cloud Spanner: TrueTime and external consistency". Google Cloud. Retrieved 2020-11-24.
  8. Shute et al. 2012, p. 19: ‘Summary: We've moved a large and critical application suite from MySQL to F1.’

Bibliography

Further reading