Spanner (database)

Last updated
Cloud Spanner Booth at Google Cloud Summit GoogleCloudSpannerBooth.jpg
Cloud Spanner Booth at Google Cloud Summit

Spanner is a distributed SQL database management and storage service developed by Google. [1] It provides features such as global transactions, strongly consistent reads, and automatic multi-site replication and failover. Spanner is used in Google F1, the database for its advertising business Google Ads, as well as Gmail and Google Photos. [2] [3]

Contents

Features

Spanner stores large amounts of mutable structured data. Spanner allows users to perform arbitrary queries using SQL with relational data while maintaining strong consistency and high availability for that data with synchronous replication.

Key features of Spanner:

History

Spanner was first described in 2012 for internal Google data centers. [4]

Spanner's SQL capability was added in 2017 and documented in a SIGMOD 2017 paper. [5] It became available as part of Google Cloud Platform in 2017, under the name "Cloud Spanner". [6]

Architecture

Spanner uses the Paxos algorithm as part of its operation to shard (partition) data across up to hundreds of servers. [1] It makes heavy use of hardware-assisted clock synchronization using GPS clocks and atomic clocks to ensure global consistency. [1] TrueTime is the brand name for Google's distributed cloud infrastructure, which provides Spanner with the ability to generate monotonically increasing timestamps in data centers around the world. [7]

Google's F1 SQL database management system (DBMS) is built on top of Spanner, [2] replacing Google's custom MySQL variant. [8]

Related Research Articles

<span class="mw-page-title-main">Database</span> Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source object relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Linux, FreeBSD, OpenBSD, macOS, and Windows, and handles a range of workloads from single machines to data warehouses or web services with many concurrent users.

A distributed database is a database in which data is stored across different physical locations. It may be stored in multiple computers located in the same physical location ; or maybe dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.

MySQL Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL.

<span class="mw-page-title-main">Google data centers</span> Facilities containing Google servers

Google data centers are the large data center facilities Google uses to provide their services, which combine large drives, computer nodes organized in aisles of racks, internal and external networking, environmental controls, and operations software.

Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group and resolving any conflicts that might arise between concurrent changes made by different members.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

Bigtable is a fully managed wide-column and key-value NoSQL database service for large analytical and operational workloads as part of the Google Cloud portfolio.

<span class="mw-page-title-main">Apache Cassandra</span> Free and open-source database management system

Apache Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers support for clusters spanning multiple data centers, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra was designed to implement a combination of Amazon's Dynamo distributed storage and replication techniques combined with Google's Bigtable data and storage engine model.

A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard is held on a separate database server instance, to spread load.

A cloud database is a database that typically runs on a cloud computing platform and access to the database is provided as-a-service. There are two common deployment models: users can run databases on the cloud independently, using a virtual machine image, or they can purchase access to a database service, maintained by a cloud database provider. Of the databases available on the cloud, some are SQL-based and some use a NoSQL data model.

Clustrix, Inc. is a San Francisco-based private company founded in 2006 that developed a database management system marketed as NewSQL.

<span class="mw-page-title-main">Oracle NoSQL Database</span> Distributed database

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.

<span class="mw-page-title-main">CockroachDB</span> Distributed database management system

CockroachDB is a source-available distributed SQL database management system developed by Cockroach Labs. The relational functionality is built on top of a distributed, transactional, consistent key-value store that can survive a variety of different underlying infrastructure failures, and is wire-compatible with PostgreSQL which means users can take advantage of a wide range of drivers and tools from the extensive PostgreSQL ecosystem. A CockroachDB cluster consists of a number of nodes that can be spread across failure domains such as data centres or public cloud regions. A cluster can be scaled both horizontally and vertically. It can provide high levels of resilience and availability and can be run in a variety of environments such as bare metal, VMs, containers and Kubernetes, both in private data centers and in the cloud. CockroachDB gets its name from cockroaches, as they are known for being disaster-resistant.

<span class="mw-page-title-main">PACELC theorem</span> Theorem in theoretical computer science

In database theory, the PACELC theorem is an extension to the CAP theorem. It states that in case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and loss of consistency (C).

Sanjay Ghemawat is an Indian American computer scientist and software engineer. He is currently a Senior Fellow at Google in the Systems Infrastructure Group. Ghemawat's work at Google, much of it in close collaboration with Jeff Dean, has included big data processing model MapReduce, the Google File System, and databases Bigtable and Spanner. Wired have described him as one of the "most important software engineers of the internet age".

TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. Designed to be MySQL compatible, it is developed and supported primarily by PingCAP and licensed under Apache 2.0. It is also available as a paid product. TiDB drew its initial design inspiration from Google's Spanner and F1 papers.

A distributed SQL database is a single relational database which replicates data across multiple servers. Distributed SQL databases are strongly consistent and most support consistency across racks, data centers, and wide area networks including cloud availability zones and cloud geographic zones. Distributed SQL databases typically use the Paxos or Raft algorithms to achieve consensus across multiple nodes.

<span class="mw-page-title-main">YugabyteDB</span> Transactional distributed SQL database

YugabyteDB is a high-performance transactional distributed SQL database for cloud-native applications, developed by Yugabyte.

References

  1. 1 2 3 Corbett et al. 2012.
  2. 1 2 Shute et al. 2012.
  3. "Announcing Cloud Spanner Price Performance Updates".
  4. Clark, Jack (September 18, 2012). "Google reveals Spanner, the database tech that can span the planet". ZDNet. Retrieved August 4, 2021.
  5. Spanner: Becoming a SQL System. 9 May 2017. pp. 331–343. doi:10.1145/3035918.3056103. ISBN   9781450341974. S2CID   3055672.
  6. Srivastava, Deepti (February 14, 2017). "Introducing Cloud Spanner: a global database service for mission-critical applications". Google Cloud Blog. Retrieved August 4, 2021.
  7. "Cloud Spanner: TrueTime and external consistency". Google Cloud. Retrieved 2020-11-24.
  8. Shute et al. 2012, p. 19: ‘Summary: We've moved a large and critical application suite from MySQL to F1.’

Bibliography

Further reading