Amazon SimpleDB

Last updated
SimpleDB Logo AWS Simple Icons Database Amazon SimpleDB Item.svg
SimpleDB Logo

Amazon SimpleDB is a distributed database written in Erlang [1] by Amazon.com. It is used as a web service in concert with Amazon Elastic Compute Cloud (EC2) and Amazon S3 and is part of Amazon Web Services. It was announced on December 13, 2007. [2]

Contents

As with EC2 and S3, Amazon charges fees for SimpleDB storage, transfer, and throughput over the Internet. On December 1, 2008, Amazon introduced new pricing with Free Tier [3] for 1 GB of data & 25 machine hours. Transfer to other Amazon Web Services is free of charge. [2]

Limitations

SimpleDB provides eventual consistency, which is a weaker form of consistency, compared to other database management systems. This is often considered a limitation, because it is harder to reason about, which makes it harder to write correct programs that make use of SimpleDB. This limitation is the result of a fundamental design trade-off. By foregoing consistency, the system is able to achieve two other highly desirable properties:

  1. availability – components of the system may fail, but the service will continue to operate correctly.
  2. partition tolerance – components in the system are connected to one another by a computer network. If components are not able to contact one another using the network (a condition known as a network partition), operation of the system will continue.

Component failures are assumed to be inevitable; thus, both of these properties were deemed necessary in order to provide a reliable web service. The CAP theorem states that it is not possible for a system to exhibit these properties along with consistency; thus, the designers needed to settle for a weaker form of consistency.

Published limitations: [4]

Store limitations

AttributeMaximum
domains250 active domains per account. More can be requested by filling out a form. [5]
size of each domain10 GB
attributes per domain1,000,000,000
attributes per item256 attributes
size per attribute1024 bytes

Query limitations

AttributeMaximum
items returned in a query response2500 items
seconds a query may run5 s
attribute names per query predicate1 attribute name
comparisons per predicate22 operators
predicates per query expression20 predicates

Features

Conditional Put and Delete

Conditional put and conditional delete are new operations that were added in February 2010. They address a problem that arises when accessing SimpleDB concurrently. Consider a simple program that uses SimpleDB to store a counter, i.e. a number that can be incremented. The program must do three things:

  1. Retrieve the current value of the counter from SimpleDB.
  2. Add one to the value.
  3. Store the new value in the same place as the old value in SimpleDB.

If this program runs while no other programs access SimpleDB, it will work correctly; however, it is often desirable for software applications (particularly web applications) to access the same data concurrently. When the same data is accessed concurrently, a race condition arises, which would result in undetectable data loss.

Continuing the previous example, consider two processes, A and B, running the same program. Suppose SimpleDB services requests for data, as described in step 1, from both A and B. A and B see the same value. Let's say that the current value of the counter is 0. Because of steps 2 and 3, A will try to store 1. B will try to do the same; thus, the final counter value will be 1, even though the expected final counter value is 2, because the system attempted two increment operations, one by A, and another by B.

This problem can be solved by the use of conditional put. Suppose we change step 3 as follows: instead of unconditionally storing the new value, the program asks SimpleDB to store the new value only if the value that it currently holds is the same as the value that was retrieved in step 1. Then, we can be sure that the counter's value actually increases. This introduces some additional complexity; if SimpleDB was not able to store the new value because the current value was not as expected, the program must repeat steps 1–3 until the conditional put operation actually changes the stored value.

Consistent Read

Consistent read was a new feature that was released at the same time as conditional put and conditional delete. As the name suggests, consistent read addresses problems that arise due to SimpleDB's eventual consistency model (See the Limitations section). Consider the following sequence of operations:

  1. Program A stores some data in SimpleDB.
  2. Immediately after, A requests the data it just stored.

SimpleDB's eventual consistency guarantee does not allow us to say that the data retrieved in step 2 reflects the updates that were made in step 1. Eventual consistency only guarantees that step 2 reflects the complete set of updates in step 1, or none of those updates. Consistent read can be used to ensure that the data retrieved in step 2 reflect changes in step 1.

The reason that inconsistent results can arise when the consistent read operation is not used is that SimpleDB stores data in multiple locations (for availability), and the new data in step 1 might not be written at all locations when SimpleDB receives the data request in step 2. In that case, it is possible that the data request in step 2 is serviced at one of the locations where the new data has not been written.

Amazon discourages the use of consistent read, unless it is required for correctness. The reason for this recommendation is that the rate at which consistent read operations are serviced is lower than for regular reads.

Relationship to DynamoDB

There has been some talk of SimpleDB being superseded by DynamoDB (it is no longer being "iterated on", [6] though Amazon does not plan to remove it). DynamoDB appears to be its successor. [7] [8]

See also

Related Research Articles

In computer science, a consistency model specifies a contract between the programmer and a system, wherein the system guarantees that if the programmer follows the rules for operations on memory, memory will be consistent and the results of reading, writing, or updating memory will be predictable. Consistency models are used in distributed systems like distributed shared memory systems or distributed data stores. Consistency is different from coherence, which occurs in systems that are cached or cache-less, and is consistency of data with respect to all processors. Coherence deals with maintaining a global order in which writes to a single location or single variable are seen by all processors. Consistency deals with the ordering of operations to multiple locations with respect to all processors.

Memcached is a general-purpose distributed memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source must be read. Memcached is free and open-source software, licensed under the Revised BSD license. Memcached runs on Unix-like operating systems and on Microsoft Windows. It depends on the libevent library.

<span class="mw-page-title-main">Linearizability</span> Property of some operation(s) in concurrent programming

In concurrent programming, an operation is linearizable if it consists of an ordered list of invocation and response events, that may be extended by adding response events such that:

  1. The extended list can be re-expressed as a sequential history.
  2. That sequential history is a subset of the original unextended list.
<span class="mw-page-title-main">Amazon Web Services</span> On-demand cloud computing company

Amazon Web Services, Inc. (AWS) is a subsidiary of Amazon that provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered, pay-as-you-go basis. Clients will often use this in combination with autoscaling. These cloud computing web services provide various services related to networking, compute, storage, middleware, IoT and other processing capacity, as well as software tools via AWS server farms. This frees clients from managing, scaling, and patching hardware and operating systems. One of the foundational services is Amazon Elastic Compute Cloud (EC2), which allows users to have at their disposal a virtual cluster of computers, with extremely high availability, which can be interacted with over the internet via REST APIs, a CLI or the AWS console. AWS's virtual computers emulate most of the attributes of a real computer, including hardware central processing units (CPUs) and graphics processing units (GPUs) for processing; local/RAM memory; hard-disk/SSD storage; a choice of operating systems; networking; and pre-loaded application software such as web servers, databases, and customer relationship management (CRM).

Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. Eventual consistency, also called optimistic replication, is widely deployed in distributed systems and has origins in early mobile computing projects. A system that has achieved eventual consistency is often said to have converged, or achieved replica convergence. Eventual consistency is a weak guarantee – most stronger models, like linearizability, are trivially eventually consistent.

Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its e-commerce network. Amazon S3 can store any type of object, which allows uses like storage for Internet applications, backups, disaster recovery, data archives, data lakes for analytics, and hybrid cloud storage. AWS launched Amazon S3 in the United States on March 14, 2006, then in Europe in November 2007.

<span class="mw-page-title-main">Amazon Elastic Compute Cloud</span> Cloud computing platform

Amazon Elastic Compute Cloud (EC2) is a part of Amazon.com's cloud-computing platform, Amazon Web Services (AWS), that allows users to rent virtual computers on which to run their own computer applications. EC2 encourages scalable deployment of applications by providing a web service through which a user can boot an Amazon Machine Image (AMI) to configure a virtual machine, which Amazon calls an "instance", containing any software desired. A user can create, launch, and terminate server-instances as needed, paying by the second for active servers – hence the term "elastic". EC2 provides users with control over the geographical location of instances that allows for latency optimization and high levels of redundancy. In November 2010, Amazon switched its own retail website platform to EC2 and AWS.

<span class="mw-page-title-main">Amazon Simple Queue Service</span> Cloud-based message queuing service

Amazon Simple Queue Service is a distributed message queuing service introduced by Amazon.com as a beta in late 2004, and generally available in mid 2006. It supports programmatic sending of messages via web service applications as a way to communicate over the Internet. SQS is intended to provide a highly scalable hosted message queue that resolves issues arising from the common producer–consumer problem or connectivity between producer and consumer.

Dynamo is a set of techniques that together can form a highly available key-value structured storage system or a distributed data store. It has properties of both databases and distributed hash tables (DHTs). It was created to help address some scalability issues that Amazon experienced during the holiday season of 2004. By 2007, it was used in Amazon Web Services, such as its Simple Storage Service (S3).

NoSQL is an approach to database design that focuses on providing a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Instead of the typical tabular structure of a relational database, NoSQL databases house data within one data structure. Since this non-relational database design does not require a schema, it offers rapid scalability to manage large and typically unstructured data sets. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures.

<span class="mw-page-title-main">Redis</span> Open-source in-memory key–value database

Redis is an open-source in-memory storage, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Because it holds all data in memory and because of its design, Redis offers low-latency reads and writes, making it particularly suitable for use cases that require a cache. Redis is the most popular NoSQL database, and one of the most popular databases overall. Redis is used in companies like Twitter, Airbnb, Tinder, Yahoo, Adobe, Hulu, Amazon and OpenAi.

<span class="mw-page-title-main">Amazon Elastic Block Store</span> Cloud-based raw storage service

Amazon Elastic Block Store (EBS) provides raw block-level storage that can be attached to Amazon EC2 instances and is used by Amazon Relational Database Service (RDS). It is one of the two block-storage options offered by AWS, with the other being the EC2 Instance Store.

Amazon Relational Database Service is a distributed relational database service by Amazon Web Services (AWS). It is a web service running "in the cloud" designed to simplify the setup, operation, and scaling of a relational database for use in applications. Administration processes like patching the database software, backing up databases and enabling point-in-time recovery are managed automatically. Scaling storage and compute resources can be performed by a single API call to the AWS control plane on-demand. AWS does not offer an SSH connection to the underlying virtual machine as part of the managed service.

A cloud database is a database that typically runs on a cloud computing platform and access to the database is provided as-a-service. There are two common deployment models: users can run databases on the cloud independently, using a virtual machine image, or they can purchase access to a database service, maintained by a cloud database provider. Of the databases available on the cloud, some are SQL-based and some use a NoSQL data model.

<span class="mw-page-title-main">Amazon DynamoDB</span> NoSQL database service

Amazon DynamoDB is a fully managed proprietary NoSQL database offered by Amazon.com as part of the Amazon Web Services portfolio. DynamoDB offers a fast persistent Key-Value Datastore with built-in support for replication, autoscaling, encryption at rest, and on-demand backup among other features.

FoundationDB is a free and open-source multi-model distributed NoSQL database developed by Apple Inc. with a shared-nothing architecture. The product was designed around a "core" database, with additional features supplied in "layers." The core database exposes an ordered key–value store with transactions. The transactions are able to read or write multiple keys stored on any machine in the cluster while fully supporting ACID properties. Transactions are used to implement a variety of data models via layers.

This is a timeline of Amazon Web Services, which offers a suite of cloud computing services that make up an on-demand computing platform.

Amazon ElastiCache is a fully managed in-memory data store and cache service by Amazon Web Services (AWS). The service improves the performance of web applications by retrieving information from managed in-memory caches, instead of relying entirely on slower disk-based databases. ElastiCache supports two open-source in-memory caching engines: Memcached and Redis.

In theoretical computer science, the PACELC theorem is an extension to the CAP theorem. It states that in case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and loss of consistency (C).

Amazon DocumentDB is a managed proprietary NoSQL database service that supports document data structures, with some compatibility with MongoDB version 3.6 and version 4.0. As a document database, Amazon DocumentDB can store, query, and index JSON data. It is available on Amazon Web Services. As of March 2023, AWS introduced some compliance with MongoDB 5.0 but lacks time series collection support.

References

  1. What You Need To Know About Amazon SimpleDB
  2. 1 2 "AWS | Amazon SimpleDB – Simple Database Service". Amazon Web Services, Inc.
  3. SimpleDB - Free Tier - A shift in AWS pricing Archived 2008-12-25 at the Wayback Machine
  4. "Limits", SimpleDB Developer Guide, Amazon (API latest version).
  5. Request to Increase Allocation of Amazon SimpleDB Domains. Aws.amazon.com. Retrieved on 2013-08-09.
  6. "AWS Developer Forums: SimpleDB future? ..." forums.aws.amazon.com.
  7. http://aws.amazon.com/dynamodb/faqs/#How_does_Amazon_DynamoDB_differ_from_Amazon_SimpleDB_Which_should_I_use Dynamo created " to address the limitations of SimpleDB."
  8. "Amazon DynamoDB – a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications - All Things Distributed". www.allthingsdistributed.com. 18 January 2012.