Memcached

Developer(s) Danga Interactive
Initial release May 22, 2003
Stable release 1.6.25 [1] / 19 March 2024
Written in C
Operating system Cross-platform
Type distributed memory caching system
License Revised BSD license [2]
Website memcached.org

Memcached (pronounced variously mem-cash-dee or mem-cashed) is a general-purpose distributed memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read. Memcached is free and open-source software, licensed under the Revised BSD license. [2] Memcached runs on Unix-like operating systems (Linux and macOS) and on Microsoft Windows. It depends on the libevent library.

Memcached's APIs provide a very large hash table distributed across multiple machines. When the table is full, subsequent inserts cause older data to be purged in least recently used (LRU) order. [3] [4] Applications using Memcached typically layer requests and additions into RAM before falling back on a slower backing store, such as a database.
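
The LRU purging described above can be illustrated with a minimal Python sketch. The class name `LRUCache` and its entry-count capacity are assumptions made for illustration; Memcached itself is bounded by memory rather than by a number of entries, and its internal implementation differs.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fixed-capacity cache that evicts the least recently used key.

    Hypothetical illustration only: capacity counts entries here,
    whereas a real Memcached server is limited by available memory.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # purge the least recently used entry


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")         # touching "a" leaves "b" as the oldest entry
cache.set("c", 3)      # the table is full, so "b" is purged
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

Because any entry may be purged in this way, callers have to treat a `None` result as a normal outcome and fall back to the backing store.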

Memcached has no internal mechanism to track misses. However, some third-party utilities provide this functionality.

Memcached was first developed by Brad Fitzpatrick for his website LiveJournal, on May 22, 2003. [5] [6] It was originally written in Perl, then later rewritten in C by Anatoly Vorobey, then employed by LiveJournal. [7] Memcached is now used by many other systems, including YouTube, [8] Reddit, [9] Facebook, [10] [11] Pinterest, [12] [13] Twitter, [14] Wikipedia, [15] and Method Studios. [16] Google App Engine, Google Cloud Platform, Microsoft Azure, IBM Bluemix and Amazon Web Services also offer a Memcached service through an API. [17] [18] [19] [20]

Software architecture

The system uses a client–server architecture. The servers maintain a key–value associative array; the clients populate this array and query it by key. Keys are up to 250 bytes long and values can be at most 1 megabyte in size.
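
As a rough sketch of how a client library might enforce these limits before sending a request, the following Python function frames a `set` command in Memcached's text protocol. The function name `build_set_command` is hypothetical, and treating the value limit as exactly 1024 × 1024 bytes is a simplifying assumption; a real server also counts some per-item overhead against the limit.

```python
MAX_KEY_BYTES = 250              # key limit stated above
MAX_VALUE_BYTES = 1024 * 1024    # 1 megabyte value limit (simplified assumption)


def build_set_command(key, value, flags=0, exptime=0):
    """Frame a text-protocol "set" request, enforcing the size limits.

    Hypothetical helper for illustration; `value` must be bytes.
    """
    raw_key = key.encode("utf-8")
    if not raw_key or len(raw_key) > MAX_KEY_BYTES:
        raise ValueError("key must be 1 to 250 bytes long")
    if any(b <= 32 or b == 127 for b in raw_key):
        raise ValueError("key must not contain whitespace or control characters")
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError("value exceeds 1 megabyte")
    # Request line: set <key> <flags> <exptime> <bytes>, then the data block
    header = b"set %b %d %d %d\r\n" % (raw_key, flags, exptime, len(value))
    return header + value + b"\r\n"


print(build_set_command("userrow:42", b"hello"))
# b'set userrow:42 0 0 5\r\nhello\r\n'
```

In practice an existing client library would normally be used instead of framing requests by hand.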

Clients use client-side libraries to contact the servers which, by default, expose their service on port 11211. Both TCP and UDP are supported. Each client knows all servers; the servers do not communicate with each other. If a client wishes to set or read the value corresponding to a certain key, the client's library first computes a hash of the key to determine which server to use. This gives a simple form of sharding and a scalable shared-nothing architecture across the servers. The server computes a second hash of the key to determine where to store or read the corresponding value. The servers keep the values in RAM; if a server runs out of RAM, it discards the oldest values. Therefore, clients must treat Memcached as a transitory cache; they cannot assume that data stored in Memcached is still there when they need it. Other databases, such as MemcacheDB and Couchbase Server, provide persistent storage while maintaining Memcached protocol compatibility.

If all client libraries use the same hashing algorithm to determine servers, then clients can read each other's cached data.
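
The client-side server selection can be sketched as follows. This is a minimal illustration that assumes a simple hash-modulo scheme; the function name `pick_server` and the server addresses are hypothetical, and real client libraries often use consistent hashing instead so that adding or removing a server remaps fewer keys.

```python
import hashlib


def pick_server(key, servers):
    """Map a key to one server using a hash of the key (modulo scheme).

    Hypothetical helper: every client that uses this same function and
    the same server list will pick the same server for a given key.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


# Example addresses are placeholders, using the default port 11211.
servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

# Two independent clients agree on the server for the same key.
client_a_choice = pick_server("userrow:42", servers)
client_b_choice = pick_server("userrow:42", list(servers))
print(client_a_choice == client_b_choice)  # True
```

The servers take no part in this decision, which is why they do not need to communicate with each other.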

A typical deployment has several servers and many clients. However, it is possible to use Memcached on a single computer, acting simultaneously as client and server. The combined hash table is often very large; its size is limited only by the memory available across all the servers in the cluster. Where high-volume, wide-audience Web publishing requires it, this may stretch to many gigabytes. Memcached can be equally valuable in situations where either the number of requests for content is high or the cost of generating a particular piece of content is high.

Security

Most deployments of Memcached are within trusted networks where clients may freely connect to any server. However, sometimes Memcached is deployed in untrusted networks or where administrators want to exercise control over the clients that are connecting. For this purpose Memcached can be compiled with optional SASL authentication support. The SASL support requires the binary protocol.

A presentation at BlackHat USA 2010 revealed that a number of large public websites had left Memcached open to inspection, analysis, retrieval, and modification of data. [21]

Even within a trusted organisation, the flat trust model of Memcached may have security implications. For the sake of efficiency and simplicity, all Memcached operations are treated equally. Clients with a valid need for access to low-security entries within the cache gain access to all entries within the cache, even when these are higher-security and the client has no justifiable need for them. If a cache key can be predicted, guessed or found by exhaustive searching, its cache entry may be retrieved.

Some attempt to isolate setting and reading data may be made in situations such as high-volume web publishing. A farm of outward-facing content servers has read access to a Memcached instance containing published pages or page components, but no write access. Where new content is published (and is not yet in Memcached), a request is instead sent to content generation servers that are not publicly accessible, which create the content unit and add it to Memcached. The content server then tries again to retrieve it and serves it outwards.

Used as a DDoS attack vector

In February 2018, Cloudflare reported that misconfigured Memcached servers were being used to launch DDoS attacks on a large scale. [22] The Memcached protocol over UDP has a huge amplification factor of more than 51,000. [23] Victims of these attacks include GitHub, which was flooded with peak incoming traffic of 1.35 Tbit/s. [24]

This issue was mitigated in Memcached version 1.5.6, which disabled the UDP protocol by default. [25]

Example code

Note that all functions described on this page are pseudocode only. Memcached calls and programming languages may vary based on the API used.

Converting database or object creation queries to use Memcached is simple. Typically, when using straight database queries, example code would be as follows:

function get_foo(int userid)
    data = db_select("SELECT * FROM users WHERE userid = ?", userid)
    return data

After conversion to Memcached, the same call might look like the following:

function get_foo(int userid)
    /* first try the cache */
    data = memcached_fetch("userrow:" + userid)
    if not data
        /* not found : request database */
        data = db_select("SELECT * FROM users WHERE userid = ?", userid)
        /* then store in cache until next get */
        memcached_add("userrow:" + userid, data)
    end
    return data

The client would first check whether a Memcached value with the unique key "userrow:userid" exists, where userid is some number. If the result does not exist, it would select from the database as usual, and set the unique key using the Memcached API add function call.

However, if only this API call were modified, the server would end up fetching incorrect data following any database update actions: the Memcached "view" of the data would become out of date. Therefore, in addition to creating an "add" call, an update call would also be needed using the Memcached set function.

function update_foo(int userid, string dbUpdateString)
    /* first update database */
    result = db_execute(dbUpdateString)
    if result
        /* database update successful : fetch data to be stored in cache */
        data = db_select("SELECT * FROM users WHERE userid = ?", userid)
        /* the previous line could also look like data = createDataFromDBString(dbUpdateString) */
        /* then store in cache until next get */
        memcached_set("userrow:" + userid, data)

This call would update the currently cached data to match the new data in the database, assuming the database query succeeds. An alternative approach would be to invalidate the cache with the Memcached delete function, so that subsequent fetches result in a cache miss. Similar action would need to be taken when database records were deleted, to maintain either a correct or incomplete cache.
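
The delete-based alternative can be sketched in runnable Python. Everything here is a stand-in chosen for illustration: `FakeCache` is a hypothetical in-memory substitute for a Memcached client, and the `database` dictionary substitutes for the users table; neither reflects a specific client library's API.

```python
class FakeCache:
    """In-memory stand-in for a Memcached client (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def add(self, key, value):
        # Like the "add" operation: store only if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        return self._data.pop(key, None) is not None


cache = FakeCache()
database = {42: {"userid": 42, "name": "alice"}}  # stand-in for the users table


def get_foo(userid):
    key = "userrow:%d" % userid
    data = cache.get(key)
    if data is None:
        row = database.get(userid)
        if row is None:
            return None
        data = dict(row)  # copy, since a real cache holds a snapshot
        cache.add(key, data)
    return data


def update_foo(userid, **changes):
    database[userid].update(changes)   # first update the database
    cache.delete("userrow:%d" % userid)  # then invalidate the cached row


def delete_foo(userid):
    database.pop(userid, None)
    cache.delete("userrow:%d" % userid)


print(get_foo(42)["name"])  # alice (cache miss, row is then cached)
update_foo(42, name="bob")
print(get_foo(42)["name"])  # bob (entry was deleted, so the database is re-read)
```

Compared with calling set on every update, deleting the entry defers the cost of rebuilding it until the next read.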

An alternate cache-invalidation strategy is to store a random number in an agreed-upon cache entry and to incorporate this number into all keys that are used to store a particular kind of entry. To invalidate all such entries at once, change the random number. Existing entries (which were stored using the old number) will no longer be referenced and so will eventually expire or be recycled.

function store_xyz_entry(int key, string value)
    /* Retrieve the random number - use zero if none exists yet.
     *  The key-name used here is arbitrary. */
    seed = memcached_fetch(":xyz_seed:")
    if not seed
        seed = 0
    /* Build the key used to store the entry and store it.
     *  The key-name used here is also arbitrary. Notice that the "seed" and the user's "key"
     *  are stored as separate parts of the constructed hashKey string: ":xyz_data:(seed):(key)."
     *  This is not mandatory, but is recommended. */
    string hashKey = sprintf(":xyz_data:%d:%d", seed, key)
    memcached_set(hashKey, value)

/* "fetch_entry," not shown, follows identical logic to the above. */

function invalidate_xyz_cache()
    existing_seed = memcached_fetch(":xyz_seed:")
    /* Coin a different random seed */
    do
        seed = rand()
    until seed != existing_seed
    /* Now store it in the agreed-upon place. All future requests will use this number.
     *  Therefore, all existing entries become un-referenced and will eventually expire. */
    memcached_set(":xyz_seed:", seed)
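
A runnable Python sketch of the same strategy, including the fetch function that the pseudocode leaves out, might look like the following. The `store` dictionary is a hypothetical stand-in for Memcached, and the function names are chosen for illustration only.

```python
import random

store = {}  # in-memory stand-in for Memcached (hypothetical)

SEED_KEY = ":xyz_seed:"


def _entry_key(key):
    # Use zero as the seed if none has been stored yet.
    seed = store.get(SEED_KEY, 0)
    return ":xyz_data:%d:%d" % (seed, key)


def store_xyz_entry(key, value):
    store[_entry_key(key)] = value


def fetch_xyz_entry(key):
    # Same key construction as store_xyz_entry, so both see the current seed.
    return store.get(_entry_key(key))


def invalidate_xyz_cache():
    existing_seed = store.get(SEED_KEY, 0)
    seed = existing_seed
    while seed == existing_seed:  # coin a different random seed
        seed = random.randrange(1, 2**31)
    store[SEED_KEY] = seed


store_xyz_entry(7, "cached value")
print(fetch_xyz_entry(7))  # cached value
invalidate_xyz_cache()
print(fetch_xyz_entry(7))  # None
```

The old entry is still present in the stand-in store under its previous key; in Memcached such unreferenced entries would eventually expire or be recycled, as described above.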

References

  1. "Release 1.6.25". 19 March 2024. Retrieved 22 March 2024.
  2. "Memcached license". GitHub. Retrieved 2014-06-27.
  3. "Google Code Archive - Long-term storage for Google Code Project Hosting". Code.google.com. Retrieved 2017-06-25.
  4. "Google Code Archive - Long-term storage for Google Code Project Hosting". Code.google.com. Retrieved 2017-06-25.
  5. Community.livejournal.com (2003-05-22). Retrieved on 2013-09-18.
  6. Community.livejournal.com (2003-05-27). Retrieved on 2013-09-18.
  7. "lj_dev: memcached". 2013-02-25. Archived from the original on 2013-02-25. Retrieved 2017-06-25.
  8. Cuong Do Cuong (Engineering manager at YouTube/Google) (June 23, 2007). Seattle Conference on Scalability: YouTube Scalability (Online Video - 26th minute). Seattle: Google Tech Talks.
  9. Whitaker, Keir (2010-05-17). "Steve Huffman on Lessons Learned at Reddit | Carsonified". Archived from the original on 2010-05-17. Retrieved 2017-06-25.
  10. "Scaling memcached at Facebook". Facebook.com. 2008-12-12. Retrieved 2017-06-25.
  11. Scaling Memcache at Facebook. USENIX. 2013. ISBN 9781931971003. Retrieved 2017-06-25.
  12. "Building Pinterest in the cloud". Pinterest Careers. 2013-06-19. Retrieved 2018-03-09.
  13. "A comprehensive, fast, pure-Python memcached client". Github.com. 2018-01-08. Retrieved 2018-03-09.
  14. "It's Not Rocket Science, But It's Our Work". Blog.twitter.com. 2008-06-01. Retrieved 2017-06-25.
  15. "memcached". MediaWiki. Retrieved 2017-06-25.
  16. Rez BoF, SIGGRAPH 2019, archived from the original on 2021-12-12, retrieved 2019-08-09
  17. "Memcache Examples | App Engine standard environment for Python | Google Cloud Platform". Code.google.com. 2017-03-22. Retrieved 2017-06-25.
  18. "About In-Role Cache for Azure Cache". Msdn.microsoft.com. 2015-08-25. Retrieved 2017-06-25.
  19. Verge, Jason (2014-09-23). "Redis Labs: We Have 3,000 Paying Cloud In-Memory NoSQL Customers". Data Center Knowledge. Retrieved 2016-09-10.
  20. "AWS | Amazon ElastiCache – in-memory data store and cache". Aws.amazon.com. Retrieved 2017-06-25.
  21. "SensePost | Blackhat write-up: Go-derper and mining memcaches". Archived from the original on 2018-12-21. Retrieved 2016-09-02.
  22. "Memcrashed - Major amplification attacks from UDP port 11211". CloudFlare. 27 Feb 2018. Retrieved 3 March 2018.
  23. Jeffrey, Cal (Mar 1, 2018). "GitHub falls victim to largest DDoS attack ever recorded".
  24. "February 28th DDoS Incident Report". March 1, 2018. Retrieved 3 March 2018.
  25. "Memcached 1.5.6 Release Notes". GitHub. 2018-02-27. Retrieved 3 March 2018.
  26. "Speedy MySQL 5.6 takes aim at NoSQL, MariaDB". Theregister.co.uk. Retrieved 2017-06-25.
  27. David Felcey (2014-08-13). "Getting Started With The Coherence Memcached Adaptor | Oracle Coherence Blog". Blogs.oracle.com. Archived from the original on 2017-02-23. Retrieved 2017-06-25.
  28. "Using the Memcached protocol endpoint with Infinispan". infinispan.org. Retrieved 2022-04-19.