Distributed cache

Last updated

In computing, a distributed cache is an extension of the traditional concept of cache used in a single locale. A distributed cache may span multiple servers so that it can grow in size and in transactional capacity. It is mainly used to store application data residing in database and web session data. The idea of distributed caching [1] has become feasible now because main memory has become very cheap and network cards have become very fast, with 1 Gbit now standard everywhere and 10 Gbit gaining traction.[ when? ] Also, a distributed cache works well on lower cost machines usually employed for web servers as opposed to database servers which require expensive hardware. [2] An emerging internet architecture known as Information-centric networking (ICN) is one of the best examples of a distributed cache network. The ICN is a network level solution hence the existing distributed network cache management schemes are not well suited for ICN. [3] In the supercomputer environment, distributed cache is typically implemented in the form of burst buffer.

Contents

Examples

See also

Related Research Articles

<span class="mw-page-title-main">Cache (computing)</span> Additional storage that enables faster access to main storage

In computing, a cache is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.

The Domain Name System (DNS) is a hierarchical and distributed naming system for computers, services, and other resources in the Internet or other Internet Protocol (IP) networks. It associates various information with domain names assigned to each of the associated entities. Most prominently, it translates readily memorized domain names to the numerical IP addresses needed for locating and identifying computer services and devices with the underlying network protocols. The Domain Name System has been an essential component of the functionality of the Internet since 1985.

<span class="mw-page-title-main">Load balancing (computing)</span> Set of techniques to improve the distribution of workloads across multiple computing resources

In computing, load balancing is the process of distributing a set of tasks over a set of resources, with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.

<span class="mw-page-title-main">Cache coherence</span> Computer architecture term concerning shared resource data

In computer architecture, cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, which is particularly the case with CPUs in a multiprocessing system.

Online analytical processing, or OLAP, is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.

<span class="mw-page-title-main">Distributed hash table</span> Decentralized distributed system with lookup service

A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table. Key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys. Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

In computer science, a consistency model specifies a contract between the programmer and a system, wherein the system guarantees that if the programmer follows the rules for operations on memory, memory will be consistent and the results of reading, writing, or updating memory will be predictable. Consistency models are used in distributed systems like distributed shared memory systems or distributed data stores. Consistency is different from coherence, which occurs in systems that are cached or cache-less, and is consistency of data with respect to all processors. Coherence deals with maintaining a global order in which writes to a single location or single variable are seen by all processors. Consistency deals with the ordering of operations to multiple locations with respect to all processors.

Memcached is a general-purpose distributed memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source must be read. Memcached is free and open-source software, licensed under the Revised BSD license. Memcached runs on Unix-like operating systems and on Microsoft Windows. It depends on the libevent library.

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed ; the more items added, the larger the probability of false positives.

<span class="mw-page-title-main">Content delivery network</span> Layer in the internet ecosystem addressing bottlenecks

A content delivery network, or content distribution network (CDN), is a geographically distributed network of proxy servers and their data centers. The goal is to provide high availability and performance by distributing the service spatially relative to end users. CDNs came into existence in the late 1990s as a means for alleviating the performance bottlenecks of the Internet as the Internet was starting to become a mission-critical medium for people and enterprises. Since then, CDNs have grown to serve a large portion of the Internet content today, including web objects, downloadable objects, applications, live streaming media, on-demand streaming media, and social media sites.

In computing, cache algorithms are optimizing instructions, or algorithms, that a computer program or a hardware-maintained structure can utilize in order to manage a cache of information stored on the computer. Caching improves performance by keeping recent or often-used data items in memory locations that are faster or computationally cheaper to access than normal memory stores. When the cache is full, the algorithm must choose which items to discard to make room for the new ones.

In computer science, consistent hashing is a special kind of hashing technique such that when a hash table is resized, only keys need to be remapped on average where is the number of keys and is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.

<span class="mw-page-title-main">IBM storage</span> Product portfolio of IBM

The IBM Storage product portfolio includes disk, flash, tape, NAS storage products, storage software and services. IBM's approach is to focus on data management.

Database caching is a process included in the design of computer applications which generate web pages on-demand (dynamically) by accessing backend databases.

In computer science, memory virtualization decouples volatile random access memory (RAM) resources from individual systems in the data centre, and then aggregates those resources into a virtualized memory pool available to any computer in the cluster. The memory pool is accessed by the operating system or applications running on top of the operating system. The distributed memory pool can then be utilized as a high-speed cache, a messaging layer, or a large, shared memory resource for a CPU or a GPU application.

In contrast to IP-based, host-oriented, Internet architecture, Content-Centric Networking (CCN) emphasizes content by making it directly addressable and routable. Endpoints communicate based on named data instead of IP addresses. CCN is characterized by the basic exchange of content request messages and content return messages. It is considered an information-centric networking (ICN) architecture.

<span class="mw-page-title-main">Hazelcast</span> In-memory data grid

In computing, Hazelcast IMDG is an open source in-memory data grid based on Java. It is also the name of the company developing the product. The Hazelcast company is funded by venture capital and headquartered in Palo Alto, California.

In computing, cache algorithms are optimizing instructions‍—‌or algorithms‍—‌that a computer program or a hardware-maintained structure can follow in order to manage a cache of information stored on the computer. When the cache is full, the algorithm must choose which items to discard to make room for the new ones. Due to the inherent caching capability of nodes in Information-centric networking ICN, the ICN can be viewed as a loosely connect network of caches, which has unique requirements of Caching policies. Unlike proxy servers, in Information-centric networking the cache is a network level solution. Therefore, it has rapidly changing cache states and higher request arrival rates; moreover, smaller cache sizes further impose different kind of requirements on the content eviction policies. In particular, eviction policies for Information-centric networking should be fast and lightweight. Various cache replication and eviction schemes for different Information-centric networking architectures and applications are proposed.

Local differential privacy (LDP) is a model of differential privacy with the added restriction that even if an adversary has access to the personal responses of an individual in the database, that adversary will still be unable to learn too much about the user's personal data. This is contrasted with global differential privacy, a model of differential privacy that incorporates a central aggregator with access to the raw data.

<span class="mw-page-title-main">Federated learning</span> Decentralized machine learning

Federated learning is a machine learning technique that trains an algorithm via multiple independent sessions, each using its own dataset. This approach stands in contrast to traditional centralized machine learning techniques where local datasets are merged into one training session, as well as to approaches that assume that local data samples are identically distributed.

References

  1. Paul, S; Z Fei (1 February 2001). "Distributed caching with centralized control". Computer Communications. 24 (2): 256–268. CiteSeerX   10.1.1.38.1094 . doi:10.1016/S0140-3664(00)00322-4.
  2. Khan, Iqbal. "Distributed Caching on the Path To Scalability". MSDN (July 2009). Retrieved 30 March 2012.
  3. Bilal, Muhammad; et al. (2017). "A Cache Management Scheme for Efficient Content Eviction and Replication in Cache Networks". IEEE Access. 5: 1692–1701. arXiv: 1702.04078 . Bibcode:2017arXiv170204078B. doi:10.1109/ACCESS.2017.2669344. S2CID   14517299.