Database caching

Last updated

Database caching is a process included in the design of computer applications which generate web pages on-demand (dynamically) by accessing backend databases.

Contents

When these applications are deployed on multi-tier environments that involve browser-based clients, web application servers and backend databases, [1] [2] middle-tier database caching is used to achieve high scalability and performance. [2]

In a three tier architecture, the application software tier and data storage tier can be in different hosts. Throughput of an application can be limited by the network speed. This limitation can be minimized by having the database at the application tier. Because commercial database software makes extensive use of system resources, it is not always practical to have the application and the database at the same host. In this case, a more light-weight database application can be used to cache data from the commercial database management system.

Benefits

Database caching improves scalability by distributing query workload from backend to multiple cheap front-end systems. It allows flexibility in the processing of data; for example, the data of Platinum customers can be cached while that of ordinary customers are not. Caching can improve availability of data, by providing continued service for applications that depend only on cached tables even if the backend server is unavailable. Another benefit is improved data access speeds brought about by locality of data and smoothing out load peaks by avoiding round-trips between middle-tier and data-tier. [3]

Potential design elements

Pitfalls in implementations

Related Research Articles

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source object relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Linux, FreeBSD, OpenBSD, macOS, and Windows, and handles a range of workloads from single machines to data warehouses or web services with many concurrent users.

A distributed database is a database in which data is stored across different physical locations. It may be stored in multiple computers located in the same physical location ; or maybe dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.

<span class="mw-page-title-main">Cache coherence</span> Computer architecture term concerning shared resource data

In computer architecture, cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, which is particularly the case with CPUs in a multiprocessing system.

Scalability is the property of a system to handle a growing amount of work. One definition for software systems specifies that this may be done by adding resources to the system.

MySQL Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL.

Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group and resolving any conflicts that might arise between concurrent changes made by different members.

Online transaction processing (OLTP) is a type of database system used in transaction-oriented applications, such as many operational systems. "Online" refers to that such systems are expected to respond to user requests and process them in real-time. The term is contrasted with online analytical processing (OLAP) which instead focuses on data analysis.

A web framework (WF) or web application framework (WAF) is a software framework that is designed to support the development of web applications including web services, web resources, and web APIs. Web frameworks provide a standard way to build and deploy web applications on the World Wide Web. Web frameworks aim to automate the overhead associated with common activities performed in web development. For example, many web frameworks provide libraries for database access, templating frameworks, and session management, and they often promote code reuse. Although they often target development of dynamic web sites, they are also applicable to static websites.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

The IBM SAN Volume Controller (SVC) is a block storage virtualization appliance that belongs to the IBM System Storage product family. SVC implements an indirection, or "virtualization", layer in a Fibre Channel storage area network (SAN).

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

CUBRID ( "cube-rid") is an open-source SQL-based relational database management system (RDBMS) with object extensions developed by CUBRID Corp. for OLTP. The name CUBRID is a combination of the two words cube and bridge, cube standing for a space for data and bridge standing for data bridge.

Bristol Standard Asynchronous/Synchronous Protocol (BSAP) is an industrial automation protocol developed by Bristol Babcock and managed by Emerson. It is a master-slave protocol suited to both synchronous high speed local networks and asynchronous low speed wide area networks. BSAP offers high message security for communication over telephone lines and radio networks by using effective error checking method and constantly exchanging the communication statistics. The polling scheme used by BSAP makes sure that each node in the network has an equal priority for requests and responses. BSAP also provides a mechanism of global addressing of all nodes connected for special messages like time synchronization apart from the normal individual addressing schemes. BSAP supports remote database access methods for reading and writing the memory in groups. It also supports multiple messaging schemes in which each node can transfer the node to other networks transparently and can transmit the responses in similar manner.

Sherpa is a cloud storage platform developed by Yahoo!. It is a hosted, distributed, and geographically replicated key-value data store. The service is a NoSQL system that has been developed by Yahoo!, to address the scalability, availability, and latency needs of the conglomerate’s websites. Sherpa has abilities such as elastic growth, multi-tenancy, global footprint for local low-latency access, asynchronous replication, representational state transfer (REST) based web service APIs, novel per-record consistency knobs, high availability, compression, secondary indexes, and record-level replication.

eXtremeDB is a high-performance, low-latency, ACID-compliant embedded database management system using an in-memory database system (IMDS) architecture and designed to be linked into C/C++ based programs. It works on Windows, Linux, and other real-time and embedded operating systems.

<span class="mw-page-title-main">Oracle NoSQL Database</span> Distributed database

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

Transbase is a relational database management system, developed and maintained by Transaction Software GmbH, Munich. The development of Transbase was started in the 1980s by Rudolf Bayer under the name "Merkur" at the department of Computer Science of the Technical University of Munich (TUM).

In computing, Hazelcast is a unified real-time data platform based on Java that combines a fast data store with stream processing. It is also the name of the company developing the product. The Hazelcast company is funded by venture capital and headquartered in Palo Alto, California.

Elliptics is a distributed key–value data storage with open source code. By default it is a classic distributed hash table (DHT) with multiple replicas put in different groups. Elliptics was created to meet requirements of multi-datacenter and physically distributed storage locations when storing huge amount of medium and large files.

Database scalability is the ability of a database to handle changing demands by adding/removing resources. Databases use a host of techniques to cope.

References

  1. Larson, Per-åke; Goldstein, Jonathan (2004). "MTCache: Transparent mid-tier database caching". CiteSeerX   10.1.1.95.875 .{{cite journal}}: Cite journal requires |journal= (help)
  2. 1 2 Altinel, Mehmet; Luo, Qiong; Krishnamurthy, Sailesh; Mohan, C.; Pirahesh, Hamid; Lindsay, Bruce G.; Woo, Honguk; Brown, Larry (2002). "DBCache: Database Caching For Web Application Servers" (PDF). CiteSeerX   10.1.1.104.8991 .{{cite journal}}: Cite journal requires |journal= (help)
  3. "Middle-tier Database Caching for e-Business". CiteSeerX   10.1.1.140.8455 .{{cite journal}}: Cite journal requires |journal= (help)
  4. "Why Databases Should Bypass the Linux Page Cache". 13 March 2024. Retrieved 2 April 2024.