dCache

dCache is a system for storing and retrieving huge amounts of data distributed among a large number of heterogeneous server nodes, presented under a single virtual filesystem tree and reachable through a variety of standard access methods. dCache is open-source software [1] written in Java and is used by, among others, ten of the fourteen Tier-1 sites supporting CERN [2] to store data from the Large Hadron Collider.

dCache provides methods for exchanging data with backend (tertiary) storage systems, as well as space management, pool attraction, dataset replication, hot spot determination, and recovery from disk or node failures. When connected to a tertiary storage system, the cache simulates unlimited direct-access storage space. Data exchanges to and from the underlying hierarchical storage management system are performed automatically and invisibly to the user. Besides protocols specific to high-energy physics, data in dCache can be accessed via NFSv4.1 and WebDAV.
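The idea of a disk cache that "simulates unlimited direct-access storage" in front of a tertiary system can be illustrated with a toy model. The sketch below is not dCache code and does not use any dCache API; the class `ToyTieredStore`, its methods, and the least-recently-used eviction rule are all assumptions chosen for illustration. It shows only the general pattern: writes are flushed to the backend, reads that miss the disk layer are staged back transparently.

```python
from collections import OrderedDict


class ToyTieredStore:
    """Toy model: a small disk cache in front of an unbounded tape backend.

    All names here are hypothetical; this is not dCache's API.
    """

    def __init__(self, disk_capacity):
        self.disk_capacity = disk_capacity   # max number of files held on disk
        self.disk = OrderedDict()            # name -> data, least recently used first
        self.tape = {}                       # name -> data, treated as unlimited
        self.stage_count = 0                 # how often a read had to go to tape

    def write(self, name, data):
        # Flush to tape right away so the disk copy is always safe to evict.
        self.tape[name] = data
        self._put_on_disk(name, data)

    def read(self, name):
        if name in self.disk:
            self.disk.move_to_end(name)      # mark as most recently used
            return self.disk[name]
        # Cache miss: stage the file back from tape, invisibly to the caller.
        data = self.tape[name]
        self.stage_count += 1
        self._put_on_disk(name, data)
        return data

    def _put_on_disk(self, name, data):
        self.disk[name] = data
        self.disk.move_to_end(name)
        while len(self.disk) > self.disk_capacity:
            self.disk.popitem(last=False)    # evict the least recently used file


store = ToyTieredStore(disk_capacity=2)
for i in range(5):
    store.write(f"run{i}.dat", f"payload-{i}")

# Only two files fit on disk, yet all five remain readable.
contents = [store.read(f"run{i}.dat") for i in range(5)]
```

From the caller's point of view, `read` behaves the same whether the file is on disk or only on tape; the difference shows up only in latency, which the toy model tracks through `stage_count`.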

References

  1. "Github". GitHub. Retrieved 11 October 2013.
  2. "CERN twiki: Tier-0 and Tier-1 Grid services - WLCG Storage Deployment". CERN. Retrieved 2 April 2015.