Cloud Data Management Interface

Last updated
Cloud Data Management Interface
AbbreviationCDMI
StatusPublished
Year started2009
Latest version1.1.1
Organization Storage Networking Industry Association
Base standards Hypertext Transfer Protocol
Related standards Network File System
Domain Cloud computing
Website www.snia.org/cloud

The Cloud Data Management Interface (CDMI) is a SNIA standard that specifies a protocol for self-provisioning, administering and accessing cloud storage. [1]

Contents

CDMI defines RESTful HTTP operations for assessing the capabilities of the cloud storage system, allocating and accessing containers and objects, managing users and groups, implementing access control, attaching metadata, making arbitrary queries, using persistent queues, specifying retention intervals and holds for compliance purposes, using a logging facility, billing, moving data between cloud systems, and exporting data via other protocols such as iSCSI and NFS. Transport security is obtained via TLS.

Capabilities

Compliant implementations must provide access to a set of configuration parameters known as capabilities. These are either boolean values that represent whether or not a system supports things such as queues, export via other protocols, path-based storage and so on, or numeric values expressing system limits, such as how much metadata may be placed on an object. As a minimal compliant implementation can be quite small, with few features, clients need to check the cloud storage system for a capability before attempting to use the functionality it represents. Resource allocation assignments limited to the data management interface protocols must possess access bypass capabilities which extend beyond the layered framework. [2] This integral function is vital to the prevention of transport layer session hijacking by unauthorized entities which may circumvent standard interfacing security parameters. [3]

Containers

A CDMI client may access objects, including containers, by either name or object id (OID), assuming the CDMI server supports both methods. When storing objects by name, it is natural to use nested named containers; the resulting structure corresponds exactly to a traditional filesystem directory structure.

Objects

Objects are similar to files in a traditional file system, but are enhanced with an increased amount and capacity for metadata. As with containers, they may be accessed by either name or OID. When accessed by name, clients use URLs that contain the full pathname of objects to create, read, update and delete them. When accessed by OID, the URL specifies an OID string in the cdmi-objectid container; this container presents a flat name space conformant with standard object storage system semantics.

Subject to system limits, objects may be of any size or type and have arbitrary user-supplied metadata attached to them. Systems that support query allow arbitrary queries to be run against the metadata.

Domains, Users and Groups

CDMI supports the concept of a domain, similar in concept to a domain in the Windows Active Directory model. Users and groups created in a domain share a common administrative database and are known to each other on a "first name" basis, i.e. without reference to any other domain or system.

Domains also function as containers for usage and billing summary data.

Access Control

CDMI exactly follows the ACL and ACE model used for file authorization operations by NFSv4. This makes it also compatible with Microsoft Windows systems.

Metadata

CDMI draws much of its metadata model from the XAM specification. Objects and containers have "storage system metadata", "data system metadata" and arbitrary user specified metadata, in addition to the metadata maintained by an ordinary filesystem (atime etc.).

Queries

CDMI specifies a way for systems to support arbitrary queries against CDMI containers, with a rich set of comparison operators, including support for regular expressions.

Queues

CDMI supports the concept of persistent FIFO (first-in, first-out) queues. These are useful for job scheduling, order processing and other tasks in which lists of things must be processed in order.

Compliance

Both retention intervals and retention holds are supported by CDMI. A retention interval consists of a start time and a retention period. During this time interval, objects are preserved as immutable and may not be deleted. A retention hold is usually placed on an object because of judicial action and has the same effect: objects may not be changed nor deleted until all holds placed on them are removed.

Logging

CDMI clients can sign up for logging of system, security and object access events on servers that support it. This feature allows clients to see events locally as the server logs them.

Billing

Summary information suitable for billing clients for on-demand services can be obtained by authorized users from systems that support it.

Serialization

Serialization of objects and containers allows export of all data and metadata on a system and importation of that data into another cloud system.

Foreign protocols

CDMI supports export of containers as NFS or CIFS shares. Clients that mount these shares see the container hierarchy as an ordinary filesystem directory hierarchy, and the objects in the containers as normal files. Metadata outside of ordinary filesystem metadata may or may not be exposed.

Provisioning of iSCSI LUNs is also supported.

Client SDKs

See also

Related Research Articles

Object Linking and Embedding (OLE) is a proprietary technology developed by Microsoft that allows embedding and linking to documents and other objects. For developers, it brought OLE Control Extension (OCX), a way to develop and use custom user interface elements. On a technical level, an OLE object is any object that implements the IOleObject interface, possibly along with a wide range of other interfaces, depending on the object's needs.

<span class="mw-page-title-main">Virtual file system</span> Abstract layer on top of a more concrete file system

A virtual file system (VFS) or virtual filesystem switch is an abstract layer on top of a more concrete file system. The purpose of a VFS is to allow client applications to access different types of concrete file systems in a uniform way. A VFS can, for example, be used to access local and network storage devices transparently without the client application noticing the difference. It can be used to bridge the differences in Windows, classic Mac OS/macOS and Unix filesystems, so that applications can access files on local file systems of those types without having to know what type of file system they are accessing.

<span class="mw-page-title-main">Hierarchical Data Format</span> Set of file formats

Hierarchical Data Format (HDF) is a set of file formats designed to store and organize large amounts of data. Originally developed at the U.S. National Center for Supercomputing Applications, it is supported by The HDF Group, a non-profit corporation whose mission is to ensure continued development of HDF5 technologies and the continued accessibility of data stored in HDF.

Filesystem in Userspace (FUSE) is a software interface for Unix and Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a bridge to the actual kernel interfaces.

Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. The name Lustre is a portmanteau word derived from Linux and cluster. Lustre file system software is available under the GNU General Public License and provides high performance file systems for computer clusters ranging in size from small workgroup clusters to large-scale, multi-site systems. Since June 2005, Lustre has consistently been used by at least half of the top ten, and more than 60 of the top 100 fastest supercomputers in the world, including the world's No. 1 ranked TOP500 supercomputer in November 2022, Frontier, as well as previous top supercomputers such as Fugaku, Titan and Sequoia.

IBM Storage Protect is a data protection platform that gives enterprises a single point of control and administration for backup and recovery. It is the flagship product in the IBM Spectrum Protect family.

Content-addressable storage (CAS), also referred to as content-addressed storage or fixed-content storage, is a way to store information so it can be retrieved based on its content, not its name or location. It has been used for high-speed storage and retrieval of fixed content, such as documents stored for compliance with government regulations. Content-addressable storage is similar to content-addressable memory.

Semantic file systems are file systems used for information persistence which structure the data according to their semantics and intent, rather than the location as with current file systems. It allows the data to be addressed by their content. Traditional hierarchical file-systems tend to impose a burden, for example when a sub-directory layout is contradicting a user's perception of where files would be stored. Having a tag-based interface alleviates this hierarchy problem and enables users to query for data in an intuitive fashion.

Ceph is a free and open-source software-defined storage platform that provides object storage, block storage, and file storage built on a common distributed cluster foundation. Ceph provides completely distributed operation without a single point of failure and scalability to the exabyte level, and is freely available. Since version 12 (Luminous), Ceph does not rely on any other conventional filesystem and directly manages HDDs and SSDs with its own storage backend BlueStore and can expose a POSIX filesystem.

<span class="mw-page-title-main">Windows Search</span> Desktop search platform by Microsoft

Windows Search is a content index and desktop search platform by Microsoft introduced in Windows Vista as a replacement for the previous Indexing Service of Windows 2000, Windows XP, and Windows Server 2003, designed to facilitate local and remote queries for files and non-file items in the Windows Shell and in compatible applications. It was developed after the postponement of WinFS and introduced to Windows several benefits of that platform.

Mark A. Carlson was a software engineer known in the systems management industry for his work in management standards and technology. Mark was the first employee of a small startup in Boulder, Colorado called Redcape Policy Software. Sun Microsystems acquired the company and its technology in 1998 and subsequently promoted it as Jiro, a common management framework based on Java and Jini.

The Linear Tape File System (LTFS) is a file system that allows files stored on magnetic tape to be accessed in a similar fashion to those on disk or removable flash drives. It requires both a specific format of data on the tape media and software to provide a file system interface to the data.

A cloud storage gateway is a hybrid cloud storage device, implemented in hardware or software, which resides at the customer premises and translates cloud storage APIs such as SOAP or REST to block-based storage protocols such as iSCSI or Fibre Channel or file-based interfaces such as NFS or SMB.

Object storage is a computer data storage approach that manages data as "blobs" or "objects", as opposed to other storage architectures like file systems which manages data as a file hierarchy, and block storage which manages data as blocks within sectors and tracks. Each object is typically associated with a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level, the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that are directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity.

<span class="mw-page-title-main">Distributed Data Management Architecture</span> Open, published architecture for creating, managing and accessing data on a remote computer

Distributed Data Management Architecture (DDM) is IBM's open, published software architecture for creating, managing and accessing data on a remote computer. DDM was initially designed to support record-oriented files; it was extended to support hierarchical directories, stream-oriented files, queues, and system command processing; it was further extended to be the base of IBM's Distributed Relational Database Architecture (DRDA); and finally, it was extended to support data description and conversion. Defined in the period from 1980 to 1993, DDM specifies necessary components, messages, and protocols, all based on the principles of object-orientation. DDM is not, in itself, a piece of software; the implementation of DDM takes the form of client and server products. As an open architecture, products can implement subsets of DDM architecture and products can extend DDM to meet additional requirements. Taken together, DDM products implement a distributed file system.

Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Originally designed by Google, the project is now maintained by a worldwide community of contributors, and the trademark is held by the Cloud Native Computing Foundation.

BisQue is a free, open source web-based platform for the exchange and exploration of large, complex datasets. It is being developed at the Vision Research Lab at the University of California, Santa Barbara. BisQue specifically supports large scale, multi-dimensional multimodal-images and image analysis. Metadata is stored as arbitrarily nested and linked tag/value pairs, allowing for domain-specific data organization. Image analysis modules can be added to perform complex analysis tasks on compute clusters. Analysis results are stored within the database for further querying and processing. The data and analysis provenance is maintained for reproducibility of results. BisQue can be easily deployed in cloud computing environments or on computer clusters for scalability. BisQue has been integrated into the NSF Cyberinfrastructure project CyVerse. The user interacts with BisQue via any modern web browser.

Nirvana was virtual object storage software developed and maintained by General Atomics.

References

  1. "Cloud Data Management Interface". SNIA. Retrieved 26 June 2011.
  2. Metheny, M (2017). Federal Cloud Computing: The Definitive Guide for Cloud Service Providers. Syngress. pp. 202–245.
  3. da Fonseca, N (2015). Cloud Services, Networking, and Management. John Wiley & Sons. pp. 70–98.