Cloud Data Management Interface

Last updated
Cloud Data Management Interface
AbbreviationCDMI
StatusPublished
Year started2009
Latest version1.1.1
Organization Storage Networking Industry Association
Base standards Hypertext Transfer Protocol
Related standards Network File System
Domain Cloud computing
Website www.snia.org/cloud

The Cloud Data Management Interface (CDMI) is a SNIA standard that specifies a protocol for self-provisioning, administering and accessing cloud storage. [1]

Contents

CDMI defines RESTful HTTP operations for assessing the capabilities of the cloud storage system, allocating and accessing containers and objects, managing users and groups, implementing access control, attaching metadata, making arbitrary queries, using persistent queues, specifying retention intervals and holds for compliance purposes, using a logging facility, billing, moving data between cloud systems, and exporting data via other protocols such as iSCSI and NFS. Transport security is obtained via TLS.

Capabilities

Compliant implementations must provide access to a set of configuration parameters known as capabilities. These are either boolean values that represent whether or not a system supports things such as queues, export via other protocols, path-based storage and so on, or numeric values expressing system limits, such as how much metadata may be placed on an object. As a minimal compliant implementation can be quite small, with few features, clients need to check the cloud storage system for a capability before attempting to use the functionality it represents. Resource allocation assignments limited to the data management interface protocols must possess access bypass capabilities which extend beyond the layered framework. [2] This integral function is vital to the prevention of transport layer session hijacking by unauthorized entities which may circumvent standard interfacing security parameters. [3]

Containers

A CDMI client may access objects, including containers, by either name or object id (OID), assuming the CDMI server supports both methods. When storing objects by name, it is natural to use nested named containers; the resulting structure corresponds exactly to a traditional filesystem directory structure.

Objects

Objects are similar to files in a traditional file system, but are enhanced with an increased amount and capacity for metadata. As with containers, they may be accessed by either name or OID. When accessed by name, clients use URLs that contain the full pathname of objects to create, read, update and delete them. When accessed by OID, the URL specifies an OID string in the cdmi-objectid container; this container presents a flat name space conformant with standard object storage system semantics.

Subject to system limits, objects may be of any size or type and have arbitrary user-supplied metadata attached to them. Systems that support query allow arbitrary queries to be run against the metadata.

Domains, Users and Groups

CDMI supports the concept of a domain, similar in concept to a domain in the Windows Active Directory model. Users and groups created in a domain share a common administrative database and are known to each other on a "first name" basis, i.e. without reference to any other domain or system.

Domains also function as containers for usage and billing summary data.

Access Control

CDMI exactly follows the ACL and ACE model used for file authorization operations by NFSv4. This makes it also compatible with Microsoft Windows systems.

Metadata

CDMI draws much of its metadata model from the XAM specification. Objects and containers have "storage system metadata", "data system metadata" and arbitrary user specified metadata, in addition to the metadata maintained by an ordinary filesystem (atime etc.).

Queries

CDMI specifies a way for systems to support arbitrary queries against CDMI containers, with a rich set of comparison operators, including support for regular expressions.

Queues

CDMI supports the concept of persistent FIFO (first-in, first-out) queues. These are useful for job scheduling, order processing and other tasks in which lists of things must be processed in order.

Compliance

Both retention intervals and retention holds are supported by CDMI. A retention interval consists of a start time and a retention period. During this time interval, objects are preserved as immutable and may not be deleted. A retention hold is usually placed on an object because of judicial action and has the same effect: objects may not be changed nor deleted until all holds placed on them are removed.

Logging

CDMI clients can sign up for logging of system, security and object access events on servers that support it. This feature allows clients to see events locally as the server logs them.

Billing

Summary information suitable for billing clients for on-demand services can be obtained by authorized users from systems that support it.

Serialization

Serialization of objects and containers allows export of all data and metadata on a system and importation of that data into another cloud system.

Foreign protocols

CDMI supports export of containers as NFS or CIFS shares. Clients that mount these shares see the container hierarchy as an ordinary filesystem directory hierarchy, and the objects in the containers as normal files. Metadata outside of ordinary filesystem metadata may or may not be exposed.

Provisioning of iSCSI LUNs is also supported.

Client SDKs

See also

Related Research Articles

Object Linking & Embedding (OLE) is a proprietary technology developed by Microsoft that allows embedding and linking to documents and other objects. For developers, it brought OLE Control Extension (OCX), a way to develop and use custom user interface elements. On a technical level, an OLE object is any object that implements the IOleObject interface, possibly along with a wide range of other interfaces, depending on the object's needs.

Virtual file system

A virtual file system (VFS) or virtual filesystem switch is an abstract layer on top of a more concrete file system. The purpose of a VFS is to allow client applications to access different types of concrete file systems in a uniform way. A VFS can, for example, be used to access local and network storage devices transparently without the client application noticing the difference. It can be used to bridge the differences in Windows, classic Mac OS/macOS and Unix filesystems, so that applications can access files on local file systems of those types without having to know what type of file system they are accessing.

Hierarchical Data Format Set of file formats

Hierarchical Data Format (HDF) is a set of file formats designed to store and organize large amounts of data. Originally developed at the National Center for Supercomputing Applications, it is supported by The HDF Group, a non-profit corporation whose mission is to ensure continued development of HDF5 technologies and the continued accessibility of data stored in HDF.

Filesystem in USErspace (FUSE) is a software interface for Unix and Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a bridge to the actual kernel interfaces.

Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. The name Lustre is a portmanteau word derived from Linux and cluster. Lustre file system software is available under the GNU General Public License and provides high performance file systems for computer clusters ranging in size from small workgroup clusters to large-scale, multi-site systems. Since June 2005, Lustre has consistently been used by at least half of the top ten, and more than 60 of the top 100 fastest supercomputers in the world, including the world's No. 1 ranked TOP500 supercomputer in June 2020, Fugaku, as well as previous top supercomputers such as Titan and Sequoia.

IBM Spectrum Protect is a data protection platform that gives enterprises a single point of control and administration for backup and recovery. It is the flagship product in the IBM Spectrum Protect family.

Content-addressable storage, also referred to as content-addressed storage or abbreviated CAS, is a way to store information so it can be retrieved based on its content, not its location. It has been used for high-speed storage and retrieval of fixed content, such as documents stored for compliance with government regulations. Content-addressable storage is like content-addressable memory.

Microsoft SQL Server is a relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

Ceph is an open-source software-defined storage platform that implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block- and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and to be freely available. Since version 12 Ceph does not rely on other filesystems and can directly manage HDDs and SSDs with its own storage backend BlueStore and can completely self reliantly expose a POSIX filesystem.

StorNext File System (SNFS), colloquially referred to as "StorNext" is a shared disk file system made by Quantum Corporation. StorNext enables multiple Windows, Linux and Apple workstations to access shared block storage over a Fibre Channel network. With the StorNext file system installed, these computers can read and write to the same storage volume at the same time enabling what is known as a "file-locking SAN." StorNext is used in environments where large files must be shared, and accessed simultaneously by users without network delays, or where a file must be available for access by multiple readers starting at different times. Common use cases include multiple video editor environments in feature film, television and general video post production.

XAM, or the eXtensible Access Method, is a storage standard developed and maintained by the Storage Networking Industry Association (SNIA). It is in the process of being ratified as an ANSI standard. XAM is an API for fixed content aware storage devices. XAM replaces the various proprietary interfaces that have been used for this purpose in the past. Content generating applications now have a standard means of saving and finding their content across a broad array of storage devices.

Windows Search Desktop search platform by Microsoft

Windows Search is a content index desktop search platform by Microsoft introduced in Windows Vista as a replacement for both the previous Indexing Service of Windows 2000 and the optional MSN Desktop Search for Windows XP and Windows Server 2003, designed to facilitate local and remote queries for files and non-file items in compatible applications including Windows Explorer. It was developed after the postponement of WinFS and introduced to Windows constituents originally touted as benefits of that platform.

Mark A. Carlson is a software engineer known in the systems management industry for his work in management standards and technology. Mark was the first employee of a small startup in Boulder, Colorado called Redcape Policy Software. Sun Microsystems acquired the company and its technology in 1998 and subsequently promoted it as Jiro, a common management framework based on Java and Jini.

The Linear Tape File System (LTFS) is a file system that allows files stored on magnetic tape to be accessed in a similar fashion to those on disk or removable flash drives. It requires both a specific format of data on the tape media and software to provide a file system interface to the data.

A cloud storage gateway is a network appliance or server which resides at the customer premises and translates cloud storage APIs such as SOAP or REST to block-based storage protocols such as iSCSI or Fibre Channel or file-based interfaces such as NFS or SMB.

Object storage is a computer data storage architecture that manages data as objects, as opposed to other storage architectures like file systems which manages data as a file hierarchy, and block storage which manages data as blocks within sectors and tracks. Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level, the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that are directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity.

Distributed Data Management Architecture

Distributed Data Management Architecture (DDM) is IBM's open, published software architecture for creating, managing and accessing data on a remote computer. DDM was initially designed to support record-oriented files; it was extended to support hierarchical directories, stream-oriented files, queues, and system command processing; it was further extended to be the base of IBM's Distributed Relational Database Architecture (DRDA); and finally, it was extended to support data description and conversion. Defined in the period from 1980 to 1993, DDM specifies necessary components, messages, and protocols, all based on the principles of object-orientation. DDM is not, in itself, a piece of software; the implementation of DDM takes the form of client and server products. As an open architecture, products can implement subsets of DDM architecture and products can extend DDM to meet additional requirements. Taken together, DDM products implement a distributed file system.

Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Google originally designed Kubernetes, but the Cloud Native Computing Foundation now maintains the project.

Nirvana was virtual object storage software developed and maintained by General Atomics.

References

  1. "Cloud Data Management Interface". SNIA. Retrieved 26 June 2011.
  2. Metheny, M (2017). Federal Cloud Computing: The Definitive Guide for Cloud Service Providers. Syngress. pp. 202–245.
  3. da Fonseca, N (2015). Cloud Services, Networking, and Management. John Wiley & Sons. pp. 70–98.