Data defined storage (also referred to as a data centric approach) is a marketing term for managing, protecting, and realizing the value of data by combining application, information and storage tiers.[1]
This is a process in which users, applications, and devices gain access to a repository of captured metadata that allows them to access, query and manipulate relevant data, transforming it into information while also establishing a flexible and scalable platform for storing the underlying data. The technology is said to abstract the data entirely from the storage, aiming to provide fully transparent access for users.
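As a minimal sketch of this idea, the following Python fragment shows clients locating data through a metadata index rather than through physical paths; every name here (MetadataIndex, register, query, the handles) is invented for illustration and does not come from any specific product:

```python
# Hypothetical sketch: clients query captured metadata and receive an opaque
# handle to the data, never a physical storage location.
from dataclasses import dataclass, field

@dataclass
class Record:
    handle: str       # stable identifier handed to clients
    tags: dict        # captured metadata (content, meaning, value)
    location: str     # physical placement, hidden from clients

@dataclass
class MetadataIndex:
    records: dict = field(default_factory=dict)

    def register(self, handle, tags, location):
        self.records[handle] = Record(handle, tags, location)

    def query(self, **criteria):
        # Search by what the data *means*, not where it lives.
        return [r.handle for r in self.records.values()
                if all(r.tags.get(k) == v for k, v in criteria.items())]

index = MetadataIndex()
index.register("h-001", {"project": "alpha", "type": "report"}, "nas-03:/vol7/a.pdf")
index.register("h-002", {"project": "alpha", "type": "dataset"}, "s3://bucket/b.parquet")
print(index.query(project="alpha"))   # ['h-001', 'h-002'] -- no paths exposed
```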
Data defined storage focuses on metadata, with an emphasis on the content, meaning and value of information over the media, type and location of data. Data-centric management enables organizations to adopt a single, unified approach to managing data across large, distributed locations, which includes the use of content and metadata indexing. Its technology pillars include media-independent data storage, a unified global namespace and a distributed metadata repository.
Data defined storage delivers the benefits of both object storage and software-defined storage technologies. However, object and software-defined storage map only to the media-independent data storage pillar, which enables a media-agnostic infrastructure that can use any type of storage, including low-cost commodity storage, to scale out to petabyte-level capacities. Data defined storage unifies all data repositories and exposes globally distributed stores through a global namespace, eliminating data silos and improving storage utilization.
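A global namespace of this kind can be pictured as one logical path space mapped onto heterogeneous backends. The sketch below assumes an invented mapping table and backend names; it is an illustration of the concept, not any vendor's implementation:

```python
# Hypothetical global namespace: one logical path space routed to
# media-agnostic backends (tiers and names are invented).
BACKENDS = {
    "/archive":  "tape-library-eu",     # low-cost cold tier
    "/projects": "commodity-disk-us",   # scale-out commodity storage
    "/hot":      "nvme-cluster-apac",   # performance tier
}

def resolve(logical_path: str) -> tuple[str, str]:
    """Map a single global path to (backend, relative path)."""
    for prefix, backend in BACKENDS.items():
        if logical_path.startswith(prefix + "/"):
            return backend, logical_path[len(prefix):]
    raise KeyError(f"no backend exports {logical_path}")

print(resolve("/projects/alpha/results.csv"))
# ('commodity-disk-us', '/alpha/results.csv')
```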
The first marketing campaign to use the term data defined storage came from the company Tarmin, for its product GridBank. The term may have been mentioned as early as 2013.[2]
The term was used for object storage with open-protocol access for file system virtualization, such as CIFS, NFS and FTP, as well as REST APIs and other cloud protocols such as Amazon S3, CDMI and OpenStack.
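The point of multi-protocol access is that the same stored bytes are reachable as a file and as an object. A sketch with boto3 against an S3-compatible interface; the endpoint, credentials, bucket and mount point are placeholders, not real services:

```python
# The same object read over the S3 REST API and, assuming an NFS/CIFS
# mount of the same store, as an ordinary file.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.example.com",   # hypothetical S3-compatible endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
body = s3.get_object(Bucket="shared", Key="reports/q1.pdf")["Body"].read()

# The same bytes via the file-system protocols (placeholder mount path):
with open("/mnt/store/shared/reports/q1.pdf", "rb") as f:
    assert f.read() == body
```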
A disk image is a snapshot of a storage device's structure and data, typically stored in one or more computer files on another storage device.
WebDAV is a set of extensions to the Hypertext Transfer Protocol (HTTP), which allows user agents to collaboratively author content directly on an HTTP web server by providing facilities for concurrency control and namespace operations, thus allowing the Web to be viewed as a writeable, collaborative medium and not just a read-only medium. WebDAV is defined in RFC 4918 by a working group of the Internet Engineering Task Force (IETF).
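Because WebDAV is plain HTTP plus extra methods, it can be exercised with a generic HTTP client. A sketch using the Python requests library against a placeholder server and credentials:

```python
# WebDAV reuses HTTP verbs plus extensions such as PROPFIND; requests
# passes arbitrary methods through unchanged.
import requests

BASE = "https://dav.example.com/docs"            # hypothetical WebDAV server
AUTH = ("alice", "secret")                       # placeholder credentials

# Author content directly on the server: an ordinary HTTP PUT.
requests.put(f"{BASE}/draft.txt", data=b"first draft", auth=AUTH)

# List the collection: PROPFIND with "Depth: 1" returns properties of the
# collection and its immediate members as XML (RFC 4918, section 9.1).
resp = requests.request("PROPFIND", BASE, headers={"Depth": "1"}, auth=AUTH)
print(resp.status_code)   # 207 Multi-Status on success
```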
A digital object identifier (DOI) is a persistent identifier or handle used to uniquely identify various objects, standardized by the International Organization for Standardization (ISO). DOIs are an implementation of the Handle System; they also fit within the URI system. They are widely used to identify academic, professional, and government information, such as journal articles, research reports, data sets, and official publications.
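DOIs resolve over HTTP through the doi.org proxy, which redirects a request for the identifier to the object's current location. A short sketch; 10.1000/182 is the DOI of the DOI Handbook, and the exact target URL may change over time:

```python
# Resolve a DOI: the proxy answers with a redirect (typically 302) whose
# Location header is the current URL of the identified object.
import requests

resp = requests.head("https://doi.org/10.1000/182", allow_redirects=False)
print(resp.status_code)          # 302: the DOI is a level of indirection
print(resp.headers["Location"])  # current URL the identifier points at
```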
Enterprise information integration (EII) is the ability to support a unified view of data and information for an entire organization. In a data virtualization application of EII, data abstraction provides a unified interface for viewing all the data within an organization, and a single set of structures and naming conventions to represent this data; the goal of EII is to make a large set of heterogeneous data sources appear to a user or system as a single, homogeneous data source.
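A toy illustration of that goal: two heterogeneous sources (a CSV document and a SQLite table, both invented here) exposed through one interface and one record shape, so callers see a single homogeneous source:

```python
# Unified view over heterogeneous sources; file and table names are invented.
import csv, io, sqlite3

CSV_DATA = "id,name\n1,alpha\n2,beta\n"
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.execute("INSERT INTO customers VALUES (3, 'gamma')")

def all_customers():
    """Same record shape regardless of the backing source."""
    rows = [{"id": int(r["id"]), "name": r["name"]}
            for r in csv.DictReader(io.StringIO(CSV_DATA))]
    rows += [{"id": i, "name": n}
             for i, n in db.execute("SELECT id, name FROM customers")]
    return rows

print(all_customers())
# [{'id': 1, 'name': 'alpha'}, {'id': 2, 'name': 'beta'}, {'id': 3, 'name': 'gamma'}]
```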
In library and archival science, digital preservation is a formal process to ensure that digital information of continuing value remains accessible and usable in the long term. It involves planning, resource allocation, and application of preservation methods and technologies, and combines policies, strategies and actions to ensure access to reformatted and "born-digital" content, regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.
Enterprise content management (ECM) extends the concept of content management by adding a timeline for each content item and, possibly, enforcing processes for its creation, approval, and distribution. Systems using ECM generally provide a secure repository for managed items, analog or digital. They also include one or more methods for importing content to manage new items, and several presentation methods to make items available for use. Although ECM content may be protected by digital rights management (DRM), it is not required. ECM is distinguished from general content management by its cognizance of the processes and procedures of the enterprise for which it is created.
Fedora is a digital asset management (DAM) content repository architecture upon which institutional repositories, digital archives, and digital library systems might be built. Fedora is the underlying architecture for a digital repository, and is not a complete management, indexing, discovery, and delivery application. It is a modular architecture built on the principle that interoperability and extensibility are best achieved by the integration of data, interfaces, and mechanisms as clearly defined modules.
Although information has been bought and sold since ancient times, the idea of an information marketplace is relatively recent. The nature of such markets is still evolving, which complicates development of sustainable business models. However, certain attributes of information markets are beginning to be understood, such as diminished participation costs, opportunities for customization, shifting customer relations, and a need for order.
Metadata is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including:
The Handle System is a proprietary registry assigning persistent identifiers, or handles, to information resources, and for resolving "those handles into the information necessary to locate, access, and otherwise make use of the resources". As with handles used elsewhere in computing, Handle System handles are opaque, and encode no information about the underlying resource, being bound only to metadata regarding the resource. Consequently, the handles are not rendered invalid by changes to the metadata.
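Resolution can be observed through the Handle.Net proxy's public REST API, which returns the metadata values bound to an opaque handle. Since DOIs are handles, the DOI Handbook's identifier serves as an example; the exact values returned may change:

```python
# Resolve a handle: the identifier is opaque, and resolution returns the
# metadata values (e.g. a URL) currently bound to it.
import requests

resp = requests.get("https://hdl.handle.net/api/handles/10.1000/182")
record = resp.json()
for value in record["values"]:
    print(value["type"], "->", value["data"]["value"])
# e.g. URL -> current location of the DOI Handbook
```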
In computer storage, a global file system is a distributed file system that can be accessed from multiple locations, typically across a wide-area network, and provides concurrent access to a global namespace from all locations. In order for a file system to be considered global, it must allow for files to be created, modified, and deleted from any location. This access is typically provided by a cloud storage gateway at each edge location, which provides access using the NFS or SMB network file sharing protocols.
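From a client's point of view, such a file system is simply a mounted path; the gateway behind the mount handles the distribution. A brief sketch assuming a placeholder NFS or SMB mount point:

```python
# Ordinary file operations through a hypothetical gateway mount; the same
# namespace is visible from every participating location.
from pathlib import Path

root = Path("/mnt/global")                      # placeholder gateway mount

f = root / "teams" / "eu" / "notes.txt"
f.parent.mkdir(parents=True, exist_ok=True)     # create from any location
f.write_text("visible from every site sharing the namespace\n")
print(f.read_text())                            # modify/read from any location
f.unlink()                                      # delete from any location
```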
EMC Atmos is a cloud storage services platform developed by EMC Corporation. Atmos can be deployed as either a hardware appliance or as software in a virtual environment. The Atmos technology uses an object storage architecture designed to manage petabytes of information and billions of objects across multiple geographic locations as a single system.
A metadata repository is a database created to store metadata. Metadata is information about the structures that contain the actual data. Metadata is often said to be "data about data", but this is misleading. Data profiles are an example of actual "data about data". Metadata adds one layer of abstraction to this definition: it is data about the structures that contain data. Metadata may describe the structure of any data, of any subject, stored in any format.
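A minimal sketch of such a repository in Python with SQLite: the rows describe the structures that hold data (datasets, their fields, formats and locations), not the data itself; the schema and sample rows are invented for illustration:

```python
# Invented two-table metadata repository: datasets and their fields.
import sqlite3

repo = sqlite3.connect(":memory:")
repo.executescript("""
CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT, format TEXT, location TEXT);
CREATE TABLE field   (dataset_id INTEGER REFERENCES dataset(id),
                      name TEXT, dtype TEXT, description TEXT);
""")
repo.execute("INSERT INTO dataset VALUES (1, 'sales_2024', 'parquet', 's3://dw/sales/2024/')")
repo.executemany("INSERT INTO field VALUES (?, ?, ?, ?)", [
    (1, "order_id", "int64",   "primary key of the order"),
    (1, "amount",   "decimal", "order total in EUR"),
])

# Ask the repository what a structure looks like, wherever the data lives:
for row in repo.execute("""SELECT d.name, f.name, f.dtype FROM dataset d
                           JOIN field f ON f.dataset_id = d.id"""):
    print(row)
```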
Object storage is a computer data storage approach that manages data as "blobs" or "objects", as opposed to other storage architectures like file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Each object is typically associated with a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level, the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that are directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity.
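Object storage in miniature: an opaque blob, a variable bag of per-object metadata, and a globally unique identifier instead of a path. The sketch below uses boto3 against an S3-compatible interface; the endpoint, credentials and bucket are placeholders:

```python
# Store a blob under a globally unique identifier with attached metadata.
import uuid, boto3

s3 = boto3.client("s3", endpoint_url="https://objects.example.com",
                  aws_access_key_id="ACCESS_KEY", aws_secret_access_key="SECRET_KEY")

key = str(uuid.uuid4())                       # globally unique identifier, not a path
s3.put_object(
    Bucket="media",
    Key=key,
    Body=b"...blob bytes...",
    Metadata={"camera": "X100", "license": "CC-BY"},   # variable per-object metadata
)
obj = s3.head_object(Bucket="media", Key=key)
print(obj["Metadata"])                        # {'camera': 'X100', 'license': 'CC-BY'}
```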
Scality is a global technology provider of software-defined storage (SDS) solutions, specializing in distributed file and object storage with cloud data management. Scality maintains offices in Paris (France), London (UK), San Francisco and Washington DC (USA), and Tokyo (Japan) and has employees in 14 countries.
Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Originally designed by Google, the project is now maintained by a worldwide community of contributors, and the trademark is held by the Cloud Native Computing Foundation.
Nirvana was virtual object storage software developed and maintained by General Atomics.
OpenIO offered object storage for a wide range of high-performance applications. OpenIO was founded in 2015 by Laurent Denel (CEO), Jean-François Smigielski (CTO) and five other co-founders; it leveraged open-source software, developed since 2006, based on a grid technology that enabled dynamic behaviour and supported heterogeneous hardware. In October 2017 OpenIO completed a $5 million funding round. In July 2020 OpenIO was acquired by OVH and withdrawn from the market, becoming the core technology of OVHcloud's object storage offering.