Data defined storage

Data defined storage (also referred to as a data centric approach) is a marketing term for managing, protecting, and realizing value from data by combining application, information and storage tiers. [1]

This is achieved by giving users, applications, and devices access to a repository of captured metadata, through which they can locate, query, and manipulate the relevant data and turn it into information. The same platform provides flexible, scalable storage for the underlying data. The technology abstracts the data entirely from the storage, allowing fully transparent access for users.

Core technology

Data defined storage focuses on metadata, emphasizing the content, meaning, and value of information over the media, type, and location of data. Data centric management lets organizations take a single, unified approach to managing data across large, distributed locations, including the use of content and metadata indexing. The technology pillars include:

  1. Media Independent Data Storage: Data defined storage removes media centric boundaries within and across solid-state drive, hard disk drive, cloud storage, and tape storage platforms. It enables linear scale-out through a grid-based MapReduce architecture that leverages enterprise object storage technology, and it provides transparent data access across globally distributed repositories for high-volume storage performance.
  2. Data Security & Identity Management: Data defined storage gives organizations end-to-end identity management down to the individual user and device level, to address growing enterprise mobility requirements and to enhance data security and information governance.
  3. Distributed Metadata Repository: Data defined storage enables organizations to virtualize and aggregate file systems into a single global namespace. At ingestion, file metadata, a full-text index, and custom metadata are collected and stored in a distributed metadata repository. This repository is used to make search and discovery faster and more accurate, and to extract value that informs business decisions and analytics.
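
The ingestion step described in the third pillar can be illustrated with a minimal sketch. It is not taken from any vendor's product: the class names `MetadataRecord` and `MetadataRepository`, the record fields, and the in-memory dictionaries standing in for a distributed store are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Metadata captured for one object at ingestion (hypothetical schema)."""
    path: str
    size: int
    custom: dict = field(default_factory=dict)


class MetadataRepository:
    """In-memory stand-in for a distributed metadata repository (illustrative only)."""

    def __init__(self):
        self.records = {}                    # logical path -> MetadataRecord
        self.text_index = defaultdict(set)   # lowercase word -> set of paths

    def ingest(self, path, content, custom=None):
        """Collect file metadata, a full-text index, and custom metadata.

        Re-ingesting an existing path does not purge old index entries;
        a real system would need to handle that.
        """
        size = len(content.encode("utf-8"))
        self.records[path] = MetadataRecord(path, size, dict(custom or {}))
        for word in set(re.findall(r"\w+", content.lower())):
            self.text_index[word].add(path)

    def search_text(self, word):
        """Return the paths whose content contains the given word."""
        return sorted(self.text_index.get(word.lower(), set()))

    def search_custom(self, key, value):
        """Return the paths whose custom metadata has key == value."""
        return sorted(
            path for path, record in self.records.items()
            if record.custom.get(key) == value
        )


repo = MetadataRepository()
repo.ingest("/finance/q1.txt", "Quarterly revenue report", {"owner": "alice"})
repo.ingest("/legal/nda.txt", "Confidential agreement and report", {"owner": "bob"})

print(repo.search_text("report"))
print(repo.search_custom("owner", "alice"))
```

Searches in this sketch consult only the metadata repository and never touch the stored content, which is the property the pillar relies on for fast discovery.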

Data defined storage builds on the benefits of both object storage and software-defined storage technologies. However, object and software-defined storage map only to the media independent data storage pillar, which enables a media-agnostic infrastructure that can use any type of storage, including low-cost commodity storage, to scale out to petabyte-level capacities. Data defined storage unifies all data repositories and exposes globally distributed stores through the global namespace, eliminating data silos and improving storage utilization.
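
As a rough illustration of how a global namespace can hide the storage medium from the user, the following sketch maps logical paths to interchangeable backends. The names `InMemoryBackend` and `GlobalNamespace`, and the backend labels `"ssd"` and `"tape"`, are hypothetical; real media are replaced by dictionaries so the example stays self-contained.

```python
class InMemoryBackend:
    """Stand-in for one storage medium such as SSD, disk, cloud, or tape."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}

    def put(self, key, data):
        self.blobs[key] = data

    def get(self, key):
        return self.blobs[key]

    def delete(self, key):
        del self.blobs[key]


class GlobalNamespace:
    """Maps logical paths to whichever backend currently holds the data."""

    def __init__(self):
        self.backends = {}    # backend name -> backend
        self.locations = {}   # logical path -> backend name

    def add_backend(self, backend):
        self.backends[backend.name] = backend

    def write(self, path, data, backend_name):
        self.backends[backend_name].put(path, data)
        self.locations[path] = backend_name

    def read(self, path):
        # Callers supply only the logical path, never the medium.
        return self.backends[self.locations[path]].get(path)

    def migrate(self, path, target_name):
        """Move data to another medium without changing its logical path."""
        source_name = self.locations[path]
        if source_name == target_name:
            return
        data = self.backends[source_name].get(path)
        self.backends[target_name].put(path, data)
        self.backends[source_name].delete(path)
        self.locations[path] = target_name


ns = GlobalNamespace()
ns.add_backend(InMemoryBackend("ssd"))
ns.add_backend(InMemoryBackend("tape"))

ns.write("/projects/survey.dat", b"seismic data", "ssd")
before = ns.read("/projects/survey.dat")
ns.migrate("/projects/survey.dat", "tape")
after = ns.read("/projects/survey.dat")
print(before == after, ns.locations["/projects/survey.dat"])
```

The reader's call is identical before and after the migration, which is what transparent, media-independent access means in this context.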

Usage

The term was first used in marketing by Tarmin, for its GridBank product. The term might have been mentioned in 2013. [2] Tarmin issued press releases about customers. [3] [4]

The term was applied to object storage that offers file system virtualization through open access protocols such as CIFS, NFS, and FTP, as well as REST APIs and cloud protocols such as Amazon S3, CDMI, and OpenStack.

References

  1. Peters, Mark. "Unlocking the Power of Data with Data-Defined Storage" (PDF). ESG. Archived from the original (PDF) on 29 November 2014. Retrieved 30 June 2013.
  2. Goyal, Ambuj. "Edge2013 General Session Keynote Speech". IBM Edge. Archived from the original on 13 April 2016. Retrieved 27 November 2016.
  3. Miller, Dan (12 July 2013). "Tarmin and IBM help Premier Oil manage rapidly growing unstructured data". PR Newswire. Archived from the original on 31 July 2013. Retrieved 1 August 2013.
  4. Miller, Dan (17 December 2012). "Leading U.K. MSP brightsolid sees a shining future with Tarmin". PR Newswire. Archived from the original on 29 November 2014. Retrieved 1 August 2013.