Metadata engine

A metadata engine collects, stores, and analyzes information about data and metadata (data about data) in use within a knowledge domain. [1] It virtualizes an application's view of data by separating the data (physical) path from the metadata (logical) path, so that data management can be performed independently of where the data physically resides. This expands the domain beyond a single storage device to span all devices within its namespace.
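
As a minimal sketch of this separation (assuming nothing about any particular product: MetadataEngine, PhysicalLocation, and the method names are all illustrative), the engine resolves a logical name on the metadata path while the bytes stay on the data path, so relocating the bytes only updates the mapping:

```python
# Illustrative sketch of the logical/physical split a metadata engine provides.
from dataclasses import dataclass


@dataclass
class PhysicalLocation:
    device: str  # which storage device currently holds the bytes
    path: str    # where on that device they live


class MetadataEngine:
    def __init__(self) -> None:
        # Metadata (logical) path: logical name -> current physical location.
        self._map: dict[str, PhysicalLocation] = {}

    def create(self, logical: str, loc: PhysicalLocation) -> None:
        self._map[logical] = loc

    def resolve(self, logical: str) -> PhysicalLocation:
        # Applications ask for a logical name; the engine answers with the
        # device and path that currently hold the data.
        return self._map[logical]

    def relocate(self, logical: str, new: PhysicalLocation) -> None:
        # Data management happens on the metadata path alone: the bytes can
        # move, the mapping is updated, the logical name never changes.
        self._map[logical] = new


engine = MetadataEngine()
engine.create("/projects/a.dat", PhysicalLocation("nas-01", "/vol1/a.dat"))
engine.relocate("/projects/a.dat", PhysicalLocation("flash-02", "/tier0/a.dat"))
print(engine.resolve("/projects/a.dat"))  # same logical name, new location
```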

The main use of a metadata engine is to enable a flexible IT infrastructure or environment by making applications more storage-aware and, conversely, making storage devices more application- or data-aware.

Purpose

Metadata engines virtualize the view of data. Combined with client access protocols such as NFS v4.2 and SMB 2.1 or SMB 3.0 (Samba), a metadata engine can create a global namespace to allow applications to see a logical view of their data, and can orchestrate data movement between different types or tiers of storage without disrupting the application's access to its data. [2]
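
A hedged sketch of such non-disruptive movement, assuming a simple copy-then-repoint scheme (the namespace dictionary, tier directories, and migrate function are hypothetical, not a real protocol):

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical global namespace: logical path -> file on some storage tier.
tier1 = Path(tempfile.mkdtemp(prefix="tier1-"))
tier0 = Path(tempfile.mkdtemp(prefix="tier0-"))
(tier1 / "results.csv").write_text("a,b\n1,2\n")
namespace = {"/global/results.csv": tier1 / "results.csv"}


def migrate(logical: str, target_dir: Path) -> None:
    """Move a file to another tier without changing its logical path."""
    src = namespace[logical]
    dst = target_dir / src.name
    shutil.copy2(src, dst)    # 1. copy the bytes to the new tier first
    namespace[logical] = dst  # 2. repoint the metadata entry
    src.unlink()              # 3. only then reclaim the old copy


migrate("/global/results.csv", tier0)
# Clients resolve the same logical path before, during, and after the move.
print(namespace["/global/results.csv"].read_text())
```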

Related Research Articles

<span class="mw-page-title-main">Database</span> Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.
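
As a concrete example, Python's built-in sqlite3 module embeds a small DBMS: the application issues SQL, and the engine handles capture, storage, and retrieval of the data (the table and column names below are made up for illustration):

```python
import sqlite3

# The DBMS (SQLite here) sits between the application and the stored data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, size_bytes INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [("a.dat", 1024), ("b.dat", 2048)],
)
# The application asks questions; the DBMS decides how to answer them.
total = conn.execute("SELECT SUM(size_bytes) FROM files").fetchone()[0]
print(total)  # 3072
conn.close()
```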

In computing, a file server is a computer attached to a network that provides a location for shared disk access, i.e. storage of computer files that can be accessed by workstations within a computer network. The term server highlights the role of the machine in the traditional client–server scheme, where the clients are the workstations using the storage. A file server does not normally perform computational tasks or run programs on behalf of its client workstations.

<span class="mw-page-title-main">Network-attached storage</span> Computer data storage server

Network-attached storage (NAS) is a file-level computer data storage server connected to a computer network, providing data access to a heterogeneous group of clients. The term "NAS" can refer both to the technology and systems involved and to a specialized device built for such functionality.

In computing, a named pipe is an extension to the traditional pipe concept on Unix and Unix-like systems, and is one of the methods of inter-process communication (IPC). The concept is also found in OS/2 and Microsoft Windows, although the semantics differ substantially. A traditional pipe is "unnamed" and lasts only as long as the process. A named pipe, however, can last as long as the system is up, beyond the life of the process. It can be deleted if no longer used. Usually a named pipe appears as a file, and generally processes attach to it for IPC.
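
A minimal illustration on a Unix-like system using Python's standard library (the path /tmp/example_fifo is arbitrary):

```python
import os

FIFO = "/tmp/example_fifo"  # illustrative path; Unix-like systems only

if not os.path.exists(FIFO):
    os.mkfifo(FIFO)  # the pipe gets a name in the filesystem

if os.fork() == 0:
    # Child process: writes into the pipe by opening it like a file.
    with open(FIFO, "w") as f:
        f.write("hello via named pipe\n")
    os._exit(0)
else:
    # Parent process: an unrelated open() by name is all the reader needs.
    with open(FIFO) as f:
        print(f.readline(), end="")
    os.wait()
    os.unlink(FIFO)  # the pipe persists until explicitly deleted
```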

In computing, the term virtual directory has a couple of meanings. It may simply designate a folder which appears in a path but which is not actually a subfolder of the preceding folder in the path; the term is also used in the context of directory services and identity management.

<span class="mw-page-title-main">File system</span> Format or program for storing files and directories

In computing, a file system or filesystem is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one large body of data with no way to tell where one piece of data stopped and the next began, or where any piece of data was located when it was time to retrieve it. By separating the data into pieces and giving each piece a name, the data are easily isolated and identified. Taking its name from the way a paper-based data management system is named, each group of data is called a "file". The structure and logic rules used to manage the groups of data and their names are called a "file system".
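
A toy illustration of that point: treat the "storage medium" as one flat byte array and let a small name table play the role of the file system (purely illustrative; real file systems are far more elaborate):

```python
# One large body of data with no inherent structure...
disk = bytearray(64)
# ...and the table of names and extents that makes pieces of it findable.
table: dict[str, tuple[int, int]] = {}  # name -> (offset, length)
cursor = 0


def write_file(name: str, data: bytes) -> None:
    global cursor
    disk[cursor:cursor + len(data)] = data
    table[name] = (cursor, len(data))  # naming the piece is what isolates it
    cursor += len(data)


def read_file(name: str) -> bytes:
    offset, length = table[name]
    return bytes(disk[offset:offset + length])


write_file("a.txt", b"hello")
write_file("b.txt", b"world")
print(read_file("a.txt"))  # b'hello' -- without `table`, just anonymous bytes
```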

Distributed File System (DFS) is a set of client and server services that allow an organization using Microsoft Windows servers to organize many distributed SMB file shares into a distributed file system. DFS has two components to its service: location transparency and redundancy. Together, these components enable data availability in the case of failure or heavy load by allowing shares in multiple locations to be logically grouped under one folder, the "DFS root".
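
A rough sketch of those two components, with hypothetical UNC paths (this is not the Windows DFS API):

```python
# One logical root folder backed by several share targets for redundancy.
dfs_root = {
    r"\\corp\root\finance": [r"\\srv1\finance", r"\\srv2\finance"],
}


def resolve(folder: str, available: set[str]) -> str:
    # Location transparency: the client names the logical folder and gets
    # whichever underlying share target is currently reachable.
    for target in dfs_root[folder]:
        if target in available:
            return target
    raise OSError("no available target for " + folder)


# srv1 is down; the same logical folder still resolves.
print(resolve(r"\\corp\root\finance", {r"\\srv2\finance"}))
```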

In Linux, Logical Volume Manager (LVM) is a device mapper framework that provides logical volume management for the Linux kernel. Most modern Linux distributions are LVM-aware to the point of being able to have their root file systems on a logical volume.

A logical partition (LPAR) is a subset of a computer's hardware resources, virtualized as a separate computer. In effect, a physical machine can be partitioned into multiple logical partitions, each hosting a separate instance of an operating system.

On Microsoft Windows, a special folder is a folder that is presented to the user through an interface as an abstract concept instead of an absolute folder path. Special folders make it possible for any application to ask the operating system where an appropriate location for certain kinds of files can be found; independently of which version or user language of Windows is being used.
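
A small sketch of the idea in Python: ask the environment where the folder is instead of hard-coding a path (APPDATA is a real Windows variable; the non-Windows fallback is only a common convention, not part of the special-folder mechanism, and ExampleApp is a made-up name):

```python
import os
from pathlib import Path


def user_data_dir(app: str) -> Path:
    # On Windows, APPDATA points at the current user's roaming special
    # folder, wherever this Windows version or language places it.
    base = os.environ.get("APPDATA")
    if base is None:
        # Fallback for non-Windows systems, purely for illustration.
        base = Path.home() / ".local" / "share"
    return Path(base) / app


print(user_data_dir("ExampleApp"))
```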

In computer science, storage virtualization is "the process of presenting a logical view of the physical storage resources to" a host computer system, "treating all storage media in the enterprise as a single pool of storage."

In computing, Microsoft's Windows Vista and Windows Server 2008, released in 2007/2008, introduced a new networking stack named the Next Generation TCP/IP stack to improve on the previous stack in several ways. The stack includes a native implementation of IPv6, as well as a complete overhaul of IPv4. The new TCP/IP stack uses a new method to store configuration settings that enables more dynamic control and does not require a computer restart after a change in settings. The new stack, implemented as a dual-stack model, depends on a strong host model and features an infrastructure to enable more modular components that can be dynamically inserted and removed.

In computing, virtualization or virtualisation is the act of creating a virtual version of something at the same abstraction level, including virtual computer hardware platforms, storage devices, and computer network resources.

In computer science, memory virtualization decouples volatile random access memory (RAM) resources from individual systems in the data center, and then aggregates those resources into a virtualized memory pool available to any computer in the cluster. The memory pool is accessed by the operating system or applications running on top of the operating system. The distributed memory pool can then be utilized as a high-speed cache, a messaging layer, or a large, shared memory resource for a CPU or a GPU application.

Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located, and can provide a single customer view of the overall data.

EMC Atmos is a cloud storage services platform developed by EMC Corporation. Atmos can be deployed as either a hardware appliance or as software in a virtual environment. The Atmos technology uses an object storage architecture designed to manage petabytes of information and billions of objects across multiple geographic locations as a single system.

Data defined storage is a marketing term for managing, protecting, and realizing value from data by combining application, information and storage tiers.

Object storage is a computer data storage approach that manages data as "blobs" or "objects", as opposed to other storage architectures like file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Each object is typically associated with a variable amount of metadata and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level, the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that are directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity.
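
A toy sketch of those properties (flat namespace, per-object metadata, globally unique identifiers); nothing here corresponds to a real object-storage API:

```python
import uuid

# Flat namespace of blobs: no directories, no blocks, just identifiers.
store: dict[str, tuple[bytes, dict[str, str]]] = {}


def put(data: bytes, metadata: dict[str, str]) -> str:
    oid = str(uuid.uuid4())        # globally unique identifier
    store[oid] = (data, metadata)  # blob and its variable metadata together
    return oid


def get(oid: str) -> bytes:
    return store[oid][0]


oid = put(b"raw bytes", {"content-type": "text/plain", "owner": "alice"})
print(oid, get(oid), store[oid][1]["owner"])
```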

Nirvana was virtual object storage software developed and maintained by General Atomics.

ONTAP, Data ONTAP, Clustered Data ONTAP (cDOT), or Data ONTAP 7-Mode is NetApp's proprietary operating system used in storage disk arrays such as NetApp FAS and AFF, ONTAP Select, and Cloud Volumes ONTAP. With the release of version 9.0, NetApp simplified the name by removing the word "Data" and retired the 7-Mode image; ONTAP 9 is therefore the successor of Clustered Data ONTAP 8.

References

  1. Kendall, Aaron. "Metadata-Driven Design: Designing a Flexible Engine for API Data Retrieval". InfoQ. Retrieved 25 April 2017.
  2. Flynn, David (16 March 2017). "Supercharge Your Existing Storage with a Metadata Engine". Data Center Knowledge. Retrieved 25 April 2017.