Nirvana (software)

Nirvana
Developer(s): General Atomics
Initial release: August 8, 2003
Stable release: 5.0 / September 15, 2016
Preview release: 5.1 / September 15, 2016
Written in: C
Operating systems: Linux, Microsoft Windows, OS X, Solaris
Platforms: x86-64, POWER8, SPARC
Type: Metadata and data management software
License: Proprietary commercial software
Website: www.ga.com/nirvana

Nirvana was virtual object storage software developed and maintained by General Atomics.


It can also be described as metadata, data placement, and data management software that lets organizations manage unstructured data on multiple storage devices located anywhere in the world, orchestrate global data-intensive workflows, and search for and locate data regardless of where it resides or when it was created. Nirvana does this by capturing system and user-defined metadata to enable detailed search and by enacting policies that control data movement and protection. Nirvana also maintains data provenance, audit trails, security, and access control.

Nirvana can reduce storage costs by identifying data that can be moved to lower-cost storage and data that no longer needs to be retained.

History

Nirvana is the result of research begun in 1995 at the San Diego Supercomputer Center (SDSC) (which was founded and, at the time, run by General Atomics [1] ), in response to a DARPA-sponsored project for a Massive Data Analysis System. [2] Led by General Atomics computational plasma physicist Dr. Reagan Moore, development continued through the cooperative efforts of General Atomics and the SDSC on the Storage Resource Broker (SRB), with the support of the National Science Foundation (NSF). SRB 1.1 was delivered in 1998, [3] demonstrating a logical distributed file system with a single global namespace spanning geographically distributed storage systems.

In 2003, General Atomics turned over operation of the SDSC to the University of California San Diego (UCSD), and Dr. Moore became a full-time professor there, establishing the Data Intensive Computing Environments (DICE) Center and continuing development of SRB. In the same year, General Atomics acquired the exclusive license to develop a commercial version of SRB, calling it Nirvana. [4] The DICE team ended development of SRB in 2006 and started a rule-oriented data management project called iRODS [5] for open-source distribution. Dr. Moore and his DICE team relocated to the University of North Carolina at Chapel Hill, where iRODS is now maintained by the iRODS Consortium. [6] General Atomics continued development of Nirvana at its San Diego headquarters, focusing on capabilities to serve government and commercial users, including high scalability, failover, performance, implementation, maintenance, and support.

[Image: Nirvana History]

In 2009, General Atomics won a data management contract with the US Department of Defense (DoD) High Performance Computing Modernization Program. [7] The requirements of this contract led General Atomics to expand Nirvana's performance, scalability, security, and ease of use. A major deliverable involved integrating Nirvana with Oracle Corporation's SAM-QFS filesystem to provide a policy-based hierarchical storage management (HSM) system with near-real-time event synchronization. General Atomics also announced that digital marketing firm infoGROUP had deployed Nirvana to create a global namespace across three of infoGROUP's computer operations centers in the Omaha area. [8]

In 2012, General Atomics released Nirvana version 4.3. [9]

In 2014, General Atomics changed the Nirvana business model from a large government contract, fee for service model, to a standard commercial software model.

In 2015, General Atomics initiated a strategic relationship with pixitmedia/arcastream in the United Kingdom, integrating Nirvana with pixitmedia and arcastream’s products. [10]

In 2016, General Atomics released Nirvana version 5.0. [11]

As of May 2018, Nirvana marketing and support URLs under the General Atomics corporate umbrella (www.Nirvanastorage.com, [12] www.ga.com/nirvana [13] and https://www.nirvanaware.com [14] ), as well as more recently branded integration offerings such as "Nirvana EasyHSM" (www.ga.com/easyhsm, [15] mentioned in a January 2017 marketing slideshare [16] ), returned "cannot be found" errors from www.ga.com or connection timeouts. A "Nirvana" keyword search at www.ga.com returns only pages marked as archived; Nirvana pages and press releases archived by General Atomics remain retrievable via http://www.ga.com/?Key=Search&q=nirvana [17]

Architecture and operation

Nirvana is client-server software composed of Location Agents that reside on, or access, Storage Resources. A Storage Resource can be a network-attached storage (NAS) system, an object storage system, or a cloud storage service. Nirvana catalogs the location of the files and objects in these storage resources in its Metadata Catalog (MCAT) and tags the files with storage-system metadata (owner, file name, file size, and creation, change, modification, and access timestamps) as well as additional user-defined, domain-specific metadata. System and user-defined metadata can be used to search for a file or object (or groups of files and objects), to control access to them, and to move them from one storage resource to another. The MCAT creates a single global namespace across all Storage Resources connected to it, so users and administrators can search for, access, and move data across multiple heterogeneous storage systems from multiple vendors and across geographically dispersed data centers. The MCAT is backed by a relational database management system that supports its operation, and multiple MCATs can be deployed for horizontal scale-out and failover. Various clients can interact with Nirvana, including the supplied Web-browser and Java-based GUI clients, a command-line interface, a native Windows virtual network drive interface, and user-developed applications via the supplied APIs.

[Image: Nirvana Architecture]
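The MCAT's core function, mapping a single global namespace onto heterogeneous storage while carrying searchable metadata, can be sketched in a few lines. This is an illustrative model only; the class, field, and attribute names below are hypothetical and do not reflect Nirvana's actual schema or APIs.

```python
# Illustrative model of a metadata catalog (MCAT): logical names in one
# global namespace map to physical locations on heterogeneous resources,
# and both system and user-defined metadata are searchable.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    logical_path: str                # name in the single global namespace
    resource: str                    # which storage resource holds the data
    physical_path: str               # location inside that resource
    system_meta: dict = field(default_factory=dict)  # owner, size, timestamps
    user_meta: dict = field(default_factory=dict)    # domain-specific tags

class MCAT:
    def __init__(self):
        self.entries = {}

    def register(self, entry):
        self.entries[entry.logical_path] = entry

    def search(self, **criteria):
        """Return entries whose system or user metadata match all criteria."""
        hits = []
        for e in self.entries.values():
            merged = {**e.system_meta, **e.user_meta}
            if all(merged.get(k) == v for k, v in criteria.items()):
                hits.append(e)
        return hits

mcat = MCAT()
mcat.register(CatalogEntry("/home/proj/run1.dat", "nas-omaha", "/vol1/run1.dat",
                           {"owner": "alice", "size": 4096},
                           {"project": "fusion", "instrument": "spectrometer"}))
mcat.register(CatalogEntry("/home/proj/run2.dat", "s3-west", "bucket/run2.dat",
                           {"owner": "bob", "size": 8192},
                           {"project": "fusion"}))

# One query spans both resources: the caller never names a storage system.
fusion_files = mcat.search(project="fusion")
```

The point of the sketch is that callers query by metadata, not by storage location; which resource holds the bytes is resolved by the catalog.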

Nirvana operation is controlled by three daemons: Metadata, Sync, and ILM. The Metadata Daemon can extract metadata automatically from an instrument creating data, from within the file's actual data using predefined and customizable templates and metadata-parsing policies, or by capturing user input via the GUI or command-line interface. The Sync Daemon, running in the background, detects when files are added to, or deleted from, the underlying Storage Resource filesystems; when it observes filesystem changes, it registers and updates them in the MCAT. The ILM Daemon routinely queries the MCAT and executes actions such as migration, replication, or backup on a specified schedule. For example, an administrator can set a policy to free up space on an expensive primary storage system by migrating data to distributed retention locations based on criteria such as storage-consumption watermarks (percent full), all data associated with a specific project, or data that has not been accessed in over one year. The policies are highly flexible: user-defined metadata attributes (e.g., project, principal investigator, data source, location, temperature) can also be used to move data. Nirvana ILM policy execution occurs behind the scenes, transparent to end users and applications.
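The ILM Daemon's migration policy from the example above can be illustrated with a minimal sketch: select files whose last access is older than a threshold and retarget them to a cheaper tier. The function names, record layout, and in-memory "migration" are assumptions for illustration, not Nirvana's actual interface.

```python
# Sketch of an ILM-style policy: find files not accessed in over a year
# and mark them as migrated to an archive tier. A real system would move
# the bytes and update the catalog; here we only retag the records.
import time

YEAR = 365 * 24 * 3600

def select_for_migration(entries, now, max_idle=YEAR):
    """Return entries whose last access is older than max_idle seconds."""
    return [e for e in entries if now - e["atime"] > max_idle]

def migrate(entries, target="archive"):
    for e in entries:
        e["resource"] = target  # stand-in for the actual data movement
    return entries

now = time.time()
files = [
    {"path": "/data/a", "atime": now - 2 * YEAR, "resource": "primary"},
    {"path": "/data/b", "atime": now - 30 * 24 * 3600, "resource": "primary"},
]
stale = select_for_migration(files, now)
migrate(stale)  # /data/a moves to "archive"; /data/b stays on "primary"
```

The same selection step could instead key on user-defined attributes (project, watermark, etc.), which is what makes metadata-driven policies flexible.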

Use cases

Data-aware cloud storage gateway

Nirvana's ILM functionality can be used as a cloud storage gateway, where data stored locally, on premises, can be moved to popular cloud storage services based on Nirvana's various metadata attributes and policies. In 2015, General Atomics and ArcaStream announced a cloud storage appliance that uses IBM's Spectrum Scale for on premises storage and integrates with cloud storage providers Amazon S3, and Google Cloud Storage. [18]

Nirvana can be used to conduct search queries to find data of interest using both system and user-defined metadata. Queries are either entered in the Command Line Interface or through the Web browser client shown below.

[Image: Nirvana Web Browser Graphical User Interface]

Virtual collections

Nirvana can automate the grouping and distribution of data files into a virtual collection based on user-friendly logical rules. For example, user-defined metadata with domain-specific attributes (experiment, study, project, etc.) can be used to identify data files that need to be transferred between collaborators.
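The grouping rule described above amounts to partitioning catalog records by a metadata key. A minimal sketch, with a hypothetical "experiment" attribute and record layout:

```python
# Group file records into virtual collections keyed on a user-defined
# metadata attribute. Attribute and field names are illustrative.
from collections import defaultdict

def virtual_collections(records, attribute):
    groups = defaultdict(list)
    for rec in records:
        key = rec.get(attribute)
        if key is not None:          # records lacking the tag are skipped
            groups[key].append(rec["path"])
    return dict(groups)

records = [
    {"path": "/a/scan1.h5", "experiment": "exp-42"},
    {"path": "/b/scan2.h5", "experiment": "exp-42"},
    {"path": "/c/notes.txt", "experiment": "exp-7"},
]
cols = virtual_collections(records, "experiment")
# cols groups paths by experiment, regardless of where they are stored
```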

Data provenance

In many fields, it is helpful to know the provenance and processing pipeline used to produce derived results. Nirvana tracks data within workflows, through all transformations, analyses, and interpretations. With Nirvana, data can be shared and used with verified provenance of the conditions under which it was generated, so results are reproducible and can be analyzed for defects.

Audit

Nirvana can be used to audit every transaction on a data file within a workflow. An audit trail can be stored containing information such as the date of the transaction, a success or error code, the user performing the transaction, the type of transaction, and notes. Audit trails, like everything else in Nirvana, can be easily queried and filtered.
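An audit trail of this kind can be sketched as an append-only log with a query filter; the field names below are illustrative, not Nirvana's actual record format.

```python
# Minimal audit-trail sketch: each transaction appends a record that can
# later be filtered by any combination of fields.
import datetime

class AuditTrail:
    def __init__(self):
        self.records = []

    def log(self, user, action, status, note=""):
        self.records.append({
            "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "user": user, "action": action, "status": status, "note": note,
        })

    def query(self, **criteria):
        return [r for r in self.records
                if all(r.get(k) == v for k, v in criteria.items())]

trail = AuditTrail()
trail.log("alice", "download", 0)                          # success
trail.log("bob", "delete", 13, note="permission denied")   # error code
errors = trail.query(status=13)   # filter the trail down to failures
```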

Security and access control

Nirvana can be used to control access to data by setting up specific access control lists by user, group etc. using user-defined metadata attributes (Project, Study, etc.) and by setting access privilege levels where users assigned higher levels can see more information than others assigned lower levels. Nirvana supports single sign-on and access by integrating with the Lightweight Directory Access Protocol (LDAP) and Active Directory, using challenge–response authentication, Grid Security Infrastructure (GSI), and Kerberos. Data can only be viewed and modified by users authorized to do so.
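The combination of attribute-based access lists and privilege levels described above can be sketched as a two-part check; the "project" attribute and numeric levels are hypothetical stand-ins for whatever a deployment defines.

```python
# Sketch of metadata-driven access control: a user may read a record only
# if authorized for that record's project AND cleared to at least its
# privilege level. Names and levels are illustrative.
def can_read(user, record):
    return (record["project"] in user["projects"]
            and user["level"] >= record["level"])

alice = {"projects": {"fusion"}, "level": 2}
rec_routine = {"project": "fusion", "level": 1}   # visible to alice
rec_restricted = {"project": "fusion", "level": 3}  # above alice's clearance
```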

File system analysis

Nirvana can be used to analyze the makeup of a shared filesystem to determine what type of data is being stored, how much space it takes up, when it was last accessed, and who stored it. With this information, storage administrators can determine the most appropriate type of storage system to use and when to move unused data to lower-cost archive storage. In the example below, Nirvana's analysis of data stored on an expensive enterprise NAS storage system showed that most data had not been accessed in over two years. The analysis further showed that most files were very small, and that over half the storage was consumed by just two users. Using this data, the organization replaced its enterprise storage system with less expensive object storage to better manage the many small, seldom-accessed files. [19]

[Image: Nirvana File Analysis]
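The kind of analysis described, capacity by owner plus the share of data untouched for a given period, can be sketched over per-file records such as a crawler might collect. Field names, the two-year cutoff, and the sample numbers are illustrative, not taken from the cited study.

```python
# Summarize per-file records: total capacity, capacity per owner, and the
# fraction of bytes not accessed within the idle cutoff (default 2 years).
def analyze(files, now, idle_cutoff=2 * 365 * 86400):
    by_owner, cold_bytes, total_bytes = {}, 0, 0
    for f in files:
        total_bytes += f["size"]
        by_owner[f["owner"]] = by_owner.get(f["owner"], 0) + f["size"]
        if now - f["atime"] > idle_cutoff:
            cold_bytes += f["size"]
    return {"total": total_bytes,
            "by_owner": by_owner,
            "cold_fraction": cold_bytes / total_bytes if total_bytes else 0.0}

DAY = 86400
now = 1_700_000_000
report = analyze([
    {"owner": "u1", "size": 600, "atime": now - 900 * DAY},  # cold
    {"owner": "u1", "size": 200, "atime": now - 10 * DAY},   # recently used
    {"owner": "u2", "size": 200, "atime": now - 800 * DAY},  # cold
], now)
# report shows 80% of bytes untouched for over two years, most owned by u1
```

A report like this is what lets an administrator argue for moving cold data to cheaper storage, as in the case study cited above.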


References

  1. "SDSC Timeline" (PDF). Retrieved 25 January 2016.
  2. "MDAS - Massive Data Analysis System" . Retrieved 25 January 2016.
  3. Baru, Chaitanya; Moore, Reagan; Rajasekar, Arcot; Wan, Michael (2010). "The SDSC storage resource broker". CASCON First Decade High Impact Papers: 189–200. CiteSeerX 10.1.1.203.4142. doi:10.1145/1925805.1925816. S2CID 15937740. (Reprint from November 30 – December 3, 1998.)
  4. "General Atomics Acquires Exclusive License from UCSD for Commercialization of Unique Data Management Software" . Retrieved 25 January 2016.
  5. "iRODS (integrated Rule-Oriented Data System)". irods.org. Retrieved 2016-03-17.
  6. "iRODS (integrated Rule-Oriented Data System)". irods.org/about. Retrieved 2017-07-31.
  7. "General Atomics Wins $22.5 Million DoD Contract for Storage Lifecycle Management (SLM) across Six High Performance Computing Sites" . Retrieved 25 January 2016.
  8. "infoGROUP® Architects Innovative Global Namespace with Nirvana® SRB® 2008" . Retrieved 25 January 2016.
  9. "Nirvana SRB 2012 R3® Is Enhanced With Significant Caching Performance, Synchronization and Database Migration Improvements" . Retrieved 25 January 2016.
  10. "arcastream and General Atomics Introduce World's First Data-Aware Cloud Storage Gateway" . Retrieved 25 January 2016.
  11. "General Atomics Releases Next Generation Data System Advancing Data Intensive Scientific and Media Workflows". General Atomics & Affiliated Companies. Retrieved 2018-05-26.
  12. "Nirvana Storage". General Atomics. Archived from the original on 24 July 2008. Retrieved 26 May 2018.
  13. "Nirvana SRB" . Retrieved 26 May 2018.
  14. "Nirvana Customer Support". General Atomics. Retrieved 26 May 2018.
  15. "Nirvana EasyHSM". General Atomics. Retrieved 26 May 2018.
  16. Sfiligoi, Igor (2017-01-17). "EasyHSM Overview". www.slideshare.net. Retrieved 26 May 2018.
  17. "General Atomics & Affiliated Companies" . Retrieved 2018-05-26.
  18. "ArcaStream and General Atomics Introduce World's First Data-Aware Cloud Storage Gateway" . Retrieved 25 January 2016.
  19. "Storage Data Analysis with Nirvana SRB Presented for 2014 IEEE MSST Conference Santa Clara, CA June 2-6 2014" (PDF).