Information repository

In information technology, an information repository or simply a repository is "a central place in which an aggregation of data is kept and maintained in an organized way, usually in computer storage." [1] It "may be just the aggregation of data itself into some accessible place of storage or it may also imply some ability to selectively extract data." [1]

Universal digital library

The concept of a universal digital library was described as "within reach" in a 2012 Los Angeles Times article by Pamela Samuelson, [2] which discussed Google's attempts to "mass-digitize" what are termed "orphan works" (i.e. out-of-print copyrighted works) as well as the European Union's copyright directive on such works.

The U.S. Copyright Office and European Union copyright lawmakers have been working on the issue. Google has reached an agreement in France that "lets the publisher choose which works can be scanned or sold." By contrast, in the United States Google has sought a deal under which it would be "free to digitize and sell any works unless the copyright holders opted out", so far without success. [3]

Information repository

Attempts to develop what has been called an information repository have been underway for decades. [4] [5] [6]

Federated information repository

A federated information repository is a way to deploy a secondary tier of data storage that can comprise multiple, networked data storage technologies running on diverse operating systems. Data that no longer needs to be in primary storage is protected, classified according to captured metadata, processed, de-duplicated, and then purged, automatically, based on data service-level objectives and requirements. In a federated information repository, data storage resources are virtualized as composite storage sets and operate as a federated environment. [7]

Federated information repositories were developed to mitigate problems arising from data proliferation and to eliminate the need for separately deployed data storage solutions for each of the diverse storage technologies and operating systems in use. They provide centralized management for all deployed data storage resources; they are self-contained, support heterogeneous storage resources, support resource management to add, maintain, recycle, and terminate media, track off-line media, and operate autonomously.

Automated data management

Since one of the main reasons for implementing a federated information repository is to reduce the maintenance workload that traditional data storage systems place on IT staff, federated information repositories are automated. Automation is accomplished via policies that can process data based on time, events, data age, and data content, and that direct how data is handled according to media type, storage pool, and storage technology.
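
As an illustration only, the following Python sketch shows what such a policy-driven process might look like: files in a hypothetical primary pool that have not been modified for a configurable period are classified by a content hash, de-duplicated into a secondary pool, and then purged from primary storage. The paths, the 90-day threshold, and the function names are assumptions made for the example, not a description of any particular product.

```python
import hashlib
import shutil
import time
from pathlib import Path

# Hypothetical policy: files untouched for 90 days leave primary storage,
# are classified by a content hash, de-duplicated, and then purged.
MAX_AGE_SECONDS = 90 * 24 * 3600
PRIMARY = Path("/srv/primary")        # assumed primary storage pool
SECONDARY = Path("/srv/secondary")    # assumed secondary storage pool
CATALOG: dict[str, Path] = {}         # content hash -> location in secondary storage

def content_hash(path: Path) -> str:
    """Classify a file by the SHA-256 digest of its contents."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def apply_policy(now: float | None = None) -> None:
    """Apply the age-based policy to every file in the primary pool."""
    now = now or time.time()
    for path in PRIMARY.rglob("*"):
        if not path.is_file():
            continue
        if now - path.stat().st_mtime < MAX_AGE_SECONDS:
            continue                              # still "hot": stays in primary storage
        key = content_hash(path)
        if key not in CATALOG:                    # de-duplicate: keep one copy per digest
            SECONDARY.mkdir(parents=True, exist_ok=True)
            target = SECONDARY / key
            shutil.copy2(path, target)            # protect the data in the secondary tier
            CATALOG[key] = target
        path.unlink()                             # purge the primary copy

if __name__ == "__main__":
    apply_policy()
```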

Because federated information repositories are intended to reduce IT staff workload, they are designed to be easy to deploy and to offer configuration flexibility, extensibility, redundancy, and reliable failover.

Data recovery

Federated information repositories feature client-based data search and recovery capabilities that, based on permissions, enable end users to search the repository, view its contents (including data on off-line media), and recover individual or multiple files to either their original network computer or another network computer.
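
A minimal sketch of such a permission-aware search-and-recovery interface is shown below, again in Python. The catalog structure, paths, user names, and function signatures are hypothetical; real repositories expose this functionality through their own client tools.

```python
import fnmatch
import shutil
from dataclasses import dataclass
from pathlib import Path

# Hypothetical catalog entry: where a protected file lives in the repository
# and which user is permitted to search for and recover it.
@dataclass
class CatalogEntry:
    original_path: str   # path the file had on the client computer
    stored_path: str     # location inside the repository (possibly off-line media)
    owner: str           # user allowed to see and recover this entry

CATALOG = [
    CatalogEntry("/home/alice/report.docx", "/srv/secondary/ab12cd34", "alice"),
    CatalogEntry("/home/bob/budget.xlsx", "/srv/secondary/ef56ab78", "bob"),
]

def search(user: str, pattern: str) -> list[CatalogEntry]:
    """Return only the catalog entries this user is permitted to see."""
    return [entry for entry in CATALOG
            if entry.owner == user and fnmatch.fnmatch(entry.original_path, pattern)]

def recover(entry: CatalogEntry, destination: str | None = None) -> Path:
    """Restore a file to its original path or to an alternate destination."""
    target = Path(destination or entry.original_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(entry.stored_path, target)
    return target

if __name__ == "__main__":
    print(search("alice", "*report*"))   # lists only Alice's matching files
```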

Related Research Articles

Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large.

<span class="mw-page-title-main">Digitization</span> Converting information into digital form

Digitization is the process of converting information into a digital format. The result is the representation of an object, image, sound, document, or signal obtained by generating a series of numbers that describe a discrete set of points or samples. The result is called a digital representation or, more specifically, a digital image for the object and a digital form for the signal. In modern practice, the digitized data is in the form of binary numbers, which facilitates processing by digital computers and other operations, but digitizing simply means "the conversion of analog source material into a numerical format"; the decimal or any other number system can be used instead.
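
A small Python example can make the sampling idea concrete: an analog signal (here a 440 Hz sine tone, chosen arbitrarily) is read at discrete points in time and each reading is quantized to an 8-bit integer, yielding a digital representation. The sample rate, duration, and bit depth are assumptions for the illustration.

```python
import math

# Illustrative sketch: "digitize" an analog 440 Hz tone by sampling it
# 8000 times per second and quantizing each sample to an 8-bit integer.
SAMPLE_RATE = 8000      # samples per second (assumed)
DURATION = 0.01         # seconds of signal to capture (assumed)
FREQUENCY = 440.0       # Hz, the analog source signal (assumed)

def digitize() -> list[int]:
    samples = []
    for n in range(int(SAMPLE_RATE * DURATION)):
        t = n / SAMPLE_RATE                                 # time of the n-th sample
        analog = math.sin(2 * math.pi * FREQUENCY * t)      # analog value in [-1.0, 1.0]
        quantized = round((analog + 1.0) / 2.0 * 255)       # map onto 0..255 (8 bits)
        samples.append(quantized)
    return samples

if __name__ == "__main__":
    print(digitize()[:10])   # the first few samples of the digital representation
```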

<span class="mw-page-title-main">CICS</span> IBM mainframe transaction monitor

IBM CICS is a family of mixed-language application servers that provide online transaction management and connectivity for applications on IBM mainframe systems under z/OS and z/VSE.

<span class="mw-page-title-main">Data management</span> Disciplines related to managing data as a resource

Data management comprises all disciplines related to handling data as a valuable resource; it is the practice of managing an organization's data so it can be analyzed for decision-making.

In computer storage, a tape library is a physical area that holds magnetic data tapes. In an earlier era, tape libraries were maintained by people known as tape librarians and computer operators, and the proper operation of the library was crucial to the running of batch processing jobs. Although tape libraries of this era were not automated, the use of tape management system software could assist in running them.

In library and archival science, digital preservation is a formal process to ensure that digital information of continuing value remains accessible and usable in the long term. It involves planning, resource allocation, and application of preservation methods and technologies, and combines policies, strategies and actions to ensure access to reformatted and "born-digital" content, regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.

The Storage Management Initiative Specification, commonly called SMI-S, is a computer data storage management standard developed and maintained by the Storage Networking Industry Association (SNIA). It has also been ratified as an ISO standard. SMI-S is based upon the Common Information Model and the Web-Based Enterprise Management standards defined by the Distributed Management Task Force, which define management functionality via HTTP. The most recent approved version of SMI-S is available on the SNIA website.

<span class="mw-page-title-main">Outline of library and information science</span> Overview of and topical guide to library science

The outline of library and information science is an overview of and topical guide to the field.

The California Digital Library (CDL) was founded by the University of California in 1997. Under the leadership of then UC President Richard C. Atkinson, the CDL's original mission was to forge a better system for scholarly information management and improved support for teaching and research. In collaboration with the ten University of California Libraries and other partners, CDL assembled one of the world's largest digital research libraries. CDL facilitates the licensing of online materials and develops shared services used throughout the UC system. Building on the foundations of the Melvyl Catalog, CDL has developed one of the largest online library catalogs in the country and works in partnership with the UC campuses to bring the treasures of California's libraries, museums, and cultural heritage organizations to the world. CDL continues to explore how services such as digital curation, scholarly publishing, archiving and preservation support research throughout the information lifecycle.

Cloud storage is a model of computer data storage in which data, said to be on "the cloud", is stored remotely in logical pools and is accessible to users over a network, typically the Internet. The physical storage spans multiple servers, and the physical environment is typically owned and managed by a cloud computing provider. These cloud storage providers are responsible for keeping the data available and accessible, and the physical environment secured, protected, and running. People and organizations buy or lease storage capacity from the providers to store user, organization, or application data.

A digital library, also called an online library, an internet library, a digital repository, a library without walls, or a digital collection, is an online database of digital objects that can include text, still images, audio, video, digital documents, or other digital media formats or a library accessible through the internet. Objects can consist of digitized content like print or photographs, as well as originally produced digital content like word processor files or social media posts. In addition to storing content, digital libraries provide means for organizing, searching, and retrieving the content contained in the collection. Digital libraries can vary immensely in size and scope, and can be maintained by individuals or organizations. The digital content may be stored locally, or accessed remotely via computer networks. These information retrieval systems are able to exchange information with each other through interoperability and sustainability.

<span class="mw-page-title-main">Cloud computing</span> Form of shared Internet-based computing

Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. Large clouds often have functions distributed over multiple locations, each of which is a data center. Cloud computing relies on sharing of resources to achieve coherence and typically uses a pay-as-you-go model, which can help in reducing capital expenses but may also lead to unexpected operating expenses for users.

Mark A. Carlson was a software engineer known in the systems management industry for his work in management standards and technology. Carlson was the first employee of a small startup in Boulder, Colorado, called Redcape Policy Software. Sun Microsystems acquired the company and its technology in 1998 and subsequently promoted it as Jiro, a common management framework based on Java and Jini.

The Linear Tape File System (LTFS) is a file system that allows files stored on magnetic tape to be accessed in a similar fashion to those on disk or removable flash drives. It requires both a specific format of data on the tape media and software to provide a file system interface to the data.

<span class="mw-page-title-main">Converged infrastructure</span> Way of structuring an IT system

Converged infrastructure is a way of structuring an information technology (IT) system which groups multiple components into a single optimized computing package. Components of a converged infrastructure may include servers, data storage devices, networking equipment and software for IT infrastructure management, automation and orchestration.

A memory institution is an organization maintaining a repository of public knowledge; the generic term covers institutions such as libraries, archives, heritage institutions, aquaria and arboreta, and zoological and botanical gardens, as well as providers of digital libraries and data aggregation services, which serve as memories for given societies or mankind. Memory institutions serve the purpose of documenting, contextualizing, preserving, and indexing elements of human culture and collective memory. These institutions allow and enable a society to better understand itself, its past, and how the past shapes its future. These repositories are ultimately preservers of communities, languages, cultures, customs, tribes, and individuality. Memory institutions are repositories of knowledge while also acting as transmitters of knowledge and memory to the community; they ultimately constitute a form of collective memory. Increasingly, such institutions are considered part of a unified documentation and information science perspective.

<span class="mw-page-title-main">Converged storage</span>

Converged storage is a storage architecture that combines storage and computing resources into a single entity. This can result in the development of platforms for server-centric, storage-centric, or hybrid workloads, where applications and data come together to improve application performance and delivery. The combination of storage and compute differs from the traditional IT model, in which computation and storage take place in separate or siloed computer equipment. The traditional model requires discrete provisioning changes, such as upgrades and planned migrations, in the face of server load changes, which are increasingly dynamic with virtualization, whereas converged storage scales the supply of resources in parallel with new VM demands.

Software-defined storage (SDS) is a marketing term for computer data storage software for policy-based provisioning and management of data storage independent of the underlying hardware. Software-defined storage typically includes a form of storage virtualization to separate the storage hardware from the software that manages it. The software enabling a software-defined storage environment may also provide policy management for features such as data deduplication, replication, thin provisioning, snapshots and backup.

Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Originally designed by Google, the project is now maintained by the Cloud Native Computing Foundation.

The SNIA Emerald Program Power Efficiency Measurement Specification is a storage specification developed and maintained by the Storage Networking Industry Association (SNIA) and cross-referenced by the Environmental Protection Agency's Energy Star program. The specification consists of a storage-types taxonomy, a system-under-test workload and energy measurement method, measured metrics for active and idle operational states, and presence tests for capacity optimization technologies. The measured metric data is generated through the use of well-defined standard testing and data reduction procedures prescribed in the SNIA Emerald Specification.

References

  1. Rouse, Margaret (April 2005). "Definition: repository". whatis.com. TechTarget. Retrieved 1 May 2019.
  2. Pamela Samuelson (May 1, 2012). "A universal digital library is within reach". Los Angeles Times.
  3. Eric Pfanner (August 25, 2011). "In France, Publisher and Google Reach Deal". The New York Times.
  4. "IBM software to integrate systems". The New York Times. May 17, 1989.
  5. John Markoff (December 11, 2003). "For Doodlers and Pack Rats, a Multi-Media Binder". The New York Times.
  6. F. Romall (May 12, 1996). "Mt. Vernon Library Marks Its 100th Year". The New York Times.
  7. Armstrong, Mark (9 August 2007). "Benefits of a Federated Information Repository as a Secondary Storage Tier". SNIA Enterprise Information World 2007 Conference. Storage Networking Industry Association (SNIA). Archived from the original on 2008-11-21. Retrieved 1 May 2019.