Data format management

Last updated

Data format management (DFM) is the application of a systematic approach to the selection and use of the data formats used to encode information for storage on a computer.

In practical terms, data format management is the analysis of data formats and their associated technical, legal or economic attributes which can either enhance or detract from the ability of a digital asset or a given information systems to meet specified objectives.

Data format management is necessary as the amount of information and number of people creating it grows. This is especially the case as the information with which users are working is difficult to generate, store, costly to acquire, or to be shared.

Data format management as an analytic tool or approach is data format neutral.

Historically individuals, organization and businesses have been categorized by their type of computer or their operating system. Today, however, it is primarily productivity software, such as spreadsheet or word processor programs, and the way these programs store information that also defines an entity. For instance, when browsing the web it is not important which kind of computer is responsible for hosting a site, only that the information it publishes is in a format that is readable by the viewing browser. In this instance the data format of the published information has more to do with defining compatibilities than the underlying hardware or operating system.

Productivity software is application software used for producing information. Its names arose from the fact that it increases productivity, especially of individual office workers, from typists to knowledge workers, although its scope is now wider than that. Office suites, which brought word processing, spreadsheet, and relational database programs to the desktop in the 1980s, are the core example of productivity software. They revolutionized the office with the magnitude of the productivity increase they brought as compared with the pre-1980s office environments of typewriters, paper filing, and handwritten lists and ledgers. Some 78% of "middle-skill" occupations now require the use of productivity software. In the 2010s, productivity software has become even more consumerized than it already was, as computing becomes ever more integrated into daily personal life.

Several initiatives have been established to record those data formats commonly used and the software available to read them, for example the Pronom project at the UK National Archives.

See also

Data curation is the organization and integration of data collected from various sources. It involves annotation, publication and presentation of the data such that the value of the data is maintained over time, and the data remains available for reuse and preservation. Data curation includes "all the processes needed for principled and controlled data creation, maintenance, and management, together with the capacity to add value to data". In science, data curation may indicate the process of extraction of important information from scientific texts, such as research articles by experts, to be converted into an electronic format, such as an entry of a biological database.

Data preservation is the act of conserving and maintaining both the safety and integrity of data. Preservation is done through formal activities that are governed by policies, regulations and strategies directed towards protecting and prolonging the existence and authenticity of data and its metadata. Data can be described as the elements or units in which knowledge and information is created, and metadata are the summarizing subsets of the elements of data; or the data about the data. The main goal of data preservation is to protect data from being lost or destroyed and to contribute to the reuse and progression of the data.

In library and archival science, digital preservation is a formal endeavor to ensure that digital information of continuing value remains accessible and usable. It involves planning, resource allocation, and application of preservation methods and technologies, and it combines policies, strategies and actions to ensure access to reformatted and "born-digital" content, regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time. The Association for Library Collections and Technical Services Preservation and Reformatting Section of the American Library Association, defined digital preservation as combination of "policies, strategies and actions that ensure access to digital content over time." According to the Harrod's Librarian Glossary, digital preservation is the method of keeping digital material alive so that they remain usable as technological advances render original hardware and software specification obsolete.

Related Research Articles

Software non-tangible executable component of a computer

Computer software, or simply software, is a collection of data or computer instructions that tell the computer how to work. This is in contrast to physical hardware, from which the system is built and actually performs the work. In computer science and software engineering, computer software is all information processed by computer systems, programs and data. Computer software includes computer programs, libraries and related non-executable data, such as online documentation or digital media. Computer hardware and software require each other and neither can be realistically used on its own.

Computer program Instructions to be executed by a computer

A computer program is a collection of instructions that performs a specific task when executed by a computer. A computer requires programs to function.

Data storage hardware for recording data

Data storage is the recording (storing) of information (data) in a storage medium. DNA and RNA, handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Recording is accomplished by virtually any form of energy. Electronic data storage requires electrical power to store and retrieve data.

Package manager software tools for handling software packages

A package manager or package management system is a collection of software tools that automates the process of installing, upgrading, configuring, and removing computer programs for a computer's operating system in a consistent manner.

Esri company

Esri is an international supplier of geographic information system (GIS) software, web GIS and geodatabase management applications. The company is headquartered in Redlands, California.

Digital obsolescence

Digital obsolescence is a situation where a digital resource is no longer readable because of its archaic format: the physical media, the reader, the hardware, or the software that runs on it is no longer available.

File system way of storing all data on a data storage device

In computing, a file system or filesystem controls how data is stored and retrieved. Without a file system, information placed in a storage medium would be one large body of data with no way to tell where one piece of information stops and the next begins. By separating the data into pieces and giving each piece a name, the information is easily isolated and identified. Taking its name from the way paper-based information systems are named, each group of data is called a "file". The structure and logic rules used to manage the groups of information and their names is called a "file system".

In computer security, a vulnerability is a weakness which can be exploited by a threat actor, such as an attacker, to perform unauthorized actions within a computer system. To exploit a vulnerability, an attacker must have at least one applicable tool or technique that can connect to a system weakness. In this frame, vulnerability is also known as the attack surface.

UVC-based preservation is an archival strategy for handling the preservation of digital objects. It employs the use of a Universal Virtual Computer (UVC)—a virtual machine (VM) specifically designed for archival purposes, that allows both emulation and migration to a language-neutral format like XML.

Records management, also known as records and information management, is an organizational function devoted to the management of information in an organization throughout its life cycle, from the time of creation or inscription to its eventual disposition. This includes identifying, classifying, storing, securing, retrieving, tracking and destroying or permanently preserving records. The ISO 15489-1: 2001 standard defines records management as "[the] field of management responsible for the efficient and systematic control of the creation, receipt, maintenance, use and disposition of records, including the processes for capturing and maintaining evidence of and information about business activities and transactions in the form of records".

Digital permanence addresses the history and development of digital storage techniques, specifically quantifying the expected lifetime of data stored on various digital media and the factors which influence the permanence of digital data. It is often a mix of ensuring the data itself can be retained on a particular form of media and that the technology remains viable. Where possible, as well as describing expected lifetimes, factors affecting data retention will be detailed including potential technology issues.

Fedora Commons software for digital asset management

Fedora is a digital asset management (DAM) architecture upon which institutional repositories, digital archives, and digital library systems might be built. Fedora is the underlying architecture for a digital repository, and is not a complete management, indexing, discovery, and delivery application. It is a modular architecture built on the principle that interoperability and extensibility are best achieved by the integration of data, interfaces, and mechanisms as clearly defined modules.

The digital dark age is a lack of historical information in the digital age as a direct result of outdated file formats, software, or hardware that becomes corrupt, scarce, or inaccessible as technologies evolve and data decay. Future generations may find it difficult or impossible to retrieve electronic documents and multimedia, because they have been recorded in an obsolete and obscure file format. The name derives from the term Dark Ages in the sense that there could be a relative lack of records in the digital age, as documents are transferred to digital formats and original copies are lost. An early mention of the term was at a conference of the International Federation of Library Associations and Institutions (IFLA) in 1997.[1] The term was also mentioned in 1998 at the Time and Bits conference,[2][3] which was co-sponsored by the Long Now Foundation and the Getty Conservation Institute.

Digital curation is the selection, preservation, maintenance, collection and archiving of digital assets. Digital curation establishes, maintains and adds value to repositories of digital data for present and future use. This is often accomplished by archivists, librarians, scientists, historians, and scholars. Enterprises are starting to use digital curation to improve the quality of information and data within their operational and strategic processes. Successful digital curation will mitigate digital obsolescence, keeping the information accessible to users indefinitely.

A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.

Metadata data about data

Metadata is "data [information] that provides information about other data". Many distinct types of metadata exist, among these descriptive metadata, structural metadata, administrative metadata, reference metadata and statistical metadata.

A digital library, digital repository, or digital collection, is an online database of digital objects that can include text, still images, audio, video, or other digital media formats. Objects can consist of digitized content like print or photographs, as well as originally produced digital content like word processor files or social media posts. In addition to storing content, digital libraries provide means for organizing, searching, and retrieving the content contained in the collection.

The following outline is provided as an overview of and topical guide to databases:

Information technology (IT) is the use of computers to store, retrieve, transmit, and manipulate data, or information, often in the context of a business or other enterprise. IT is considered to be a subset of information and communications technology (ICT). An information technology system is generally an information system, a communications system or, more specifically speaking, a computer system – including all hardware, software and peripheral equipment – operated by a limited group of users.