Agnostic (data)

In computing, a device or software program is said to be agnostic or data agnostic if the method or format of data transmission is irrelevant to the device or program's function. This means that the device or program can receive data in multiple formats or from multiple sources, and still process that data effectively.

Definition

Many devices and programs need data to be presented in a specific format before they can process it. For example, Apple Inc. devices generally require applications to be downloaded from the App Store.[1] This is a non-data-agnostic approach: it accepts only a specified file type, downloaded from a specific location, and does not function unless those requirements are met.

Non-data-agnostic devices and programs can present problems. For example, if your file contains the right type of data (such as text) but in the wrong format, you may have to create a new file and re-enter the text in the required format before the program can use it. Various file conversion programs exist because people need to convert files to a different format in order to use them effectively.[2][3][4][5]

Implementation

Data agnostic devices and programs address these problems in a variety of ways. Devices can treat files in the same way whether they are downloaded over the internet or transferred over a USB cable or other connection.

Devices and programs[6] can become more data agnostic by using a generic storage format to create, read, update, and delete files. Formats such as XML and JSON can store information in a data agnostic manner; XML, for example, can hold any type of information. However, if a document type definition (DTD) or XML Schema Definition (XSD) is used to define what data belongs where, the format becomes non-data agnostic: it produces an error if the wrong type of data is placed in a field.
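
A minimal sketch of the difference, using JSON and Python's standard library (the records and field types below are invented for illustration): an agnostic store round-trips any record unchanged, while adding a schema-style check, analogous to a DTD or XSD, rejects records whose fields hold the wrong type of data.

    import json

    # Agnostic storage: any record, whatever its fields, round-trips unchanged.
    records = [
        {"name": "Fido", "species": "dog", "age": 4},
        {"title": "Quarterly report", "pages": 12},
    ]
    stored = json.dumps(records)            # serialize without caring what the fields mean
    assert json.loads(stored) == records    # the data comes back exactly as it went in

    # A schema-style check (analogous to a DTD or XSD) makes the store non-data agnostic:
    # records whose fields hold the wrong type of data are now rejected.
    expected_types = {"name": str, "species": str, "age": int}

    def validate(record):
        for field, value in record.items():
            if field in expected_types and not isinstance(value, expected_types[field]):
                raise TypeError(f"field {field!r} must be {expected_types[field].__name__}")

    validate({"name": "Rex", "species": "dog", "age": 2})        # passes
    # validate({"name": "Rex", "species": "dog", "age": "two"})  # would raise TypeError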

Once the data is saved in a generic storage format, this store can act as an entity synchronization layer. It can interface with a variety of different programs, with the extraction step formatting the data in a way each specific program can understand. This allows two programs that require different data formats to access the same data, and multiple devices and programs can create, read, update, and delete (CRUD) the same information in the same storage location without formatting errors.
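
As an illustration only (the class and field names here are hypothetical, not taken from any cited product), a shared store can expose CRUD operations on records held as plain field/value pairs, and two different consumers can read the same record and format it to suit themselves:

    import uuid

    class EntityStore:
        """A toy entity synchronization layer: records are plain dicts, keyed by id."""

        def __init__(self):
            self._records = {}

        def create(self, fields):
            record_id = str(uuid.uuid4())
            self._records[record_id] = dict(fields)
            return record_id

        def read(self, record_id):
            return dict(self._records[record_id])

        def update(self, record_id, fields):
            self._records[record_id].update(fields)

        def delete(self, record_id):
            del self._records[record_id]

    store = EntityStore()
    rid = store.create({"given_name": "Ada", "family_name": "Lovelace"})

    # Two consumers extract the same record and format it differently.
    as_csv_row = ",".join(store.read(rid).values())                       # "Ada,Lovelace"
    as_greeting = "Hello, {given_name} {family_name}".format(**store.read(rid))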

When multiple programs access the same records, they may define different fields for the same concept. Where fields are labelled differently but contain the same data, the program pulling the information can map its own labels onto the stored fields so that the correct data is used. If one program stores fields and information that another does not, those fields can be saved to the record and pulled for that program but ignored by the others. Because the entity synchronization layer is data agnostic, additional fields can be added without recoding the whole database, and records created by programs that do not use a given field remain valid.
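
A hedged sketch of how a consuming program might map its own labels onto the shared record while leaving fields it does not recognise untouched for other programs (the field names are invented for illustration):

    # Shared record as held in the agnostic store; "loyalty_tier" is only used by one program.
    shared_record = {"cust_name": "Ada Lovelace", "cust_email": "ada@example.com",
                     "loyalty_tier": "gold"}

    # Program B labels the same concepts differently: map shared names to its own names.
    field_map = {"cust_name": "fullName", "cust_email": "email"}

    def extract_for_program(record, mapping):
        """Return only the fields this program knows about, under its own labels."""
        return {local: record[shared] for shared, local in mapping.items() if shared in record}

    program_b_view = extract_for_program(shared_record, field_map)
    # {'fullName': 'Ada Lovelace', 'email': 'ada@example.com'}; "loyalty_tier" stays in the
    # shared record but is simply ignored by program B.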

Since the formatting is imposed on the data by the program extracting it, the format can be customized to the device or program that will display it. Information extracted from the entity synchronization layer can therefore be rendered dynamically for the user's device, regardless of which device or program is in use.

Having data agnostic devices and programs allows data to be transferred easily between them without conversion. Companies such as Great Ideaz[7] provide data agnostic services by storing the data in an entity synchronization layer. This acts as a compatibility layer, as T-SQL statements can retrieve, update, sort, and write data regardless of the format employed. It also allows data to be synchronized between multiple applications, since the applications can all pull data from the same location. This prevents compatibility problems between different programs that have to access the same data, and reduces data replication.
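
Commercial offerings differ in their specifics, but the general pattern can be sketched with Python's built-in sqlite3 module: each record is stored as one JSON document in a single table, so the same SQL statements read and write every kind of record regardless of its fields. This is an illustrative sketch, not the cited vendor's implementation.

    import json
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, doc TEXT)")

    # The same INSERT works for any kind of record, because the shape lives in the document.
    for record in ({"species": "dog", "name": "Fido"},
                   {"title": "Quarterly report", "pages": 12}):
        conn.execute("INSERT INTO entity (doc) VALUES (?)", (json.dumps(record),))

    # The same SELECT retrieves every record; each consumer decodes the document it needs.
    for row_id, doc in conn.execute("SELECT id, doc FROM entity"):
        print(row_id, json.loads(doc))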

Benefits

Keeping devices and programs as data agnostic as possible has some clear advantages. Since the data is stored in an agnostic format, developers do not need to hard-code ways to deal with every different kind of data. A table with information about dogs and one with information about cats can be treated in the same way: extract the field definitions and the field content from the data agnostic storage format and display the content based on those definitions. Because the same code performs the CRUD operations for every concept, the amount of code is significantly reduced, and what remains is exercised and tested each time a concept is extracted from the entity synchronization layer.
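
A minimal sketch of that idea, with hypothetical field definitions: one function renders any table, whether it describes dogs or cats, because the headers and field names are read from stored definitions rather than hard-coded.

    def render_table(field_definitions, rows):
        """Render any concept the same way: headers come from the stored field definitions."""
        headers = [f["label"] for f in field_definitions]
        lines = [" | ".join(headers)]
        for row in rows:
            lines.append(" | ".join(str(row.get(f["name"], "")) for f in field_definitions))
        return "\n".join(lines)

    dog_fields = [{"name": "name", "label": "Name"}, {"name": "breed", "label": "Breed"}]
    cat_fields = [{"name": "name", "label": "Name"}, {"name": "indoor", "label": "Indoor?"}]

    print(render_table(dog_fields, [{"name": "Fido", "breed": "Beagle"}]))
    print(render_table(cat_fields, [{"name": "Misu", "indoor": True}]))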

The field definitions and formatting can be stored in the entity synchronization layer alongside the data they act on. This allows fields and formatting to change without programs having to be hard-coded and recompiled. The data and its formatting are then generated dynamically by the code used to extract both the data and the formatting information.

The data itself only needs to be distinguished when it is being acted on or displayed in a specific way. If the data is being transferred between devices or databases, it does not need to be interpreted as a specific object. Whenever the data can be treated as agnostic, the coding is simplified, as it only has to deal with one case (the data agnostic case) rather than many (PNG, PDF, and so on). When the data must be displayed or acted on, it is interpreted based on the field definitions and formatting information, then returned to a data agnostic form as soon as possible to reduce the number of individual cases that must be handled.
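
A small sketch of the same principle (file paths and type names here are assumptions for illustration): while data is only being moved it is copied as opaque bytes, and the declared type is consulted only at the point where the content must actually be interpreted.

    import shutil

    def transfer(source_path, destination_path):
        """Moving data is agnostic: bytes are copied without interpreting the format."""
        shutil.copyfile(source_path, destination_path)

    def display(path, declared_type):
        """Only at display time does the format matter; everything else is one generic case."""
        if declared_type == "text":
            with open(path, encoding="utf-8") as f:
                print(f.read())
        else:
            # Unknown or binary formats are still handled, just not rendered specially.
            with open(path, "rb") as f:
                print(f"{len(f.read())} bytes of {declared_type} data")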

Risks

There are, however, a few problems introduced when attempting to make a device or program data agnostic. Since only one piece of code is used for CRUD operations regardless of the type of concept, there is a single point of failure: if that code breaks down, the whole system is broken. This risk is mitigated by the fact that the code is exercised every time a record is stored or retrieved, so it is tested very frequently.

Additionally, data agnostic storage media can increase load times, as the code has to retrieve the field definitions and display format as well as the specific data to be displayed. Load times can be improved by pre-shredding the data: a copy of the record with the data already extracted is used to index the fields, instead of extracting the fields and formatting information at the same time as the data. While this improves speed, it adds a non-data-agnostic element to the process; however, it can be created easily through code generation.
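
A hedged sketch of what pre-shredding might look like in practice (the record structure and field names are assumptions): alongside each agnostic document, selected field values are extracted once into a small index, so a query can be answered without parsing every record.

    import json

    # Agnostic store: whole records kept as JSON documents.
    documents = {
        "r1": json.dumps({"name": "Fido", "species": "dog", "age": 4}),
        "r2": json.dumps({"name": "Misu", "species": "cat", "age": 2}),
    }

    # Pre-shredded index: selected fields extracted once, ahead of time.
    # This part is no longer data agnostic, but it can be regenerated from the documents.
    shredded_index = {rid: json.loads(doc)["species"] for rid, doc in documents.items()}

    def find_by_species(species):
        """Answer the query from the index instead of parsing every document."""
        return [rid for rid, value in shredded_index.items() if value == species]

    print(find_by_species("cat"))   # ['r2']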

References

  1. Costello, Sam. "How to Get Apps That Are Not In The App Store". Lifewire. Retrieved 4 February 2021.
  2. "Free Online File Converter" . Retrieved 4 February 2021.
  3. "Cloud Convert File Converter" . Retrieved 4 February 2021.
  4. "Convertio File Converter" . Retrieved 4 February 2021.
  5. "Free Tools File Converter" . Retrieved 28 September 2022.
  6. "What Are Data Agnostic Services (DAS)?". Great Ideaz. Retrieved 4 February 2021.
  7. "trellispark Breakthroughs". Great Ideaz. Retrieved 4 February 2021.