Human-readable medium and data

Last updated
ISBN represented as EAN-13 bar code showing both human-readable and machine-readable data EAN-13-ISBN-13.svg
ISBN represented as EAN-13 bar code showing both human-readable and machine-readable data

In computing, a human-readable medium or human-readable format is any encoding of data or information that can be naturally read by humans, resulting in human-readable data. It is often encoded as ASCII or Unicode text, rather than as binary data.

In most contexts, the alternative to a human-readable representation is a machine-readable format or medium of data primarily designed for reading by electronic, mechanical or optical devices, or computers. For example, Universal Product Code (UPC) barcodes are very difficult to read for humans, but very effective and reliable with the proper equipment, whereas the strings of numerals that commonly accompany the label are the human-readable form of the barcode information. Since any type of data encoding can be parsed by a suitably programmed computer, the decision to use binary encoding rather than text encoding is usually made to conserve storage space. Encoding data in a binary format typically requires fewer bytes of storage and increases efficiency of access (input and output) by eliminating format parsing or conversion.

With the advent of standardized, highly structured markup languages, such as Extensible Markup Language (XML), the decreasing costs of data storage, and faster and cheaper data communication networks, compromises between human-readability and machine-readability are now more common-place than they were in the past. This has led to humane markup languages and modern configuration file formats that are far easier for humans to read. In addition, these structured representations can be compressed very effectively for transmission or storage.

Human-readable protocols greatly reduce the cost of debugging. [1]

Various organizations have standardized the definition of human-readable and machine-readable data and how they are applied in their respective fields of application, e.g., the Universal Postal Union. [2]

Often the term human-readable is also used to describe shorter names or strings, that are easier to comprehend or to remember than long, complex syntax notations, such as some Uniform Resource Locator strings. [3]

Occasionally "human-readable" is used to describe ways of encoding an arbitrary integer into a long series of English words. Compared to decimal or other compact binary-to-text encoding systems, English words are easier for humans to read, remember, and type in. [4]

See also

Related Research Articles

<span class="mw-page-title-main">Plain text</span> Term for computer data consisting only of unformatted characters of readable material

In computing, plain text is a loose term for data that represent only characters of readable material but not its graphical representation nor other objects. It may also include a limited number of "whitespace" characters that affect simple arrangement of text, such as spaces, line breaks, or tabulation characters. Plain text is different from formatted text, where style information is included; from structured text, where structural parts of the document such as paragraphs, sections, and the like are identified; and from binary files in which some portions must be interpreted as binary objects.

<span class="mw-page-title-main">String (computer science)</span> Sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

In computing, serialization is the process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked.

UTF-8 is a variable-length character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from UnicodeTransformation Format – 8-bit.

<span class="mw-page-title-main">XML</span> Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

<span class="mw-page-title-main">Machine-readable medium and data</span> Medium capable of storing data in a format readable by a machine

In communications and computing a machine-readable medium, or computer-readable medium, is a medium capable of storing data in a format easily readable by a digital computer or mechanical device . The result is called machine-readable data or computer-readable data.

<span class="mw-page-title-main">S-expression</span> Data serialization format

In computer programming, an S-expression is an expression in a like-named notation for nested list (tree-structured) data. S-expressions were invented for and popularized by the programming language Lisp, which uses them for source code as well as data.

<span class="mw-page-title-main">Executable</span> A file that causes a computer to follow indicated instructions

In computing, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instructions", as opposed to a data file that must be interpreted (parsed) by a program to be meaningful.

vCard, also known as VCF, is a file format standard for electronic business cards. vCards can be attached to e-mail messages, sent via Multimedia Messaging Service (MMS), on the World Wide Web, instant messaging, NFC or through QR code. They can contain name and address information, phone numbers, e-mail addresses, URLs, logos, photographs, and audio clips.

YAML is a human-readable data-serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. YAML targets many of the same communications applications as Extensible Markup Language (XML) but has a minimal syntax which intentionally differs from Standard Generalized Markup Language (SGML). It uses both Python-style indentation to indicate nesting, and a more compact format that uses [...] for lists and {...} for maps but forbids tab characters to use as indentation thus only some JSON files are valid YAML 1.2.

A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format. A CSV file typically stores tabular data in plain text, in which case each line will have the same number of fields.

MARC is a standard set of digital formats for the machine-readable description of items catalogued by libraries, such as books, DVDs, and digital resources. Computerized library catalogs and library management software need to structure their catalog records as per an industry-wide standard, which is MARC, so that bibliographic information can be shared freely between computers. The structure of bibliographic records almost universally follows the MARC standard. Other standards work in conjunction with MARC, for example, Anglo-American Cataloguing Rules (AACR)/Resource Description and Access (RDA) provide guidelines on formulating bibliographic data into the MARC record structure, while the International Standard Bibliographic Description (ISBD) provides guidelines for displaying MARC records in a standard, human-readable form.

Various binary formats have been proposed as compact representations for XML. Using a binary XML format generally reduces the verbosity of XML documents thereby also reducing the cost of parsing, but hinders the use of ordinary text editors and third-party tools to view and edit the document. There are several competing formats, but none has yet emerged as a de facto standard, although the World Wide Web Consortium adopted EXI as a Recommendation on 10 March 2011.

<span class="mw-page-title-main">JSON</span> Open standard file format and data interchange

JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays. It is a common data format with diverse uses in electronic data interchange, including that of web applications with servers.

In the macOS, iOS, NeXTSTEP, and GNUstep programming frameworks, property list files are files that store serialized objects. Property list files use the filename extension .plist, and thus are often referred to as p-list files.

A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the channel does not allow binary data or is not 8-bit clean. PGP documentation uses the term "ASCII armor" for binary-to-text encoding when referring to Base64.

Within communication protocols, TLV is an encoding scheme used for optional informational elements in a certain protocol. A TLV-encoded data stream contains code related to the record type, the record value's length, and finally the value itself.

Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is web indexing.

Data exchange is the process of taking data structured under a source schema and transforming it into a target schema, so that the target data is an accurate representation of the source data. Data exchange allows data to be shared between different computer programs.

Paper data storage refers to the use of paper as a data storage device. This includes writing, illustrating, and the use of data that can be interpreted by a machine or is the result of the functioning of a machine. A defining feature of paper data storage is the ability of humans to produce it with only simple tools and interpret it visually.

References

  1. RFC 3339  : "Date and Time on the Internet: Timestamps". section "5.2. Human Readability". 2002.
  2. "OCR and Human readable representation of data on postal items, labels and forms". Universal Postal Union. Archived from the original on 2007-07-16.
  3. "Human-readable URLs". Plone Foundation. Archived from the original on 2010-03-05. Retrieved 2009-10-01.
  4. RFC 1751 "A Convention for Human-Readable 128-bit Keys"