JBOB

Last updated

JBOB, an acronym for Just a Bunch Of Bytes, is a term is used to describe unstructured data that does not have a fixed format. This is a variation on the term JBOD (Just a Bunch Of Disks) that is used to describe standard hard drives that are used in a storage array.

Many computer files have a defined structure such as fixed length records with the data divided into records that are the same length. Structured data might have records of different lengths but each record is prefixed with a RDW (Record Descriptor Word) that indicates the length of that data as well as other attributes. JBOB data has no structure. Records are defined by the presence of characters in the data. For example, a report might have hundreds of records (or lines) but the length of each record is defined by the presence of a carriage return (and/or line feed). Mainframe computers have traditionally dealt with structured data but unstructured (JBOB) data is much more common in PC environments. The critical difference is that it is difficult, if not impossible, to advance to say, the 100th record without examining every character of the 99 records that precede it. With fixed length records, it is possible to calculate the exact position of a particular record. Even with variable length records, the length of each record is given so navigation is easier.

Since records are determined by the content of the data, metadata is required, such as what is/are the record termination character(s), and is usually stored externally to the actual data or file. The processing of JBOB data is usually more difficult and may require special knowledge by the computer program. Metadata might also be required for structured data, such as the fixed record length or the largest variable length record, but there usually exists standard utility software to read/write structured data since the format is a known structure.


Related Research Articles

Electronic data interchange (EDI) is the concept of businesses electronically communicating information that was traditionally communicated on paper, such as purchase orders and invoices. Technical standards for EDI exist to facilitate parties transacting such instruments without having to make special arrangements.

String (computer science) Sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

In computer and machine-based telecommunications terminology, a character is a unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language.

A text file is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In operating systems such as CP/M and MS-DOS, where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an end-of-file marker, as padding after the last line in a text file. On modern operating systems such as Microsoft Windows and Unix-like systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes. There are for most text files a need to have end-of-line delimiters, which are done in a few different ways depending on operating system. Some operating systems with record-orientated file systems may not use new line delimiters and will primarily store text files with lines separated as fixed or variable length records.

Exif Metadata standard in digital images

Exchangeable image file format is a standard that specifies the formats for images, sound, and ancillary tags used by digital cameras, scanners and other systems handling image and sound files recorded by digital cameras. The specification uses the following existing file formats with the addition of specific metadata tags: JPEG discrete cosine transform (DCT) for compressed image files, TIFF Rev. 6.0 for uncompressed image files, and RIFF WAV for audio files. It is not used in JPEG 2000 or GIF.

In computer science, a record is a basic data structure. Records in a database or spreadsheet are usually called "rows".

A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format. A CSV file typically stores tabular data in plain text, in which case each line will have the same number of fields.

MARCstandards are a set of digital formats for the description of items catalogued by libraries, such as books. Working with the Library of Congress, American computer scientist Henriette Avram developed MARC in the 1960s to create records that could be read by computers and shared among libraries. By 1971, MARC formats had become the US national standard for dissemination of bibliographic data. Two years later, they became the international standard. There are several versions of MARC in use around the world, the most predominant being MARC 21, created in 1999 as a result of the harmonization of U.S. and Canadian MARC formats, and UNIMARC. UNIMARC is maintained by the Permanent UNIMARC Committee of the International Federation of Library Associations and Institutions (IFLA), and is widely used in Europe. The MARC 21 family of standards now includes formats for authority records, holdings records, classification schedules, and community information, in addition to the format for bibliographic records.

File system Format or program for storing files and directories

In computing, a file system or filesystem controls how data is stored and retrieved. Without a file system, data placed in a storage medium would be one large body of data with no way to tell where one piece of data stops and the next begins. By separating the data into pieces and giving each piece a name, the data is easily isolated and identified. Taking its name from the way paper-based data management system is named, each group of data is called a "file." The structure and logic rules used to manage the groups of data and their names is called a "file system."

In the context of IBM mainframe computers in the S/360 line, a data set or dataset is a computer file having a record organization. Use of this term began with, e.g., DOS/360, OS/360, and is still used by their successors, including the current z/OS. Documentation for these systems historically preferred this term rather than file.

In computer science, a record-oriented filesystem is a file system where data is stored as collections of records. This is in contrast to a byte-oriented filesystem, where the data is treated as an unformatted stream of bytes. There are several different possible record formats; the details vary depending on the particular system. In general the formats can be fixed-length or variable length, with different physical organizations or padding mechanisms; metadata may be associated with the file records to define the record length, or the data may be part of the record. Different access methods for records may be provided, for example records may be retrieved in sequential order, by key, or by record number.

Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated in documents.

Shapefile

The shapefile format is a geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products. The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.

Magnetic tape data storage is a system for storing digital information on magnetic tape using digital recording.

Data extraction is the act or process of retrieving data out of data sources for further data processing or data storage. The import into the intermediate extracting system is thus usually followed by data transformation and possibly the addition of metadata prior to export to another stage in the data workflow.

PREservation Metadata: Implementation Strategies (PREMIS) is the de facto digital preservation metadata standard.

SDTM defines a standard structure for human clinical trial (study) data tabulations and for nonclinical study data tabulations that are to be submitted as part of a product application to a regulatory authority such as the United States Food and Drug Administration (FDA). The Submission Data Standards team of Clinical Data Interchange Standards Consortium (CDISC) defines SDTM.

A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.

Metadata Data about data

Metadata is "data that provides information about other data". In other words, it is "data about data". Many distinct types of metadata exist, including descriptive metadata, structural metadata, administrative metadata, reference metadata and statistical metadata.

A machine-readable document is a document whose content can be readily processed by computers. Such documents are distinguished from machine-readable data by virtue of having sufficient structure to provide the necessary context to support the business processes for which they are created.