Advanced Scientific Data Format

Last updated
Advanced Scientific Data Format
ASDF tree logo.svg
Internet media type application/octet-stream
Magic number #ASDF
Initial releaseSeptember 19, 2015;8 years ago (2015-09-19)
Type of formatscientific data
Contained by YAML
Extended from FITS
Website asdf-standard.readthedocs.io

Advanced Scientific Data Format (ASDF) is a proposed replacement to the FITS standard for astronomical images and other scientific data. [1] The metadata is contained in a YAML (Human-readable data serialization format) header followed by binary or ASCII data.

Contents

ASDF is used, notably, as an interchange format for the data processing pipeline of the James Webb Space Telescope.

Improvements to FITS

The Flexible Image Transport System (FITS) standard is a widely used data format in astronomy that incorporates metadata and ASCII or binary data in the same file. [2] However, the FITS standard has several limitations that make it difficult to use for complicated and hierarchical data. For example, the FITS 'cards' have keywords limited to only 8 characters, which can make it difficult to properly describe the value associated with it and the value for each keyword and its comment cannot be longer than 68 characters. [1] By using YAML, more sophisticated and nested data structures may be used in ASDF than FITS.

Related Research Articles

In computing and telecommunications, a control character or non-printing character (NPC) is a code point in a character set that does not represent a written character or symbol. They are used as in-band signaling to cause effects other than the addition of a symbol to the text. All other characters are mainly graphic characters, also known as printing characters, except perhaps for "space" characters. In the ASCII standard there are 33 control characters, such as code 7, BEL, which rings a terminal bell.

Multipurpose Internet Mail Extensions (MIME) is a standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets. Email messages with MIME formatting are typically transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).

Flexible Image Transport System (FITS) is an open standard defining a digital file format useful for storage, transmission and processing of data: formatted as multi-dimensional arrays, or tables. FITS is the most commonly used digital file format in astronomy. The FITS standard was designed specifically for astronomical data, and includes provisions such as describing photometric and spatial calibration information, together with image origin metadata.

In computer programming, Base64 is a group of binary-to-text encoding schemes that transforms binary data into a sequence of printable characters, limited to a set of 64 unique characters. More specifically, the source binary data is taken 6 bits at a time, then this group of 6 bits is mapped to one of 64 unique characters.

<span class="mw-page-title-main">Newline</span> Special characters in computing signifying the end of a line of text

A newline is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or a sequence of characters, is used to signify the end of a line of text and the start of a new one.

uuencoding is a form of binary-to-text encoding that originated in the Unix programs uuencode and uudecode written by Mary Ann Horton at the University of California, Berkeley in 1980, for encoding binary data for transmission in email systems.

<span class="mw-page-title-main">Comma-separated values</span> File format used to store data

Comma-separated values (CSV) is a text file format that uses commas to separate values, and newlines to separate records. A CSV file stores tabular data in plain text, where each line of the file typically represents one data record. Each record consists of the same number of fields, and these are separated by commas in the CSV file. If the field delimiter itself may appear within a field, fields can be surrounded with quotation marks.

<span class="mw-page-title-main">Hierarchical Data Format</span> Set of file formats

Hierarchical Data Format (HDF) is a set of file formats designed to store and organize large amounts of data. Originally developed at the U.S. National Center for Supercomputing Applications, it is supported by The HDF Group, a non-profit corporation whose mission is to ensure continued development of HDF5 technologies and the continued accessibility of data stored in HDF.

<span class="mw-page-title-main">Binary file</span> Non-human-readable computer file encoded in binary form

A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document files containing formatted text, such as older Microsoft Word document files, contain the text of the document but also contain formatting information in binary form.

A FourCC is a sequence of four bytes used to uniquely identify data formats. It originated from the OSType or ResType metadata system used in classic Mac OS and was adopted for the Amiga/Electronic Arts Interchange File Format and derivatives. The idea was later reused to identify compressed data types in QuickTime and DirectShow.

APE tags comprise one extant convention used to store information (metadata) about a given digital audio file. Each APE tag constitutes a discrete element that describes a single attribute of the file's contents. Each consists of a key/value pair; the key is simply a label that names the attribute, such as Year, Title, Artist, or Track Number, etc.), and associated with it is a corresponding value, namely, some information descriptive of this file, in terms of the attribute in question. APE tags can be used with .ape-formatted recordings, as well as with sound files of other audio file formats.

Bencode is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data.

A hex editor is a computer program that allows for manipulation of the fundamental binary data that constitutes a computer file. The name 'hex' comes from 'hexadecimal', a standard numerical format for representing binary data. A typical computer file occupies multiple areas on the storage medium, whose contents are combined to form the file. Hex editors that are designed to parse and edit sector data from the physical segments of floppy or hard disks are sometimes called sector editors or disk editors.

Netpbm is an open-source package of graphics programs and a programming library. It is used mainly in the Unix world, where one can find it included in all major open-source operating system distributions, but also works on Microsoft Windows, macOS, and other operating systems.

<span class="mw-page-title-main">STL (file format)</span> Standard Tessellation Language. File format for 3D printing and scanning applications.

STL is a file format native to the stereolithography CAD software created by 3D Systems. Chuck Hull, the inventor of stereolithography and 3D Systems’ founder, reports that the file extension is an abbreviation for stereolithography.

A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the communication channel does not allow binary data or is not 8-bit clean. PGP documentation uses the term "ASCII armor" for binary-to-text encoding when referring to Base64.

<span class="mw-page-title-main">SREC (file format)</span> File format developed by Motorola

Motorola S-record is a file format, created by Motorola in the mid-1970s, that conveys binary information as hex values in ASCII text form. This file format may also be known as SRECORD, SREC, S19, S28, S37. It is commonly used for programming flash memory in microcontrollers, EPROMs, EEPROMs, and other types of programmable logic devices. In a typical application, a compiler or assembler converts a program's source code to machine code and outputs it into a HEX file. The HEX file is then imported by a programmer to "burn" the machine code into non-volatile memory, or is transferred to the target system for loading and execution.

A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free.

<span class="mw-page-title-main">Astropy</span> Python language software

Astropy is a collection of software packages written in the Python programming language and designed for use in astronomy. The software is a single, free, core package for astronomical utilities due to the increasingly widespread usage of Python by astronomers, and to foster interoperability between various extant Python astronomy packages. Astropy is included in several large Python distributions; it is part of package managers for Linux and macOS, the Anaconda Python Distribution, Enthought Canopy and Ureka.

References

  1. 1 2 Greenfield, P.; Droettboom, M.; Bray, E. (2015). "ASDF: A new data format for astronomy". Astronomy and Computing. 12: 240–251. Bibcode:2015A&C....12..240G. doi: 10.1016/j.ascom.2015.06.004 .
  2. Wells, D. C.; Greisen, E. W.; Harten, R. H. (June 1981). "FITS: A Flexible Image Transport System". Astronomy and Astrophysics Supplement Series. 44: 363–370. Bibcode:1981A&AS...44..363W.