FourCC

A FourCC ("four-character code") is a sequence of four bytes (typically ASCII) used to uniquely identify data formats. It originated from the OSType or ResType metadata system used in classic Mac OS and was adopted for the Amiga/Electronic Arts Interchange File Format and derivatives. The idea was later reused to identify compressed data types in QuickTime and DirectShow.

History

In 1984, the earliest version of a Macintosh OS, System 1, was released. It used the single-level Macintosh File System with metadata fields including file types, creator (application) information, and forks to store additional resources. This metadata could be changed without changing the data itself, so that the same data could be interpreted differently. Identical codes were used throughout the system as type tags for all kinds of data. [1] [2]

In 1985, Electronic Arts introduced the Interchange File Format (IFF) meta-format (family of file formats), originally devised for use on the Amiga. These files consisted of a sequence of "chunks", which could contain arbitrary data, each chunk prefixed by a four-byte ID. The IFF specification explicitly mentions that the origins of the FourCC idea lie with Apple. [3]
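The chunk layout described above can be sketched as a small reader. This is a minimal illustration, not a full IFF parser: the function name `iter_chunks` is hypothetical, and the code assumes the layout given in the "EA IFF 85" document, in which the four-byte ID is followed by a 32-bit big-endian length and odd-sized chunk data is followed by one pad byte.

```python
import struct
from typing import Iterator, Tuple


def iter_chunks(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (chunk_id, payload) pairs from a flat run of IFF-style chunks.

    Assumed layout per chunk: 4-byte ID, 4-byte big-endian length,
    payload, then one pad byte if the length is odd.
    """
    offset = 0
    while offset + 8 <= len(data):
        chunk_id, size = struct.unpack_from(">4sI", data, offset)
        offset += 8
        payload = data[offset:offset + size]
        yield chunk_id, payload
        # Skip the payload plus the pad byte that keeps chunks even-aligned.
        offset += size + (size & 1)


# Two made-up chunks; the IDs "NAME" and "BODY" are used only as sample tags.
sample = (
    b"NAME" + struct.pack(">I", 3) + b"abc" + b"\x00"
    + b"BODY" + struct.pack(">I", 2) + b"hi"
)
chunks = list(iter_chunks(sample))
```

Because every chunk starts with its ID and length, a reader can skip chunks whose ID it does not recognise without understanding their contents.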

IFF was adopted by a number of developers, including Apple for AIFF files and Microsoft for RIFF files (which were used as the basis for the AVI and WAV file formats). Apple referred to many of these codes as OSTypes. Microsoft and Windows developers refer to their four-byte identifiers as FourCCs or Four-Character Codes. FourCC codes were also adopted by Microsoft to identify data formats used in DirectX, specifically within DirectShow and DirectX Graphics.

In Apple systems

Since Mac OS X Panther, OSType signatures are one of several sources that may be examined to determine a Uniform Type Identifier and are no longer used as the primary data type signature. Mac OS X (macOS) prefers the more colloquial convention of labelling file types using file name extensions. When it was introduced, this change was a source of great contention among older users, who believed that Apple was reverting to a more primitive approach that misplaces metadata in the filename.

Filesystem-associated type codes are not readily accessible for users to manipulate, although they can be viewed and changed with certain software, most notably the macOS command line tools GetFileInfo and SetFile which are installed as part of the developer tools into /Developer/Tools, or the ResEdit utility available for older Macs. [4] [5]

Technical details

The byte sequence is usually restricted to printable ASCII characters, with space characters reserved for padding shorter sequences. Codes are case-sensitive, unlike file extensions. FourCCs are sometimes encoded in hexadecimal (e.g., "0x31637661" for 'avc1') [6] [7] [8] and sometimes encoded in a human-readable way (e.g., "mp4a"). Some FourCCs, however, do contain non-printable characters and are not human-readable without special formatting for display; for example, 10-bit Y'CbCr 4:2:2 video can have a FourCC of ('Y', '3', 10, 10) [9], which ffmpeg displays as rawvideo (Y3[10][10] / 0x0A0A3359), yuv422p10le.
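The padding and display conventions described above can be illustrated with two small helpers. Both names (`pad_fourcc`, `fourcc_to_display`) are hypothetical, and the bracketed decimal notation for non-printable bytes is only modelled on the ffmpeg output quoted above; it is not ffmpeg's actual code.

```python
def pad_fourcc(name: str) -> bytes:
    """Encode a short ASCII name and pad it with spaces to four bytes."""
    raw = name.encode("ascii")
    if not 1 <= len(raw) <= 4:
        raise ValueError("a FourCC holds between one and four characters")
    return raw.ljust(4, b" ")


def fourcc_to_display(code: bytes) -> str:
    """Render a FourCC, showing non-printable bytes as [decimal]."""
    if len(code) != 4:
        raise ValueError("a FourCC must be exactly four bytes")
    parts = []
    for byte in code:
        if 0x20 <= byte <= 0x7E:  # printable ASCII, including space
            parts.append(chr(byte))
        else:
            parts.append(f"[{byte}]")
    return "".join(parts)


printable = fourcc_to_display(b"avc1")
mixed = fourcc_to_display(bytes([ord("Y"), ord("3"), 10, 10]))
padded = pad_fourcc("abc")  # "abc" is an arbitrary sample name
```

Here `printable` is the plain string "avc1", while `mixed` makes the two bytes with value 10 visible instead of emitting raw line feeds.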

Four-byte identifiers are useful because they can be made up of four human-readable characters with mnemonic qualities, while still fitting in the four-byte memory space typically allocated for integers in 32-bit systems (although endian issues may make them less readable). Thus, the codes can be used efficiently in program code as integers, as well as giving cues in binary data streams when inspected.
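As a rough sketch of the integer view described above, the following helpers pack a code into one unsigned 32-bit value and back. The function names are hypothetical, and the first byte is placed in the most significant position by assumption; byte order is a separate question.

```python
def fourcc_to_int(code: bytes) -> int:
    """Pack four bytes into one unsigned 32-bit integer (first byte highest)."""
    if len(code) != 4:
        raise ValueError("a FourCC must be exactly four bytes")
    return int.from_bytes(code, "big")


def int_to_fourcc(value: int) -> bytes:
    """Unpack an unsigned 32-bit integer back into its four bytes."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    return value.to_bytes(4, "big")


mp4a = fourcc_to_int(b"mp4a")
# One integer comparison replaces a four-character string comparison.
same_code = mp4a == fourcc_to_int(b"mp4a")
# Case matters: the upper-case spelling is a different integer.
different_case = mp4a != fourcc_to_int(b"MP4A")
```

The round trip through `int_to_fourcc` recovers the original bytes, which is what allows a code to be stored as an integer and still be recognised when a binary stream is inspected.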

Compiler support

FourCC is written in big endian relative to the underlying ASCII character sequence, so that it appears in the correct byte order when read as a string. Many C compilers, including GCC, define a multi-character literal behavior of right-aligning to the least significant byte, so that '1234' becomes 0x31323334 in ASCII. [10] This is the conventional way of writing FourCC codes used by Mac OS programmers for OSType. (Classic Mac OS was exclusively big-endian.)

On little-endian machines, a byte-swap on the value is required to make the result correct. Taking the avc1 example from above: although the literal 'avc1' already converts to the integer value 0x61766331, a little-endian machine stores the least significant byte first, so the value would sit in memory as 31 63 76 61. To yield the correct byte sequence 61 76 63 31, the pre-swapped value 0x31637661 is used.
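The byte-order effect described above can be checked with Python's `struct` module, which lets the storage order be chosen explicitly instead of depending on the host machine. This is only an illustrative sketch; `swap32` is a hypothetical helper name.

```python
import struct


def swap32(value: int) -> int:
    """Reverse the byte order of an unsigned 32-bit integer."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")


literal_value = 0x61766331  # integer value of the literal 'avc1'

# Stored big-endian, the bytes appear in reading order.
big_endian_bytes = struct.pack(">I", literal_value)

# Stored little-endian, the same integer comes out reversed.
little_endian_bytes = struct.pack("<I", literal_value)

# Pre-swapping the value restores the intended byte sequence.
pre_swapped = swap32(literal_value)
corrected_bytes = struct.pack("<I", pre_swapped)
```

With these values, `big_endian_bytes` and `corrected_bytes` both hold 61 76 63 31, while `little_endian_bytes` holds 31 63 76 61.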

Common uses

One of the best-known uses of FourCCs is to identify the video codec or video coding format in AVI files. Common identifiers include DIVX, XVID, and H264. For audio coding formats, AVI and WAV files use a two-byte identifier, usually written in hexadecimal (such as 0055 for MP3). In QuickTime files, these two-byte identifiers are prefixed with the letters "ms" to form a four-character code. RealMedia files also use four-character codes; however, the actual codes used differ from those found in AVI or QuickTime files.
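The "ms" prefix rule for two-byte audio identifiers can be sketched as follows. The helper name `ms_fourcc` is hypothetical, and the code assumes the two-byte identifier is appended high byte first, so that 0055 becomes the bytes 00 55.

```python
def ms_fourcc(format_tag: int) -> bytes:
    """Build a four-byte code from the letters "ms" plus a two-byte identifier.

    Assumption: the identifier is written high byte first.
    """
    if not 0 <= format_tag <= 0xFFFF:
        raise ValueError("the identifier must fit in two bytes")
    return b"ms" + format_tag.to_bytes(2, "big")


mp3_code = ms_fourcc(0x0055)  # 0055 is the MP3 identifier mentioned above
```

The result contains a zero byte, which is one way a code built like this ends up with a non-printable character.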

Other file formats that make important use of the four-byte ID concept are the Standard MIDI File (SMF) format, the PNG image file format, the 3DS (3D Studio Max) mesh file format and the ICC profile format.

Four-character codes are also used in applications other than file formats, for example:

Other uses for OSTypes include:

References

  1. The Type/Creator Database
  2. "Signatures of Macintosh Files". Logiciels & Services Duhem. Retrieved December 1, 2015.
  3. Morrison, Jerry (January 14, 1985). ""EA IFF 85" Standard for Interchange Format Files". Electronic Arts.
  4. "GetFileInfo", Darwin reference (man page), Apple
  5. "SetFile", Darwin reference (man page), Apple
  6. "What Is A Codec Tag?". online-metadata.com. Retrieved June 9, 2019.
  7. "git.videolan.org Git - ffmpeg.git/blob - libavformat/isom.c". git.videolan.org. Retrieved June 9, 2019.
  8. "FFmpeg/FFmpeg search". GitHub. Retrieved June 9, 2019.
  9. "FFmpeg: libavcodec/raw.c Source File". ffmpeg.org. Retrieved June 9, 2019.
  10. "The C Preprocessor: Implementation-defined behavior". gcc.gnu.org.
  11. "ACPI ID Registry". uefi.org.
  12. "OSStatus — Apple API Errors". www.osstatus.com.
