CD-Text is an extension of the Red Book specification for audio CDs. It allows additional information (e.g. album name, song name, and artist name) to be stored on a standards-compliant audio CD.
The specification for CD-Text was included in the Multi-Media Commands Set 3 R01 (MMC-3) standard, released in September 1996 and backed by Sony. [1] It was also added to new revisions of the Red Book. [2] The actual text is stored in a format compatible with Interactive Text Transmission System (ITTS), defined in the IEC 61866 standard. [3] The ITTS standard is also applied in the MiniDisc format, as well as in Digital Audio Broadcasting technology and Digital Compact Cassette.
The CD-Text information is stored in subchannels R to W of the disc. It is usually placed in the lead-in area, where roughly 5 kilobytes of space are available. It can also be stored in the main program area of the disc (where the audio tracks are), which can hold about 31 megabytes. [1] Since the R to W subchannels are not used in the Red Book specification of audio CDs, not all CD players read them, which prevents some devices from reading CD-Text information. [1]
The definition of CD-Text data is scattered between MMC-3 and Sony documentation. The description below follows that of GNU libcdio. [4]
On the lowest level, CD-Text is stored in 18-byte "pack" units; this part is defined in MMC-3 Annex J. Each pack consists of a 4-byte header (type indicator, track number reference, sequential counter, and block number and character position indicator [BNCPI]), 12 bytes of payload, and 2 bytes of CRC. The type indicator ranges from 0x80 to 0x8F; the 13 defined pack types are: [5]
Type | Keyword | Description | Section | Format |
---|---|---|---|---|
0x84 | ARRANGER | Name(s) of the arranger(s) | Any | Character |
0x83 | COMPOSER | Name(s) of the composer(s) | Any | Character |
0x86 | DISK_ID | Disc Identification information | Disc | Binary |
0x87 | GENRE | Genre Identification and Genre information | Disc | Binary |
0x8e | ISRC | International Standard Recording Code of each track | Track | Character |
0x85 | MESSAGE | Message from the content provider and/or artist | Any | Character |
0x81 | PERFORMER | Name(s) of the performer(s) | Any | Character |
0x82 | SONGWRITER | Name(s) of the songwriter(s) | Any | Character |
0x80 | TITLE | Title of album name or track titles | Any | Character |
0x88 | TOC_INFO | Table-of-content information | Disc | Binary |
0x89 | TOC_INFO2 | Second table-of-content information | Disc | Binary |
0x8e | UPC_EAN | UPC/EAN code of the album | Disc | Character |
0x8f | SIZE_INFO | Size information of the block | Any | Binary |
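The pack layout and the type table above can be illustrated with a minimal Python sketch that splits one 18-byte pack into its fields. The names `Pack` and `parse_pack` are hypothetical. The bit positions used for the block number (bits 6 to 4 of the BNCPI) and the character position (bits 3 to 0) are an assumption based on a reading of MMC-3 and are not stated in the text above. The CRC is kept as raw bytes and is not verified.

```python
from dataclasses import dataclass

PACK_SIZE = 18  # 4-byte header + 12-byte payload + 2-byte CRC

# Type indicator values taken from the table above.
# 0x8E is shared by UPC_EAN (disc) and ISRC (track).
PACK_TYPES = {
    0x80: "TITLE",
    0x81: "PERFORMER",
    0x82: "SONGWRITER",
    0x83: "COMPOSER",
    0x84: "ARRANGER",
    0x85: "MESSAGE",
    0x86: "DISK_ID",
    0x87: "GENRE",
    0x88: "TOC_INFO",
    0x89: "TOC_INFO2",
    0x8E: "UPC_EAN / ISRC",
    0x8F: "SIZE_INFO",
}


@dataclass
class Pack:
    type_id: int
    keyword: str       # "UNKNOWN" for values not listed in the table
    track: int         # track number reference byte, kept as-is
    sequence: int      # sequential counter
    double_byte: bool  # top bit of the BNCPI
    block: int         # assumed: bits 6-4 of the BNCPI
    char_pos: int      # assumed: bits 3-0 of the BNCPI
    payload: bytes     # 12 bytes
    crc: bytes         # 2 bytes, not checked here


def parse_pack(raw: bytes) -> Pack:
    """Split one raw 18-byte pack into header fields, payload and CRC."""
    if len(raw) != PACK_SIZE:
        raise ValueError(f"expected {PACK_SIZE} bytes, got {len(raw)}")
    type_id, track, sequence, bncpi = raw[0], raw[1], raw[2], raw[3]
    if not 0x80 <= type_id <= 0x8F:
        raise ValueError(f"type indicator 0x{type_id:02x} out of range")
    return Pack(
        type_id=type_id,
        keyword=PACK_TYPES.get(type_id, "UNKNOWN"),
        track=track,
        sequence=sequence,
        double_byte=bool(bncpi & 0x80),
        block=(bncpi >> 4) & 0x07,
        char_pos=bncpi & 0x0F,
        payload=raw[4:16],
        crc=raw[16:18],
    )
```

For example, a pack whose first header byte is 0x80 is reported as a TITLE pack, and its 12 payload bytes are returned unchanged for later string assembly.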
The BNCPI is used to define information that does not fit in one pack, whether text or binary data. Its top bit also indicates whether the text is single-byte or double-byte data, which determines how strings are null-terminated: with one or two bytes of 0x00. [4] (Note: the double-byte [DBCS] mode is rarely, if ever, used. Its special null handling is not necessary for the DBCS code pages used on computers, as they are "hybrid" with ASCII and treat NUL the same way ASCII does. UTF-16 could be the intended use.)
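The effect of the single-byte and double-byte terminators can be sketched as follows. This is a minimal illustration, not a description of how any particular library does it: the function name `split_strings` is hypothetical, and the payloads are assumed to be already ordered and to belong to the same pack type.

```python
def split_strings(payloads, double_byte=False):
    """Join 12-byte pack payloads and split them into terminated strings.

    The terminator is one 0x00 byte in single-byte mode and two 0x00 bytes
    in double-byte mode. Bytes after the last terminator are discarded.
    """
    data = b"".join(payloads)
    width = 2 if double_byte else 1
    terminator = b"\x00" * width
    strings = []
    start = 0
    # Step in whole characters so that a 0x00 byte inside a double-byte
    # character is not mistaken for a terminator.
    for i in range(0, len(data) - width + 1, width):
        if data[i:i + width] == terminator:
            strings.append(data[start:i])
            start = i + width
    return strings


# A string that does not fit into one pack continues in the next payload.
packs = [b"A Long Album", b" Name\x00Song\x00\x00"]
print(split_strings(packs))
```

In this sketch, zero padding at the end of the last pack shows up as empty trailing strings, so a caller would need to know how many strings to expect (for instance, one per track plus one for the album) rather than trusting the length of the result.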
For pack types listed above as "Character" (per MMC-3), the payload is a simple null-terminated string. (MMC-3 is confusing here: it describes the encoding as "ASCII" in the pack type table, although it later mentions that the BNCPI flag modifies this behavior.) The descriptions of the binary fields are vague, but the developers of GNU libcdio have either matched them to sections of MMC-3 or written new descriptions based on Sony's sample. [4]
Another layer of encoding specification is found at the payload level, in the SIZE_INFO block. Here the first byte may be used to indicate the encoding: ASCII, Latin-1, or "MS-JIS". This is supported by the original Sony authoring tools. [4]
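A decoder could use that first byte to choose a text codec, roughly as in the sketch below. The numeric code values (0x00 for Latin-1, 0x01 for ASCII, 0x80 for MS-JIS) are an assumption based on a reading of GNU libcdio and are not given in the text above. The names `CHARSET_CODECS` and `decode_text` are hypothetical, and Python's `cp932` codec is used only as a stand-in for "MS-JIS".

```python
# Assumed character code values; they are not stated in the article text.
CHARSET_CODECS = {
    0x00: "latin-1",
    0x01: "ascii",
    0x80: "cp932",  # Microsoft's Shift JIS variant, standing in for "MS-JIS"
}


def decode_text(size_info_payload: bytes, text: bytes) -> str:
    """Decode one string using the encoding named by the SIZE_INFO payload."""
    if not size_info_payload:
        raise ValueError("empty SIZE_INFO payload")
    code = size_info_payload[0]
    codec = CHARSET_CODECS.get(code)
    if codec is None:
        raise ValueError(f"unsupported character code 0x{code:02x}")
    return text.decode(codec)


# A Latin-1 title: the byte 0xE9 becomes an e with acute accent.
print(decode_text(b"\x00" + bytes(11), b"Caf\xe9"))
```

Raising an error for unlisted codes is a design choice of this sketch; a more tolerant reader might fall back to one of the listed encodings instead.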