MPEG-4 is a group of international standards for the compression of digital audio and visual data, multimedia systems, and file storage formats. It was originally introduced in late 1998 as a group of audio and video coding formats and related technology agreed upon by the ISO/IEC Moving Picture Experts Group (MPEG) (ISO/IEC JTC 1/SC29/WG11) under the formal standard ISO/IEC 14496 – Coding of audio-visual objects. Uses of MPEG-4 include compression of audiovisual data for Internet video and CD distribution, voice (telephone, videophone) and broadcast television applications. The MPEG-4 standard was developed by a group led by Touradj Ebrahimi (later the JPEG president) and Fernando Pereira. [1]
MPEG-4 absorbs many of the features of MPEG-1 and MPEG-2 and other related standards, adding new features such as (extended) VRML support for 3D rendering, object-oriented composite files (including audio, video and VRML objects), support for externally specified digital rights management and various types of interactivity. AAC (Advanced Audio Coding) was standardized as an adjunct to MPEG-2 (as Part 7) before MPEG-4 was issued.
MPEG-4 is still an evolving standard and is divided into a number of parts. Companies promoting MPEG-4 compatibility do not always clearly state which "part" level compatibility they are referring to. The key parts to be aware of are MPEG-4 Part 2 (including Advanced Simple Profile, used by codecs such as DivX, Xvid, Nero Digital, RealMedia, 3ivx, H.263 and by QuickTime 6) and MPEG-4 Part 10 (MPEG-4 AVC/H.264 or Advanced Video Coding, used by the x264 encoder, Nero Digital AVC, QuickTime 7, Flash Video, and high-definition video media like Blu-ray Disc).
Most of the features included in MPEG-4 are left to individual developers to decide whether or not to implement. This means that there are probably no complete implementations of the entire MPEG-4 set of standards. To deal with this, the standard includes the concept of "profiles" and "levels", allowing a specific set of capabilities to be defined in a manner appropriate for a subset of applications.
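As an illustration of how a level bounds decoder complexity, the following Python sketch checks whether a video format fits within a level of MPEG-4 Part 10 (AVC). The macroblock limits are the figures commonly quoted for AVC levels 3, 3.1 and 4 and should be verified against the standard before being relied on; the names `LevelLimits`, `AVC_LEVELS`, `fits_level` and `lowest_level` are hypothetical and not part of any MPEG API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimits:
    max_frame_size_mbs: int   # maximum macroblocks per frame
    max_mbs_per_second: int   # maximum macroblocks decoded per second


# Assumed limits for three AVC levels (check against ISO/IEC 14496-10, Annex A).
AVC_LEVELS = {
    "3": LevelLimits(1620, 40500),
    "3.1": LevelLimits(3600, 108000),
    "4": LevelLimits(8192, 245760),
}


def macroblocks(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a frame, rounding up."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def fits_level(width: int, height: int, fps: int, level: str) -> bool:
    """True if both the per-frame and the per-second limits are respected."""
    limits = AVC_LEVELS[level]
    mbs = macroblocks(width, height)
    return (mbs <= limits.max_frame_size_mbs
            and mbs * fps <= limits.max_mbs_per_second)


def lowest_level(width: int, height: int, fps: int):
    """Lowest level in the table that can carry the format, or None."""
    for level in ("3", "3.1", "4"):
        if fits_level(width, height, fps, level):
            return level
    return None
```

A decoder that advertises a given profile and level only has to handle streams for which such a check succeeds, which is how a profile and level pair lets an implementer bound memory and processing requirements up front.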
Initially, MPEG-4 was aimed primarily at low-bit-rate video communications; however, its scope as a multimedia coding standard was later expanded. MPEG-4 is efficient across a variety of bit rates ranging from a few kilobits per second to tens of megabits per second. MPEG-4 provides the following functions:
MPEG-4 provides a series of technologies for developers, for various service-providers and for end users:
The MPEG-4 format can perform various functions, among which might be the following:
MPEG-4 provides a large and rich set of tools for encoding. Subsets of the MPEG-4 tool sets have been provided for use in specific applications. These subsets, called 'Profiles', limit the size of the tool set a decoder is required to implement. [3] In order to restrict computational complexity, one or more 'Levels' are set for each Profile. [3] A Profile and Level combination allows: [3]
MPEG-4 consists of several standards—termed "parts"—including the following (each part covers a certain aspect of the whole specification):
Part | Number | First public release date (first edition) | Latest public release date (last edition) | Latest amendment | Title | Description |
---|---|---|---|---|---|---|
Part 1 | ISO/IEC 14496-1 [6] | 1999 | 2010 [7] | 2014 [8] | Systems | Describes synchronization and multiplexing of video and audio. For example, the MPEG-4 file format version 1 (obsoleted by version 2 defined in MPEG-4 Part 14). The functionality of a transport protocol stack for transmitting and/or storing content complying with ISO/IEC 14496 is not within the scope of 14496-1 and only the interface to this layer is considered (DMIF). Information about transport of MPEG-4 content is defined e.g. in MPEG-2 Transport Stream, RTP Audio Video Profiles and others. [9] [10] [11] [12] [13] |
Part 2 | ISO/IEC 14496-2 [14] | 1999 | 2004 [15] | 2009 | Visual | A compression format for visual data (video, still textures, synthetic images, etc.). Contains many profiles, including the Advanced Simple Profile (ASP), and the Simple Profile (SP). |
Part 3 | ISO/IEC 14496-3 [16] | 1999 | 2009 [17] | 2017 [18] | Audio | A set of compression formats for perceptual coding of audio signals, including some variations of Advanced Audio Coding (AAC) as well as other audio/speech coding formats and tools (such as Audio Lossless Coding (ALS), Scalable Lossless Coding (SLS), Structured Audio, Text-To-Speech Interface (TTSI), HVXC, CELP and others) |
Part 4 | ISO/IEC 14496-4 [19] | 2000 | 2004 [20] | 2016 | Conformance testing | Describes procedures for testing conformance to other parts of the standard. |
Part 5 | ISO/IEC 14496-5 [21] | 2000 | 2001 [22] | 2017 | Reference software | Provides reference software for demonstrating and clarifying the other parts of the standard. |
Part 6 | ISO/IEC 14496-6 [23] | 1999 | 2000 [24] | | Delivery Multimedia Integration Framework (DMIF) | |
Part 7 | ISO/IEC TR 14496-7 [25] | 2002 | 2004 [26] | | Optimized reference software for coding of audio-visual objects | Provides examples of how to make improved implementations (e.g., in relation to Part 5). |
Part 8 | ISO/IEC 14496-8 [27] | 2004 | 2004 [28] | | Carriage of ISO/IEC 14496 contents over IP networks | Specifies a method to carry MPEG-4 content on IP networks. It also includes guidelines to design RTP payload formats, usage rules of SDP to transport ISO/IEC 14496-1-related information, MIME type definitions, analysis on RTP security and multicasting. |
Part 9 | ISO/IEC TR 14496-9 [29] | 2004 | 2009 [30] | | Reference hardware description | Provides hardware designs for demonstrating how to implement the other parts of the standard. |
Part 10 | ISO/IEC 14496-10 [31] | 2003 | 2014 [32] | 2016 [33] | Advanced Video Coding (AVC) | A compression format for video signals which is technically identical to the ITU-T H.264 standard. |
Part 11 | ISO/IEC 14496-11 [34] | 2005 | 2015 [35] | | Scene description and application engine | Can be used for rich, interactive content with multiple profiles, including 2D and 3D versions. MPEG-4 Part 11 revised MPEG-4 Part 1 – ISO/IEC 14496-1:2001 and two amendments to MPEG-4 Part 1. It describes a system level description of an application engine (delivery, lifecycle, format and behaviour of downloadable Java byte code applications) and the Binary Format for Scene (BIFS) and the Extensible MPEG-4 Textual (XMT) format – a textual representation of the MPEG-4 multimedia content using XML, etc. [35] (It is also known as BIFS, XMT, MPEG-J. [36] MPEG-J was defined in MPEG-4 Part 21) |
Part 12 | ISO/IEC 14496-12 [37] | 2004 | 2015 [38] | 2017 [39] | ISO base media file format | A file format for storing time-based media content. It is a general format forming the basis for a number of other more specific file formats (e.g. 3GP, Motion JPEG 2000, MPEG-4 Part 14). It is technically identical to ISO/IEC 15444-12 (JPEG 2000 image coding system – Part 12). |
Part 13 | ISO/IEC 14496-13 [40] | 2004 | 2004 [41] | | Intellectual Property Management and Protection (IPMP) Extensions | MPEG-4 Part 13 revised an amendment to MPEG-4 Part 1 – ISO/IEC 14496-1:2001/Amd 3:2004. It specifies common Intellectual Property Management and Protection (IPMP) processing, syntax and semantics for the carriage of IPMP tools in the bit stream, IPMP information carriage, mutual authentication for IPMP tools, a list of registration authorities required for the support of the amended specifications (e.g. CISAC), etc. It was defined due to the lack of interoperability of different protection mechanisms (different DRM systems) for protecting and distributing copyrighted digital content such as music or video. [42] [43] [44] [45] [46] [47] [48] [49] [50] |
Part 14 | ISO/IEC 14496-14 [51] | 2003 | 2003 [52] | 2010 [53] | MP4 file format | It is also known as "MPEG-4 file format version 2". The designated container file format for MPEG-4 content, which is based on Part 12. It revises and completely replaces Clause 13 of ISO/IEC 14496-1 (MPEG-4 Part 1: Systems), in which the MPEG-4 file format was previously specified. |
Part 15 | ISO/IEC 14496-15 [54] | 2004 | 2022 [55] | 2023 [56] | Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format | For storage of Part 10 video. File format is based on Part 12, but also allows storage in other file formats. |
Part 16 | ISO/IEC 14496-16 [57] | 2004 | 2011 [58] | 2016 [59] | Animation Framework eXtension (AFX) | It specifies MPEG-4 Animation Framework eXtension (AFX) model for representing 3D Graphics content. MPEG-4 is extended with higher-level synthetic objects for specifying geometry, texture, animation and dedicated compression algorithms. |
Part 17 | ISO/IEC 14496-17 [60] | 2006 | 2006 [61] | | Streaming text format | Timed Text subtitle format |
Part 18 | ISO/IEC 14496-18 [62] | 2004 | 2004 [63] | 2014 | Font compression and streaming | For Open Font Format defined in Part 22. |
Part 19 | ISO/IEC 14496-19 [64] | 2004 | 2004 [65] | | Synthesized texture stream | Synthesized texture streams are used for creation of very low bitrate synthetic video clips. |
Part 20 | ISO/IEC 14496-20 [66] | 2006 | 2008 [67] | 2010 | Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF) | LASeR requirements (compression efficiency, code and memory footprint) are fulfilled by building upon the existing Scalable Vector Graphics (SVG) format defined by the World Wide Web Consortium. [68] |
Part 21 | ISO/IEC 14496-21 [69] | 2006 | 2006 [70] | | MPEG-J Graphics Framework eXtensions (GFX) | Describes a lightweight programmatic environment for advanced interactive multimedia applications – a framework that marries a subset of the MPEG standard Java application environment (MPEG-J) with a Java API. [36] [70] [71] [72] (at "FCD" stage in July 2005, FDIS January 2006, published as ISO standard on 2006-11-22). |
Part 22 | ISO/IEC 14496-22 [73] | 2007 | 2015 [74] | 2017 | Open Font Format | OFFS is based on the OpenType version 1.4 font format specification, and is technically equivalent to that specification. [75] [76] Reached "CD" stage in July 2005, published as ISO standard in 2007 |
Part 23 | ISO/IEC 14496-23 [77] | 2008 | 2008 [78] | | Symbolic Music Representation (SMR) | Reached "FCD" stage in October 2006, published as ISO standard on 2008-01-28 |
Part 24 | ISO/IEC TR 14496-24 [79] | 2008 | 2008 [80] | | Audio and systems interaction | Describes the desired joint behavior of MPEG-4 File Format and MPEG-4 Audio. |
Part 25 | ISO/IEC 14496-25 [81] | 2009 | 2011 [82] | | 3D Graphics Compression Model | Defines a model for connecting 3D Graphics Compression tools defined in MPEG-4 standards to graphics primitives defined in any other standard or specification. |
Part 26 | ISO/IEC 14496-26 [83] | 2010 | 2010 [84] | 2016 | Audio Conformance | |
Part 27 | ISO/IEC 14496-27 [85] | 2009 | 2009 [86] | 2015 [87] | 3D Graphics conformance | 3D Graphics Conformance summarizes the requirements, cross references them to characteristics, and defines how conformance with them can be tested. Guidelines are given on constructing tests to verify decoder conformance. |
Part 28 | ISO/IEC 14496-28 [88] | 2012 | 2012 [89] | | Composite font representation | |
Part 29 | ISO/IEC 14496-29 [90] | 2014 | 2015 | | Web video coding | Text of Part 29 is derived from Part 10 - ISO/IEC 14496-10. Web video coding is a technology that is compatible with the Constrained Baseline Profile of ISO/IEC 14496-10 (the subset that is specified in Annex A for Constrained Baseline is a normative specification, while all remaining parts are informative). |
Part 30 | ISO/IEC 14496-30 [91] | 2014 | 2014 | | Timed text and other visual overlays in ISO base media file format | It describes the carriage of some forms of timed text and subtitle streams in files based on ISO/IEC 14496-12 - W3C Timed Text Markup Language 1.0, W3C WebVTT (Web Video Text Tracks). The documentation of these forms does not preclude other definition of carriage of timed text or subtitles; see, for example, 3GPP Timed Text (3GPP TS 26.245). |
Part 31 | ISO/IEC 14496-31 [92] | Under development (2018-05) | | | Video Coding for Browsers | Video Coding for Browsers (VCB) - a video compression technology that is intended for use within World Wide Web browsers |
Part 32 | ISO/IEC CD 14496-32 [93] | Under development | | | Conformance and reference software | |
Part 33 | ISO/IEC FDIS 14496-33 [94] | Under development | | | Internet video coding | |
Profiles are also defined within the individual "parts", so an implementation of a part is ordinarily not an implementation of an entire part.
MPEG-1, MPEG-2, MPEG-7 and MPEG-21 are other suites of MPEG standards.
MPEG-4 contains patented technologies, the use of which requires licensing in countries that acknowledge software algorithm patents. Over two dozen companies claim to have patents covering MPEG-4. MPEG LA [95] licenses patents required for MPEG-4 Part 2 Visual from a wide range of companies (audio is licensed separately) and lists all of its licensors and licensees on its site. New licenses for MPEG-4 Systems patents are under development, [96] and none are being offered in the meantime; holders of the old MPEG-4 Systems license remain covered under its terms for the patents listed. [97]
The majority of patents used for the MPEG-4 Visual format are held by three Japanese companies: Mitsubishi Electric (255 patents), Hitachi (206 patents), and Panasonic (200 patents).
The Moving Picture Experts Group (MPEG) is an alliance of working groups established jointly by ISO and IEC that sets standards for media coding, including compression coding of audio, video, graphics, and genomic data; and transmission and file formats for various applications. Together with JPEG, MPEG is organized under ISO/IEC JTC 1/SC 29 – Coding of audio, picture, multimedia and hypermedia information.
MPEG-2 is a standard for "the generic coding of moving pictures and associated audio information". It describes a combination of lossy video compression and lossy audio data compression methods, which permit storage and transmission of movies using currently available storage media and transmission bandwidth. While MPEG-2 is not as efficient as newer standards such as H.264/AVC and H.265/HEVC, backwards compatibility with existing hardware and software means it is still widely used, for example in over-the-air digital television broadcasting and in the DVD-Video standard.
Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. It was designed to be the successor of the MP3 format and generally achieves higher sound quality than MP3 at the same bit rate.
MPEG-4 Part 3 or MPEG-4 Audio is the third part of the ISO/IEC MPEG-4 international standard developed by Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.
Harmonic Vector Excitation Coding, abbreviated as HVXC, is a speech coding algorithm specified in the MPEG-4 Part 3 standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in the fixed and variable bit rate modes and a sampling frequency of 8 kHz. It also operates at lower bit rates, such as 1.2–1.7 kbit/s, using a variable bit rate technique. The total algorithmic delay for the encoder and decoder is 36 ms.
High-Efficiency Advanced Audio Coding (HE-AAC) is an audio coding format for lossy data compression of digital audio defined as an MPEG-4 Audio profile in ISO/IEC 14496-3. It is an extension of Low Complexity AAC (AAC-LC) optimized for low-bitrate applications such as streaming audio. The usage profile HE-AAC v1 uses spectral band replication (SBR) to enhance the modified discrete cosine transform (MDCT) compression efficiency in the frequency domain. The usage profile HE-AAC v2 couples SBR with Parametric Stereo (PS) to further enhance the compression efficiency of stereo signals.
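In an MPEG-4 Audio stream, the coding tool in use (AAC-LC, SBR, PS and so on) is signalled by an audio object type field at the start of the decoder configuration record (AudioSpecificConfig). The Python sketch below reads only the first three fields of that record: object type, sampling rate and channel configuration. The object type numbers and the sampling-rate table are given as commonly documented for ISO/IEC 14496-3 and should be checked against the standard; `BitReader` and `parse_audio_config` are hypothetical names, and the extension fields that follow for explicitly signalled SBR or PS are deliberately not parsed.

```python
# Sampling rates indexed by the 4-bit samplingFrequencyIndex (assumed table).
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]

# A few audio object types relevant to the tools discussed here (assumed values).
OBJECT_TYPES = {2: "AAC LC", 5: "SBR", 7: "TwinVQ", 8: "CELP",
                9: "HVXC", 23: "ER AAC LD", 29: "PS", 36: "ALS"}


class BitReader:
    """Reads big-endian bit fields from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_audio_config(data: bytes) -> dict:
    """Parse the leading fields of an AudioSpecificConfig record."""
    reader = BitReader(data)
    object_type = reader.read(5)
    if object_type == 31:                 # escape value: 6 more bits follow
        object_type = 32 + reader.read(6)
    index = reader.read(4)
    if index == 0xF:                      # explicit 24-bit sampling rate
        sample_rate = reader.read(24)
    elif index < len(SAMPLE_RATES):
        sample_rate = SAMPLE_RATES[index]
    else:
        sample_rate = None                # reserved index
    channel_config = reader.read(4)
    return {
        "object_type": object_type,
        "name": OBJECT_TYPES.get(object_type, "other"),
        "sample_rate": sample_rate,
        "channel_config": channel_config,
    }
```

For example, the two bytes `0x12 0x10` decode to object type 2 (AAC LC) at 44100 Hz with channel configuration 2.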
TwinVQ is an audio compression technique developed by Nippon Telegraph and Telephone Corporation (NTT) Human Interface Laboratories in 1994. The compression technique has been used in both standardized and proprietary designs.
MPEG-4 Part 2, MPEG-4 Visual is a video compression format developed by the Moving Picture Experts Group (MPEG). It belongs to the MPEG-4 ISO/IEC standards. It uses block-wise motion compensation and a discrete cosine transform (DCT), similar to previous standards such as MPEG-1 Part 2 and H.262/MPEG-2 Part 2.
The Extensible MPEG-4 Textual Format (XMT) is a high-level, XML-based file format for storing MPEG-4 data in a way suitable for further editing. In contrast, the more common MPEG-4 Part 14 (MP4) format is less flexible and used for distributing finished content.
MPEG-4 Part 11, Scene description and application engine, was published as ISO/IEC 14496-11 in 2005. MPEG-4 Part 11 is also known as BIFS, XMT, MPEG-J. It defines:
QuickTime File Format (QTFF) is a computer file format used natively by the QuickTime framework.
MPEG-4 Audio Lossless Coding, also known as MPEG-4 ALS, is an extension to the MPEG-4 Part 3 audio standard to allow lossless audio compression. The extension was finalized in December 2005 and published as ISO/IEC 14496-3:2005/Amd 2:2006 in 2006. The latest description of MPEG-4 ALS was published as subpart 11 of the MPEG-4 Audio standard in December 2019.
MPEG-4 Structured Audio is an ISO/IEC standard for describing sound. It was published as subpart 5 of MPEG-4 Part 3 in 1999.
MPEG-4 SLS, or MPEG-4 Scalable to Lossless as per ISO/IEC 14496-3:2005/Amd 3:2006 (Scalable Lossless Coding), is an extension to the MPEG-4 Part 3 (MPEG-4 Audio) standard to allow lossless audio compression scalable to lossy MPEG-4 General Audio coding methods (e.g., variations of AAC). It was developed jointly by the Institute for Infocomm Research (I2R) and Fraunhofer, which commercializes its implementation of a limited subset of the standard under the name of HD-AAC. Standardization of the HD-AAC profile for MPEG-4 Audio is under development (as of September 2009).
MPEG-4 Part 14, or MP4, is a digital multimedia container format most commonly used to store video and audio, but it can also be used to store other data such as subtitles and still images. Like most modern container formats, it allows streaming over the Internet. The only filename extension for MPEG-4 Part 14 files as defined by the specification is .mp4. MPEG-4 Part 14 is a standard specified as a part of MPEG-4.
The MPEG-4 Low Delay Audio Coder is an audio compression standard designed to combine the advantages of perceptual audio coding with the low delay necessary for two-way communication. It is closely derived from the MPEG-2 Advanced Audio Coding (AAC) standard. It was published in MPEG-4 Audio Version 2 and in its later revisions.
MPEG Surround, also known as Spatial Audio Coding (SAC), is a lossy compression format for surround sound that provides a method for extending mono or stereo audio services to multi-channel audio in a backwards-compatible fashion. The total bit rates used for the core and the MPEG Surround data are typically only slightly higher than the bit rates used for coding of the core. MPEG Surround adds a side-information stream to the core bit stream, containing spatial image data. Legacy stereo playback systems will ignore this side information, while players supporting MPEG Surround decoding will output the reconstructed multi-channel audio.
Structured Audio Orchestra Language (SAOL) is an imperative, MUSIC-N programming language designed for describing virtual instruments, processing digital audio, and applying sound effects. It was published as subpart 5 of MPEG-4 Part 3 in 1999.
DMIF, or Delivery Multimedia Integration Framework, is a uniform interface between the application and the transport that frees the MPEG-4 application developer from transport-specific concerns. DMIF was defined in MPEG-4 Part 6 in 1999. DMIF defines two interfaces: the DMIF-Application Interface (DAI) and the DMIF Network Interface (DNI). A single application can run on different transport layers when supported by the right DMIF instantiation. MPEG-4 DMIF supports the following functionalities:
The ISO base media file format (ISOBMFF) is a container file format that defines a general structure for files that contain time-based multimedia data such as video and audio. It is standardized in ISO/IEC 14496-12, a.k.a. MPEG-4 Part 12, and was formerly also published as ISO/IEC 15444-12, a.k.a. JPEG 2000 Part 12.
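The general structure of such a file is a sequence of boxes, each beginning with a 32-bit big-endian size and a four-character type. The following Python sketch walks the boxes at one nesting level; it is an illustration under that assumption about the box header layout, not a complete parser. The names `iter_boxes` and `make_box` are hypothetical, and `make_box` exists only to build sample data.

```python
import io
import struct


def iter_boxes(stream):
    """Yield (type, payload_offset, payload_size) for each box in the stream."""
    while True:
        start = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            return                        # end of data
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            large = stream.read(8)
            if len(large) < 8:
                return
            size = struct.unpack(">Q", large)[0]
            header_len = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the stream.
            stream.seek(0, io.SEEK_END)
            size = stream.tell() - start
        if size < header_len:
            return                        # malformed box, stop walking
        yield box_type.decode("ascii", "replace"), start + header_len, size - header_len
        stream.seek(start + size)         # jump to the next sibling box


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (helper for the example only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Sample data: a file-type box, an empty free box, and a tiny media-data box.
sample = (
    make_box(b"ftyp", b"isom" + struct.pack(">I", 0) + b"isom" + b"mp42")
    + make_box(b"free", b"")
    + make_box(b"mdat", b"\x00" * 4)
)
boxes = list(iter_boxes(io.BytesIO(sample)))
```

Container boxes such as `moov` hold further boxes in their payload, so a fuller reader would apply the same walk recursively to the payload range of each container.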