Gapless playback is the uninterrupted playback of consecutive audio tracks, such that relative time distances in the original audio source are preserved over track boundaries on playback. For this to be useful, artifacts other than timing-related ones must also be avoided at track boundaries. Gapless playback is common with compact discs, gramophone records, or tapes, but is not always available with other formats that employ compressed digital audio. The absence of gapless playback is a source of annoyance to listeners of music where tracks are meant to segue into each other, such as some classical music (opera in particular), progressive rock, concept albums, electronic music, and live recordings with audience noise between tracks.
Various software, firmware, and hardware components may add up to a substantial delay associated with starting playback of a track. If not accounted for, the listener is left waiting in silence while the player fetches the next file (see hard disk access time), updates metadata, and decodes the whole first block before it has any data to feed the hardware buffer. The gap can be half a second or more, which is very noticeable in "continuous" music such as certain classical or dance genres. In extreme cases, the hardware is even reset between tracks, creating a very short "click".
To account for the whole chain of delays, the start of the next track should ideally already be decoded before the currently playing track finishes. The two decoded pieces of audio must be fed to the hardware continuously over the transition, as if the tracks were concatenated in software.
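The idea of feeding the hardware "as if the tracks were concatenated" can be sketched as follows. This is a minimal illustration, not a description of any particular player: the function name `gapless_chunks`, the use of plain integer lists for decoded samples, and the fixed `chunk_size` are all assumptions made for the example. A real player would also decode the next track ahead of time, typically in a background thread.

```python
from typing import Iterable, Iterator, List


def gapless_chunks(tracks: Iterable[List[int]], chunk_size: int) -> Iterator[List[int]]:
    """Yield fixed-size chunks for an output device (hypothetical helper).

    Samples left over at the end of one track are carried into the first
    chunk of the next track, so no partial chunk (and no gap) is emitted
    at a track boundary.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pending: List[int] = []
    for samples in tracks:  # each item: the decoded PCM samples of one track
        pending.extend(samples)
        while len(pending) >= chunk_size:
            yield pending[:chunk_size]
            pending = pending[chunk_size:]
    if pending:  # a short chunk is only allowed at the very end of the playlist
        yield pending


if __name__ == "__main__":
    # The second chunk straddles the boundary between the two tracks.
    print(list(gapless_chunks([[1, 2, 3], [4, 5, 6, 7]], chunk_size=2)))
```

The point of the design is that the output stage never sees track boundaries at all; it only sees a continuous stream of chunks.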
Many older audio players on personal computers do not implement the required buffering to play gapless audio. Some of these rely on third-party gapless audio plug-ins to buffer output. Most recent players and newer versions of old players now support gapless playback directly.
Lossy audio compression schemes that are based on overlapping time/frequency transforms add a small amount of padding silence to the beginning and end of each track. These silences increase the playtime of the compressed audio data. [1] If not trimmed off upon playback, the two silences played consecutively over a track boundary will appear as a pause in the original audio content. Lossless formats are not prone to this problem.
For some audio formats (e.g. Ogg Vorbis), where the start and end are precisely defined, the padding is implicitly trimmed off in the decoding process. Other formats may require extra metadata for the player to achieve the same. The popular MP3 format defines no way to record the amount of delay or padding for later removal. [notes 1] Also, the encoder delay may vary from encoder to encoder, making automatic removal difficult. [2] Even if two tracks are decompressed and merged into a single track, a pause will usually remain between them.
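When the amount of encoder delay and end padding is known, for example from metadata, trimming is a simple slice of the decoded samples. The sketch below assumes both values are given in samples; the function name `trim_padding` and its parameters are hypothetical and not taken from any format specification.

```python
from typing import List, Sequence


def trim_padding(decoded: Sequence[int], encoder_delay: int, end_padding: int) -> List[int]:
    """Remove leading delay and trailing padding from decoded samples.

    encoder_delay and end_padding are sample counts that are assumed to
    come from gapless metadata; this helper does not guess them.
    """
    if encoder_delay < 0 or end_padding < 0:
        raise ValueError("delay and padding must be non-negative")
    if encoder_delay + end_padding > len(decoded):
        raise ValueError("delay plus padding exceeds the decoded length")
    return list(decoded[encoder_delay:len(decoded) - end_padding])


if __name__ == "__main__":
    # Two samples of leading silence and one of trailing silence are removed.
    print(trim_padding([0, 0, 5, 6, 7, 0], encoder_delay=2, end_padding=1))
```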
Audio CDs can be recorded in either disc at once (DAO) or track at once (TAO) mode. The latter is more flexible, but has the drawback of inserting approximately 2 seconds of silence between tracks. DAO mode records the entire CD in one continuous session, without any pauses between tracks. [3] This makes it suitable when seamless playback with no interruptions between songs is required. DAO is commonly used for live recordings, DJ mixes, or concept albums where tracks blend into each other. [4]
In contrast to heuristic techniques, precise gapless playback usually means that playback timing is guaranteed to be identical to the source. By this definition, a precise gapless player is not allowed to introduce either gaps or overlaps (crossfading) between successive tracks, and is not allowed to use guesswork.
Apart from accounting for playback latency, the precision here lies in treating lossless data as-is and removing the correct amount of padding from lossy data. This is not possible for file formats that have loosely defined encoder specifications and no metadata, because encoders then have no way to record the duration of the extraneous silence.
Heuristics are used by some music players to detect silence between tracks and trim the audio as necessary on playback. Due to the loss of time resolution of lossy compression, this method is inexact. In particular, the silence is not exactly zero. If the silence threshold is too low, some silences go undetected. Too high, and entire sections of quiet music at the beginning or end of a track may be removed.
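The threshold trade-off described above can be illustrated with a minimal sketch. It assumes samples are signed integers and treats any sample whose absolute value is at or below `threshold` as silence; the function name `trim_silence` is hypothetical, and real players may use more elaborate detection.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[int], threshold: int) -> List[int]:
    """Strip leading and trailing samples whose magnitude is <= threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) <= threshold:
        start += 1
    while end > start and abs(samples[end - 1]) <= threshold:
        end -= 1
    return list(samples[start:end])


if __name__ == "__main__":
    # Near-silent decoder noise (1, -1, 3, 2) surrounds the louder content.
    track = [0, 1, -1, 3, 40, -50, 2, 0]
    print(trim_silence(track, threshold=0))  # only exact zeros are removed
    print(trim_silence(track, threshold=5))  # the quiet samples are removed too
```

With a threshold of 0, the near-silent noise left by lossy coding survives; with a higher threshold, quiet samples that may be genuine music are cut as well.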
Digital signal processing (DSP) algorithms can also be used to crossfade between tracks. This eliminates gaps that some listeners find distracting, but also greatly alters the audio signal, which may have undesirable effects on the listening experience. Some listeners dislike these effects more than the gap they attempt to remove. For example, crossfading is inappropriate for files that are already gapless, in which case the transition may feel artificially short and disturb the rhythm. [5] Also, depending on the length of untrimmed silence and the particular crossfader, it may cause a large volume drop.
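A linear crossfade, one of several possible crossfading schemes, can be sketched as below. The function name `crossfade`, the linear weighting, and the use of float lists are assumptions for illustration only. The example shows both effects mentioned above: the joined audio is shorter than the two tracks combined, and fading into untrimmed silence lowers the level of the outgoing track.

```python
from typing import List, Sequence


def crossfade(a: Sequence[float], b: Sequence[float], overlap: int) -> List[float]:
    """Join a and b, mixing the last `overlap` samples of a with the
    first `overlap` samples of b using linear weights."""
    if overlap <= 0:
        return list(a) + list(b)
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the tracks")
    head = list(a[:len(a) - overlap])
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the incoming track b
        mixed.append(a[len(a) - overlap + i] * (1.0 - w) + b[i] * w)
    return head + mixed + list(b[overlap:])


if __name__ == "__main__":
    loud = [1.0, 1.0, 1.0, 1.0]
    silence = [0.0, 0.0, 0.0, 0.0]
    # Eight input samples become six, and the tail of `loud` is attenuated.
    print(crossfade(loud, silence, overlap=2))
```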
These methods defeat the purpose of intentional spacing between tracks. Not all albums are mix albums; perhaps more typically, there is an aesthetic pause between unrelated tracks. Also, the artist may intentionally leave in silences for dramatic effect, which should arguably be preserved regardless of whether there is a track boundary there.
Compared to precise gapless playback, these methods take a different approach to erroneous silence in audio files, but the other requirements on the player are the same. They also require more computation, which in portable digital audio players means reduced playing time on batteries.
A common workaround is to encode consecutive tracks as one single file, relying on cue sheets (or something similar) for navigation. While this method results in gapless playback between consecutive tracks, it can be unwieldy because of the possibly large size of the resulting compressed file. Furthermore, unless the playback software or hardware can recognize the cue sheets, navigating between tracks may be difficult.
It may be possible to add gapless metadata to existing files. If the encoder is known, it is possible to guess the encoder delay. Also, if the compression was performed on CD audio, the original playback length will be an integer multiple of 588 samples, the size of one CD sector. Thus the total playback time can also be guessed. Adding such information to audio files will enable precise gapless playback in players that support this.
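The 588-sample guess can be sketched as a small calculation. The decoded length is the sum of the encoder delay, the original audio, and the end padding; if the delay is known and the padding is bounded, the original length must be a multiple of 588 within a narrow range. The function name `candidate_lengths`, the `max_padding` bound, and the numbers in the usage example are illustrative assumptions, not properties of any specific encoder.

```python
from typing import List

CD_SECTOR_SAMPLES = 588  # samples per CD sector, as stated in the text


def candidate_lengths(decoded_len: int, encoder_delay: int, max_padding: int) -> List[int]:
    """List the multiples of 588 that could be the original track length.

    Assumes decoded_len = encoder_delay + original + padding, with
    0 <= padding <= max_padding. More than one candidate may remain.
    """
    upper = decoded_len - encoder_delay  # original length if padding were 0
    lower = max(0, upper - max_padding)  # original length at maximum padding
    first = -(-lower // CD_SECTOR_SAMPLES) * CD_SECTOR_SAMPLES  # round up
    return list(range(first, upper + 1, CD_SECTOR_SAMPLES))


if __name__ == "__main__":
    # Illustrative numbers only: ten 1152-sample frames, an assumed delay
    # of 576 samples, and padding assumed to be less than one frame.
    print(candidate_lengths(10 * 1152, encoder_delay=576, max_padding=1151))
```

When the padding bound is wider than 588 samples, as in this example, the calculation narrows the answer to a few candidates rather than a single value, which is why the text describes the result as a guess.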
Since lossless data compression excludes the possibility of the introduction of padding, all lossless audio file formats are inherently gapless.
These lossy audio file formats have provisions for gapless encoding:
Some other formats do not officially support gapless encoding, but some implementations of encoders or decoders may handle gapless metadata.
Optimal solutions:
Alternative or partial solutions:
Ogg is a free, open container format maintained by the Xiph.Org Foundation. The authors of the Ogg format state that it is unrestricted by software patents and is designed to provide for efficient streaming and manipulation of high-quality digital multimedia. Its name is derived from "ogging", jargon from the computer game Netrek.
Vorbis is a free and open-source software project headed by the Xiph.Org Foundation. The project produces an audio coding format and software reference encoder/decoder (codec) for lossy audio compression, libvorbis. Vorbis is most commonly used in conjunction with the Ogg container format and it is therefore often referred to as Ogg Vorbis.
Windows Media Audio (WMA) is a series of audio codecs and their corresponding audio coding formats developed by Microsoft. It is a proprietary technology that forms part of the Windows Media framework. WMA consists of four distinct codecs. The original WMA codec, known simply as WMA, was conceived as a competitor to the popular MP3 and RealAudio codecs. WMA Pro, a newer and more advanced codec, supports multichannel and high-resolution audio. A lossless codec, WMA Lossless, compresses audio data without loss of audio fidelity. WMA Voice, targeted at voice content, applies compression using a range of low bit rates. Microsoft has also developed a digital container format called Advanced Systems Format to store audio encoded by WMA.
Adaptive Transform Acoustic Coding (ATRAC) is a family of proprietary audio compression algorithms developed by Sony. MiniDisc was the first commercial product to incorporate ATRAC, in 1992. ATRAC allowed a relatively small disc like MiniDisc to have the same running time as CD while storing audio information with minimal perceptible loss in quality. Improvements to the codec in the form of ATRAC3, ATRAC3plus, and ATRAC Advanced Lossless followed in 1999, 2002, and 2006 respectively.
FLAC is an audio coding format for lossless compression of digital audio, developed by the Xiph.Org Foundation, and is also the name of the free software project producing the FLAC tools, the reference software package that includes a codec implementation. Digital audio compressed by FLAC's algorithm can typically be reduced to between 50 and 70 percent of its original size and decompresses to an identical copy of the original audio data.
Monkey's Audio is an algorithm and file format for lossless audio data compression. Lossless data compression does not discard data during the process of encoding, unlike lossy compression methods such as Advanced Audio Coding, MP3, Vorbis, and Opus. Therefore, it may be decompressed to a file that is identical to the source material.
Xiph.Org Foundation is a nonprofit organization that produces free multimedia formats and software tools. It focuses on the Ogg family of formats, the most successful of which has been Vorbis, an open and freely licensed audio format and codec designed to compete with the patented WMA, MP3 and AAC. As of 2013, development work was focused on Daala, an open and patent-free video format and codec designed to compete with VP9 and the patented High Efficiency Video Coding.
foobar2000 is a freeware audio player for Microsoft Windows, iOS, Android, macOS, and formerly Windows Phone, developed by Peter Pawłowski. It has a modular design, which provides user flexibility in configuration and customization. Standard "skin" elements can be individually augmented or replaced with different dials and buttons, as well as visualizers such as waveform, oscilloscope, spectrum, spectrogram (waterfall), and peak and smoothed VU meters; the built-in visualizations, at least, are all analysis-oriented. foobar2000 offers third-party user interface modifications through a software development kit (SDK).
Musepack or MPC is an open source lossy audio codec, specifically optimized for transparent compression of stereo audio at bitrates of 160–180 kbit/s. It was formerly known as MPEGplus, MPEG+ or MP+.
ReplayGain is a proposed technical standard published by David Robinson in 2001 to measure and normalize the perceived loudness of audio in computer audio formats such as MP3 and Ogg Vorbis. It allows media players to normalize loudness for individual tracks or albums. This avoids the common problem of having to manually adjust volume levels between tracks when playing audio files from albums that have been mastered at different loudness levels.
Rockbox is a free and open-source software replacement for the OEM firmware in various forms of digital audio players (DAPs) with an original kernel. It offers an alternative to the player's operating system, in many cases without removing the original firmware, which provides a plug-in architecture for adding various enhancements and functions. Enhancements include personal digital assistant (PDA) functions, applications, utilities, and games. Rockbox can also retrofit video playback functions on players first released in mid-2000. Rockbox includes a voice-driven user-interface suitable for operation by visually impaired users.
Music Player Daemon (MPD) is a free and open source music player server. It plays audio files, organizes playlists and maintains a music database. In order to interact with it, a client program is needed. The MPD distribution includes mpc, a simple command line client.
Music On Console (MOC) is an ncurses-based console audio player for Linux/UNIX. It was originally written by Damian Pietras, and is currently maintained by John Fitzgerald. It is designed to be powerful and easy to use, with an interface inspired by the Midnight Commander console file manager. The default interface layout comprises a file list in the left pane with the playlist on the right. It is configurable with customizable key bindings, color schemes and interface layouts. MOC comes with several themes defined in text files, which can be modified to create new layouts. It supports ALSA, OSS or JACK outputs.
Banshee was a cross-platform open-source media player, called Sonance until 2005. Built upon Mono and Gtk#, it used the GStreamer multimedia platform for encoding and decoding various media formats, including Ogg Vorbis, MP3 and FLAC. Banshee could play and import audio CDs and supported many portable media players, including Apple's iPod, Android devices and Creative's ZEN players. Other features included Last.fm integration, album artwork fetching, smart playlists and podcast support. Banshee was released under the terms of the MIT License. Stable versions were available for many Linux distributions, as well as a beta preview for OS X and an alpha preview for Windows.
A tag editor is an app that can add, edit, or remove embedded metadata on multimedia file formats. Content creators, such as musicians, photographers, podcasters, and video producers, may need to properly label and manage their creations, adding such details as title, creator, date of creation, and copyright notice.
The Sansa Fuze is a portable media player developed by SanDisk and released on March 8, 2008. The Fuze is available in three different Flash memory capacities: 2 GB, 4 GB, and 8 GB and comes in six different colors: black, blue, pink, red, silver, and white. Storage is expandable via a microSDHC slot with capacity up to 32 GB, and unofficially to 64 GB or more via FAT32 formatted SDXC cards. All models have a 1.9 inch TFT LCD display with a resolution of 220 by 176 pixels and a built-in monaural microphone and FM tuner; recordings of the latter two are saved as PCM WAV files.
Nightingale is a discontinued free, open source audio player based on the Songbird media player source code. As such, Nightingale's engine is based on the Mozilla XULRunner with libraries such as the GStreamer media framework and libtag providing media tagging and playback support, amongst others.
Puddletag is a graphical audio file metadata editor ("tagger") for Unix-like operating systems.