Concatenative synthesis

Concatenative synthesis is a technique for synthesising sounds by concatenating short samples of recorded sound (called units). The duration of the units is not strictly defined and may vary according to the implementation, roughly in the range of 10 milliseconds up to 1 second. It is used in speech synthesis and music sound synthesis to generate user-specified sequences of sound from a database (often called a corpus) built from recordings of other sequences.

In contrast to granular synthesis, concatenative synthesis is driven by an analysis of the source sound, in order to identify the units that best match the specified criterion. [1]
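
To make the selection step concrete, the following minimal Python/NumPy sketch rebuilds a target signal from the best-matching units of a corpus signal. The fixed unit length, the two descriptors (RMS energy and spectral centroid), the nearest-neighbour match and the short boundary fades are illustrative assumptions only, not taken from any particular published system.

```python
import numpy as np

def descriptor(unit, sr):
    """Per-unit feature vector: RMS energy and spectral centroid (Hz)."""
    rms = np.sqrt(np.mean(unit ** 2))
    spectrum = np.abs(np.fft.rfft(unit))
    freqs = np.fft.rfftfreq(len(unit), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([rms, centroid])

def segment(signal, unit_len):
    """Cut a signal into non-overlapping fixed-length units (trailing remainder dropped)."""
    n = len(signal) // unit_len
    return [signal[i * unit_len:(i + 1) * unit_len] for i in range(n)]

def concatenative_resynthesis(corpus, target, sr, unit_len=2048, fade=128):
    """Rebuild `target` from the corpus units whose descriptors are closest
    to those of the corresponding target units."""
    corpus_units = segment(corpus, unit_len)
    corpus_feats = np.array([descriptor(u, sr) for u in corpus_units])
    ramp = np.linspace(0.0, 1.0, fade)
    out = np.zeros(len(target))
    for i, tgt_unit in enumerate(segment(target, unit_len)):
        dists = np.linalg.norm(corpus_feats - descriptor(tgt_unit, sr), axis=1)
        unit = corpus_units[int(np.argmin(dists))].copy()
        unit[:fade] *= ramp          # short fade-in to avoid clicks at the joins
        unit[-fade:] *= ramp[::-1]   # short fade-out
        out[i * unit_len:(i + 1) * unit_len] = unit
    return out
```

A real system would normally use a richer descriptor set and add a concatenation cost that penalises discontinuities between successive selected units, rather than matching each target unit independently as this sketch does.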

In speech

In speech synthesis, the units are segments of recorded speech, such as diphones, phones, syllables or whole words, selected from a labelled database and joined to produce the target utterance; unit selection synthesis is the best-known form of this approach.

In music

Concatenative synthesis for music began to develop in the 2000s, in particular through the work of Schwarz [2] and Pachet [3] (so-called musaicing, or musical mosaicing). The basic techniques are similar to those used for speech, with differences that reflect the differing nature of speech and music: for example, the segmentation is not into phonetic units but often into subunits of musical notes or events. [1] [2] [4]
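
As an illustration of event-based (rather than phonetic) segmentation, the short sketch below cuts a recording at detected note onsets using librosa's onset detector. The file name is a placeholder and the default detector settings are an assumption; actual corpora may instead be segmented by score alignment, beat tracking or manual annotation.

```python
import numpy as np
import librosa

# Placeholder corpus file; sr=None keeps the file's native sample rate.
y, sr = librosa.load("corpus.wav", sr=None)

# Use detected note/event onsets as unit boundaries, so that each unit
# spans one musical event rather than a phonetic segment.
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples")
boundaries = np.concatenate(([0], onsets, [len(y)]))
units = [y[start:end] for start, end in zip(boundaries[:-1], boundaries[1:]) if end > start]

print(f"{len(units)} event-sized units from {len(y) / sr:.1f} s of audio")
```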

Zero Point (Mesh, 2020), the first full-length album by Rob Clouth, features concatenative synthesis software of Clouth's own design, called the 'Reconstructor', which "chops sampled sounds into tiny pieces and rearranges them to replicate a target sound. This allowed Clouth to use and manipulate his own beatboxing, a technique used on 'Into' and 'The Vacuum State'." [5] Clouth's concatenative synthesis algorithm was adapted from 'Let It Bee — Towards NMF-Inspired Audio Mosaicing' by Jonathan Driedger, Thomas Prätzlich, and Meinard Müller. [6] [7]
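
The NMF-inspired approach cited above can be outlined, in a heavily simplified form, as follows: the magnitude spectrogram of the corpus is kept fixed as an NMF dictionary, and only the activation matrix is learned so that a weighted combination of corpus frames approximates the target spectrogram. The Python sketch below (NumPy and librosa, with illustrative parameter values) omits the constraints on the activations that the published method relies on, so it should be read as an outline of the general idea rather than a reimplementation of the paper's algorithm or of Clouth's Reconstructor.

```python
import numpy as np
import librosa

def nmf_mosaic(corpus, target, n_fft=2048, hop=512, n_iter=50):
    """Reduced sketch of NMF-style audio mosaicing: corpus frames form a
    fixed dictionary and only the activations are learned."""
    C = librosa.stft(corpus, n_fft=n_fft, hop_length=hop)          # complex corpus frames
    V = np.abs(librosa.stft(target, n_fft=n_fft, hop_length=hop))  # target magnitudes
    W = np.abs(C)                                   # fixed dictionary: corpus magnitudes
    H = np.random.rand(W.shape[1], V.shape[1])      # activations to be learned

    # Multiplicative updates for the Euclidean NMF cost, with W held fixed.
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ (W @ H) + 1e-12)

    # Mix the complex corpus frames with the learned activations, so the
    # output reuses the corpus material (including its phases).
    return librosa.istft(C @ H, hop_length=hop)
```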

References

  1. Schwarz, D. (2005), "Current research in Concatenative Sound Synthesis" (PDF), Proceedings of the International Computer Music Conference (ICMC).
  2. Schwarz, Diemo (2004-01-23), Data-Driven Concatenative Sound Synthesis, retrieved 2010-01-15.
  3. Zils, A.; Pachet, F. (2001), "Musical Mosaicing" (PDF), Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFx-01), University of Limerick, pp. 39–44, archived from the original (PDF) on 2011-09-27, retrieved 2011-04-27.
  4. Maestre, E.; Ramírez, R.; Kersten, S.; Serra, X. (2009), "Expressive Concatenative Synthesis by Reusing Samples from Real Performance Recordings", Computer Music Journal, vol. 33, no. 4, pp. 23–42, CiteSeerX 10.1.1.188.8860, doi:10.1162/comj.2009.33.4.23, S2CID 1078610.
  5. "Zero Point, by Rob Clouth". Rob Clouth. Retrieved 2022-07-23.
  6. Sónar+D CCCB 2020 Talk: "Journey to the Center of the Musical Brain", retrieved 2022-07-23.
  7. "AudioLabs - Let it Bee - Towards NMF-inspired Audio Mosaicing". www.audiolabs-erlangen.de. Retrieved 2022-07-23.