Transcription (music)

A J.S. Bach keyboard piece transcribed for guitar.

In music, transcription is the practice of notating a piece or a sound that was previously unnotated or not circulated as written music, for example a jazz improvisation or a video game soundtrack. When a musician creates sheet music from a recording by writing down the notes that make up the piece in music notation, they are said to have made a musical transcription of that recording. Transcription also means rewriting a piece of music, whether solo or ensemble, for an instrument or instruments other than those for which it was originally intended; the Beethoven symphonies transcribed for solo piano by Franz Liszt are an example. Transcription in this sense is sometimes called arrangement, although strictly speaking transcriptions are faithful adaptations, whereas arrangements change significant aspects of the original piece.

Further examples of music transcription include the ethnomusicological notation of oral traditions of folk music, such as Béla Bartók's and Ralph Vaughan Williams' collections of the national folk music of Hungary and England respectively. The French composer Olivier Messiaen transcribed birdsong in the wild and incorporated it into many of his compositions, for example his Catalogue d'oiseaux for solo piano. Transcription of this nature involves scale-degree recognition and harmonic analysis, both of which require the transcriber to have relative or perfect pitch.

In popular music and rock, transcription takes two forms. Individual performers copy a guitar solo or other melodic line note for note, while music publishers transcribe entire recordings of guitar solos and bass lines and sell the sheet music in bound books. Music publishers also publish PVG (piano/vocal/guitar) transcriptions of popular music, in which the melody line is transcribed and the accompaniment on the recording is arranged as a piano part. The guitar aspect of the PVG label is covered by guitar chords written above the melody; lyrics are included below it.

Adaptation

Some composers have paid homage to other composers by creating "identical" versions of the earlier composers' pieces while adding their own creativity through the use of completely new sounds arising from the difference in instrumentation. The most widely known example is Ravel's orchestral arrangement of Mussorgsky's piano piece Pictures at an Exhibition. Webern used his transcription for orchestra of the six-part ricercar from Bach's The Musical Offering to analyze the structure of the Bach piece, using different instruments to play different subordinate motifs of Bach's themes and melodies.

In transcription of this form, the new piece can simultaneously imitate the original sounds while recomposing them with all the technical skill of an expert composer, such that the piece seems to have been originally written for the new medium. But some transcriptions and arrangements have been made for purely pragmatic or contextual reasons. For example, in Mozart's time the overtures and songs from his popular operas were transcribed for small wind ensemble simply because such ensembles were common ways of providing popular entertainment in public places. Mozart himself did this in his opera Don Giovanni, transcribing for small wind ensemble several arias from other operas, including one from his own opera The Marriage of Figaro. A more contemporary example is Stravinsky's transcription of The Rite of Spring for piano four hands, made for use in the ballet's rehearsals. Today, musicians who play in cafes or restaurants will sometimes play transcriptions or arrangements of pieces written for a larger group of instruments.

Other examples of this type of transcription include Bach's arrangement of Vivaldi's four-violin concerti for four keyboard instruments and orchestra; Mozart's arrangement of some Bach fugues from The Well-Tempered Clavier for string trio; Beethoven's arrangement of his Große Fuge, originally written for string quartet, for piano duet, and his arrangement of his Violin Concerto as a piano concerto; Franz Liszt's piano arrangements of the works of many composers, including the symphonies of Beethoven; Tchaikovsky's arrangement of four Mozart piano pieces into an orchestral suite called "Mozartiana"; Mahler's re-orchestration of Schumann symphonies; and Schoenberg's arrangements for orchestra of Brahms's Piano Quartet in G minor and Bach's "St. Anne" Prelude and Fugue for organ.

Since the piano became a popular instrument, a large literature of transcriptions and arrangements for piano of works for orchestra or chamber ensemble has sprung up. These are sometimes called "piano reductions", because the multiplicity of orchestral parts—in an orchestral piece there may be as many as two dozen separate instrumental parts being played simultaneously—must be reduced to what a single pianist (or occasionally two pianists, on one or two pianos, as in the various arrangements of George Gershwin's Rhapsody in Blue) can manage to play.

Piano reductions are frequently made of orchestral accompaniments to choral works, for the purposes of rehearsal or of performance with keyboard alone.

Many orchestral pieces have been transcribed for concert band.

Transcription aids

Notation software

Since the advent of desktop publishing, musicians have been able to acquire music notation software, which lets the user enter the notes of a piece manually and then stores and formats them as standard music notation for personal printing or professional publishing of sheet music. Some notation software can accept a Standard MIDI File (SMF) or a live MIDI performance as input instead of manual note entry. Notation applications can export their scores in a variety of formats such as EPS, PNG, and SVG. Often the software includes a sound library so the user's score can be played aloud by the application for verification.
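
As a minimal illustration of programmatic note entry and export, the following sketch uses the open-source music21 toolkit for Python; the melody and file names are arbitrary examples, not taken from any particular score.

```python
# A minimal sketch of programmatic note entry using the music21 toolkit.
# The melody below is an arbitrary example, not from any particular score.
from music21 import stream, note, metadata

score = stream.Stream()
score.metadata = metadata.Metadata(title='Example Transcription')

# Enter a few notes manually: pitch name plus duration in quarter notes.
for pitch, dur in [('C4', 1.0), ('E4', 1.0), ('G4', 1.0), ('C5', 2.0)]:
    score.append(note.Note(pitch, quarterLength=dur))

# Export as MusicXML (readable by most notation applications) and as MIDI.
score.write('musicxml', fp='example.musicxml')
score.write('midi', fp='example.mid')
```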

Slow-down software

Prior to the invention of digital transcription aids, musicians would slow down a record or a tape recording to hear the melodic lines and chords at a slower, more digestible pace. The problem with this approach was that it also lowered the pitches, so once a piece was transcribed it had to be transposed back into the correct key. Software designed to slow down the tempo of music without changing its pitch can be very helpful for recognizing pitches, melodies, chords, rhythms, and lyrics when transcribing music: unlike the slow-down effect of a record player, the notes stay at their original pitch and octave. This technology is simple enough that it is available in many free software applications.

The software generally goes through a two-step process to accomplish this. First, the audio file is played back at a lower sample rate than that of the original file. This has the same effect as playing a tape or vinyl record at a slower speed: the pitch is lowered, so the music can sound as if it is in a different key. The second step is to use digital signal processing (DSP) to shift the pitch back up to the original pitch level or musical key.
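
As a rough sketch of the overall effect (not of any particular product's internals), a phase-vocoder time stretch such as the one in the open-source librosa library performs pitch-preserving slow-down in a single call; the file names here are placeholders.

```python
# A minimal sketch of pitch-preserving slow-down using librosa's
# phase-vocoder time stretch. File names are placeholders.
import librosa
import soundfile as sf

y, sr = librosa.load('recording.wav', sr=None)  # keep the original sample rate

# Stretch to half speed; unlike slowing a turntable, the pitch is unchanged.
y_slow = librosa.effects.time_stretch(y, rate=0.5)

sf.write('recording_half_speed.wav', y_slow, sr)
```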

Pitch tracking software

As discussed in the Automatic music transcription section below, some commercial software can roughly track the pitch of dominant melodies in polyphonic musical recordings. The note scans are not exact and often need to be edited manually by the user before being saved to file, in either a proprietary file format or Standard MIDI File format. Some pitch-tracking software can also animate the scanned note lists during audio playback.

Automatic music transcription

The term "automatic music transcription" was first used by audio researchers James A. Moorer, Martin Piszczalski, and Bernard Galler in 1977. With their knowledge of digital audio engineering, these researchers believed that a computer could be programmed to analyze a digital recording of music such that the pitches of melody lines and chord patterns could be detected, along with the rhythmic accents of percussion instruments. The task of automatic music transcription concerns two separate activities: making an analysis of a musical piece, and printing out a score from that analysis.[1]

This was not a simple goal, but one that would encourage academic research for at least another three decades. Because of the close scientific relationship of speech to music, much academic and commercial research directed toward the better-funded field of speech recognition was recycled into research on music recognition technology. While many musicians and educators insist that doing transcriptions manually is a valuable exercise for developing musicians, the motivation for automatic music transcription is the same as the motivation for sheet music: musicians who do not have intuitive transcription skills will search for sheet music or a chord chart so that they may quickly learn how to play a song. A collection of tools created by this ongoing research could be of great aid to musicians, and since much recorded music has no published sheet music, an automatic transcription device could offer transcriptions that are otherwise unavailable. To date, no software application can completely fulfill James Moorer's definition of automatic music transcription. However, the pursuit of automatic music transcription has spawned many software applications that can aid in manual transcription: some can slow down music while maintaining original pitch and octave, some can track the pitch of melodies, some can track chord changes, and others can track the beat of music.

Automatic transcription most fundamentally involves identifying the pitch and duration of the performed notes. This entails tracking pitch and identifying note onsets. After capturing those physical measurements, this information is mapped into traditional music notation, i.e., the sheet music.

Digital signal processing is the branch of engineering that provides software engineers with the tools and algorithms needed to analyze a digital recording in terms of pitch (note detection of melodic instruments) and the energy content of un-pitched sounds (detection of percussion instruments). Musical recordings are sampled at a given sample rate, and the resulting data are stored in a digital wave format on the computer; such a format represents sound as a sequence of digital samples.

Pitch detection

Pitch detection is often the detection of individual notes that might make up a melody in music, or the notes in a chord. When a single key is pressed on a piano, what we hear is not just one frequency of sound vibration but a composite of multiple sound vibrations occurring at different, mathematically related frequencies. The elements of this composite of vibrations at differing frequencies are referred to as harmonics or partials.

For instance, if we press the middle C key on the piano, the individual frequencies of the composite's harmonics start at 261.6 Hz as the fundamental frequency; 523.2 Hz is the 2nd harmonic, 784.8 Hz the 3rd harmonic, and 1046.4 Hz the 4th harmonic. Each harmonic is an integer multiple of the fundamental frequency of 261.6 Hz (e.g., 2 × 261.6 = 523.2, 3 × 261.6 = 784.8, 4 × 261.6 = 1046.4). While only about eight harmonics are really needed to audibly recreate the note, the total number of harmonics in this mathematical series can be large, although the higher the harmonic's number, the weaker its magnitude and contribution. Contrary to intuition, a musical recording at its lowest physical level is not a collection of individual notes but a collection of individual harmonics. That is why very similar-sounding recordings can be created with differing collections of instruments and their assigned notes: as long as the total harmonics of the recording are recreated to some degree, it does not really matter which instruments or which notes were used.
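
The arithmetic can be sketched in a few lines of Python; the eight-harmonic cutoff simply mirrors the observation above.

```python
# Compute the harmonic series of middle C (fundamental 261.6 Hz).
fundamental = 261.6  # Hz, middle C

for n in range(1, 9):  # the first eight harmonics
    print(f"Harmonic {n}: {n * fundamental:.1f} Hz")
# Harmonic 1: 261.6 Hz, Harmonic 2: 523.2 Hz, Harmonic 3: 784.8 Hz, ...
```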

A first step in the detection of notes is the transformation of the sound file's digital data from the time domain into the frequency domain, which enables the measurement of various frequencies over time. The graphic image of an audio recording in the frequency domain is called a spectrogram or sonogram. A musical note, as a composite of various harmonics, appears in a spectrogram like a vertically placed comb, with the individual teeth of the comb representing the various harmonics and their differing frequency values. A Fourier Transform is the mathematical procedure that is used to create the spectrogram from the sound file’s digital data.
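
A minimal sketch of this transformation, using SciPy's short-time Fourier transform on a synthetic middle C tone standing in for a real recording:

```python
# A minimal sketch: compute a spectrogram of a synthetic middle C tone
# with its first few harmonics, using SciPy's short-time Fourier transform.
import numpy as np
from scipy import signal

sr = 22050                                  # sample rate in Hz
t = np.arange(0, 2.0, 1 / sr)               # two seconds of audio
f0 = 261.6                                  # fundamental of middle C

# Sum the fundamental and three overtones with decreasing amplitude.
y = sum((1 / n) * np.sin(2 * np.pi * n * f0 * t) for n in range(1, 5))

# Each column of Sxx is one spectrum; the harmonics are the "comb teeth".
freqs, times, Sxx = signal.spectrogram(y, fs=sr, nperseg=2048)
peak_bins = Sxx.argmax(axis=0)
print("Strongest frequency per frame:", freqs[peak_bins][:5], "Hz")
```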

The task of many note detection algorithms is to search the spectrogram for the occurrence of such comb patterns (a composite of harmonics) caused by individual notes. Once the pattern of a note's particular comb shape of harmonics is detected, the note's pitch can be measured by the vertical position of the comb pattern upon the spectrogram.

There are basically two different types of music that create very different demands for a pitch detection algorithm: monophonic and polyphonic. In monophonic music a single instrument plays one note at a time, while polyphonic music can have multiple instruments and vocals sounding at once. Pitch detection on a monophonic recording is a relatively simple task, and its technology enabled the invention of guitar tuners in the 1970s. Pitch detection on polyphonic music, however, is much more difficult, because the spectrogram appears as a vague cloud of overlapping comb patterns, caused by each note's multiple harmonics.

Another method of pitch detection was invented by Martin Piszczalski in conjunction with Bernard Galler in the 1970s[2] and has since been widely followed.[3] It targets monophonic music, and central to it is how pitch is determined by the human ear.[4] The process roughly mimics the biology of the human inner ear by finding only a few of the loudest harmonics at a given instant. That small set of detected harmonics is then compared against the harmonic sets of all possible pitches to hypothesize which pitch most probably produced that particular set of harmonics.
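
The hypothesize-and-score idea can be sketched as follows; the tolerance, candidate range, and scoring rule here are illustrative assumptions, not Piszczalski's published parameters.

```python
# A toy sketch of hypothesize-and-score pitch estimation from a few
# loud harmonics. Tolerance and ranges are illustrative assumptions.
import numpy as np

def estimate_pitch(peak_freqs, fmin=60.0, fmax=1000.0, tol=0.015):
    """Score each candidate fundamental by how many measured peaks lie
    near one of its integer multiples; return the best-scoring one."""
    peaks = np.asarray(peak_freqs)
    best_f0, best_score = None, 0
    # Scan high to low so sub-octave candidates cannot win on a tie.
    for f0 in np.arange(fmax, fmin, -1.0):
        ratios = peaks / f0
        nearest = np.round(ratios)
        # A peak "fits" if it is within tol of a whole-number harmonic.
        fits = (nearest >= 1) & (np.abs(ratios - nearest) < tol * nearest)
        if fits.sum() > best_score:
            best_f0, best_score = f0, fits.sum()
    return best_f0

# Three loud harmonics of middle C, as a peak picker might report them.
print(estimate_pitch([261.6, 523.2, 784.8]))  # ~262 Hz
```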

To date, the complete note detection of polyphonic recordings remains an unsolved problem for audio engineers, although they continue to make progress by inventing algorithms that can partially detect some of the notes of a polyphonic recording, such as a melody or bass line.

Beat detection

Beat tracking is the determination of a repeating time interval between perceived pulses in music. The beat is often described as the 'foot tapping' or 'hand clapping' pulse felt in time with the music. It is often a predictable basic unit of time for the musical piece and may vary only slightly during the performance. Songs are frequently measured in beats per minute (BPM) to determine the tempo of the music, whether fast or slow.

Since notes frequently begin on a beat, or a simple subdivision of the beat's time interval, beat tracking software has the potential to better resolve note onsets that may have been detected in a crude fashion. Beat tracking is often the first step in the detection of percussion instruments.

Despite the intuitive nature of 'foot tapping', of which most humans are capable, developing an algorithm to detect those beats is difficult. Most current software algorithms for beat detection maintain a group of competing hypotheses for the beats per minute as the algorithm progressively finds and resolves local peaks in volume, roughly corresponding to the foot taps of the music.
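
A minimal sketch of onset-envelope-based beat tracking, using the open-source librosa library (the file name is a placeholder):

```python
# A minimal sketch of beat tracking with librosa. The tracker scores
# competing tempo hypotheses against peaks in an onset-strength envelope.
import librosa

y, sr = librosa.load('recording.wav')  # placeholder file name

# Onset envelope: frame-by-frame rise in spectral energy ("volume peaks").
onset_env = librosa.onset.onset_strength(y=y, sr=sr)

# Estimate the tempo and the frame positions of the beats.
tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

print("Estimated tempo (BPM):", tempo)
print("First few beats (s):", beat_times[:4])
```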

How automatic music transcription works

To transcribe music automatically, several problems must be solved:

1. Notes must be recognized – this is typically done by changing from the time domain into the frequency domain, which can be accomplished through the Fourier transform. Computer algorithms for doing this are common; the fast Fourier transform (FFT) computes the frequency content of a signal and is useful in processing musical excerpts (see the sketch after this list).

2. A beat and tempo need to be detected (beat detection) – this is a difficult, many-faceted problem.[5]
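
A minimal sketch of step 1 on a synthetic tone, using NumPy's FFT to read off the dominant frequency of a short frame:

```python
# A minimal sketch of note recognition's first move: FFT a short frame
# and read off its dominant frequency. The tone here is synthetic.
import numpy as np

sr = 22050
t = np.arange(0, 0.5, 1 / sr)               # a half-second frame
y = np.sin(2 * np.pi * 440.0 * t)           # a pure A4 tone

spectrum = np.abs(np.fft.rfft(y))           # magnitude spectrum
freqs = np.fft.rfftfreq(len(y), d=1 / sr)   # frequency of each bin

print(f"Dominant frequency: {freqs[spectrum.argmax()]:.1f} Hz")  # ~440.0
```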

The method proposed in Costantini et al. 2009[6] focuses on note events and their main characteristics: the attack instant, the pitch, and the final instant. Onset detection exploits a binary time-frequency representation of the audio signal, while note classification and offset detection are based on the constant-Q transform (CQT) and support vector machines (SVMs).

Pitch detection in turn yields a "pitch contour": a continuously time-varying line that corresponds to what humans refer to as the melody. The next step is to segment this continuous melodic stream to identify the beginning and end of each note. After that, each "note unit" is expressed in physical terms (e.g., 442 Hz, 0.52 seconds). The final step is to map this physical information into familiar music-notation terms for each note (e.g., A4, quarter note).
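
A toy sketch of the segmentation and mapping steps, starting from an already-extracted pitch contour; the frame rate and contour values are invented for illustration.

```python
# A toy sketch of segmenting a pitch contour into notes and mapping
# each note to a name. Frame rate and contour values are invented.
import numpy as np
import librosa

frame_dur = 0.01                      # 10 ms per contour frame (assumed)
# A contour: 40 frames near A4 (440 Hz), then 30 frames near C5 (523 Hz).
contour = np.array([440.0] * 40 + [523.25] * 30)

# Group consecutive frames that round to the same note name.
names = [librosa.hz_to_note(f) for f in contour]
notes, start = [], 0
for i in range(1, len(names) + 1):
    if i == len(names) or names[i] != names[start]:
        notes.append((names[start], (i - start) * frame_dur))
        start = i

print(notes)  # [('A4', 0.4), ('C5', 0.3)]
```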

Detailed computer steps behind automatic music transcription

In terms of actual computer processing, the principal steps are to (1) digitize the performed, analog music; (2) perform successive short-term fast Fourier transforms (FFTs) to obtain the time-varying spectra; (3) identify the peaks in each spectrum; (4) analyze the spectral peaks to obtain pitch candidates; (5) connect the strongest individual pitch candidates into the most likely time-varying pitch contour; and (6) map this physical data into the closest music-notation terms. These fundamental steps, originated by Piszczalski in the 1970s, became the foundation of automatic music transcription.[2]
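
An end-to-end toy sketch of steps 2 through 6 on a synthetic monophonic signal follows; the naive strongest-peak rule stands in for the real pitch-candidate analysis.

```python
# An end-to-end toy sketch of steps 2-6 on a synthetic monophonic signal.
# The spectral-peak step is a naive argmax, a stand-in for real analysis.
import numpy as np
from scipy import signal
import librosa

sr = 22050
t = np.arange(0, 1.0, 1 / sr)
# One second of audio: half a second of A4, then half a second of C5.
y = np.where(t < 0.5,
             np.sin(2 * np.pi * 440.0 * t),
             np.sin(2 * np.pi * 523.25 * t))

# Step 2: successive short-term FFTs (a spectrogram).
freqs, times, Sxx = signal.spectrogram(y, fs=sr, nperseg=2048)

# Steps 3-5: take the strongest peak per frame as the pitch contour.
contour = freqs[Sxx.argmax(axis=0)]

# Step 6: map each frame's frequency to the nearest note name.
names = [librosa.hz_to_note(f) for f in contour if f > 0]
print(names[:3], "...", names[-3:])  # A4 frames, then C5 frames
```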

The most controversial and difficult step in this process is detecting pitch.[7] The most successful pitch-detection methods operate in the frequency domain, not the time domain. While time-domain methods have been proposed, they can break down for real-world musical instruments played in typically reverberant rooms.

The pitch-detection method invented by Piszczalski[4] again mimics human hearing. It follows how only certain sets of partials "fuse" together in human listening: the sets that create the perception of a single pitch. Fusion occurs only when two partials are within 1.5% of being a perfect harmonic pair (i.e., their frequencies approximate a low-integer ratio such as 1:2 or 5:8). This near-harmonic match is required of all the partials in order for a human to hear them as a single pitch.
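
A small sketch of this fusion test for a pair of partials; the range of low-integer ratios checked is an illustrative choice.

```python
# A small sketch of the 1.5% "fusion" test for a pair of partials.
# The set of low-integer ratios checked is an illustrative choice.

def fuses(f1, f2, tol=0.015, max_int=8):
    """Return the low-integer ratio (p, q) if f1:f2 is within tol of
    one, else None; such pairs are heard as a single pitch."""
    measured = f1 / f2
    for q in range(1, max_int + 1):
        for p in range(1, max_int + 1):
            ideal = p / q
            if abs(measured - ideal) / ideal <= tol:
                return (p, q)
    return None

print(fuses(261.6, 523.2))   # (1, 2): an octave pair, fuses
print(fuses(261.6, 509.0))   # None: ~2.8% off an octave, does not fuse
```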

References

  1. Eric David Scheirer (October 1998). "Music Perception Systems". Massachusetts Institute of Technology Press. p. 24.
  2. Martin Piszczalski (January 1, 1986). "A Computational Model of Music Transcription" (PhD thesis). University of Michigan.
  3. David Gerhard (October 15, 1997). "Computer Music Analysis". Simon Fraser University.
  4. Martin Piszczalski & Bernard Galler (December 1, 1979). "Predicting musical pitch from component frequency ratios". Journal of the Acoustical Society of America. Archived from the original on September 4, 2013.
  5. Simon Dixon (May 16, 2001). "Automatic Extraction of Tempo and Beat from Expressive Performances" (PDF). CiteSeer.IST. Retrieved October 8, 2009.
  6. Giovanni Costantini; Renzo Perfetti; Massimiliano Todisco (September 2009). "Event based transcription system for polyphonic piano music" (PDF). Signal Processing. 89 (9): 1798–1811. doi:10.1016/j.sigpro.2009.03.024. hdl:2108/29990.
  7. David Gerhard (November 1, 2003). "Pitch extraction and fundamental frequency: history and current techniques" (PDF). University of Regina. Retrieved May 3, 2017.