Music information retrieval

Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music. Those involved in MIR may have a background in academic musicology, psychoacoustics, psychology, signal processing, informatics, machine learning, optical music recognition, computational intelligence or some combination of these.

Applications

MIR is being used by businesses and academics to categorize, manipulate and even create music.

Music classification

One of the classic MIR research topics is genre classification: categorizing music items into one of a set of pre-defined genres such as classical, jazz, or rock. Mood classification, artist classification, instrument identification, and music tagging are also popular topics.
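
As a rough illustration of the approach, a minimal genre classifier can summarise each track as a mean MFCC vector and train a standard classifier on labelled examples. The following sketch assumes a hypothetical list of (file, genre) pairs; a real corpus would contain many tracks per genre and richer features.

    # Minimal genre-classification sketch; file names below are hypothetical.
    import numpy as np
    import librosa
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    dataset = [("blues.00000.wav", "blues"), ("jazz.00000.wav", "jazz")]  # placeholder corpus

    def track_features(path):
        # Summarise a track as the mean of its MFCC frames (a crude timbre fingerprint).
        y, sr = librosa.load(path, duration=30.0)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

    X = np.array([track_features(path) for path, _ in dataset])
    labels = np.array([genre for _, genre in dataset])

    X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)
    classifier = SVC().fit(X_train, y_train)
    print("held-out accuracy:", classifier.score(X_test, y_test))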

Recommender systems

Several recommender systems for music already exist, but surprisingly few are based upon MIR techniques; most instead make use of similarity between users or laborious data compilation. Pandora, for example, uses experts to tag the music with particular qualities such as "female singer" or "strong bassline". Many other systems find users whose listening history is similar and suggest unheard music from those users' collections. MIR techniques for musical similarity are now beginning to form part of such systems.
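
The user-similarity approach can be illustrated in a few lines. The sketch below uses a toy play-count matrix (invented numbers) and recommends to a user the tracks played by their most similar neighbour; production systems work on vastly larger, sparser data.

    import numpy as np

    # Toy play-count matrix: rows are users, columns are tracks (invented data).
    plays = np.array([
        [5, 3, 0, 0],
        [4, 0, 0, 1],
        [0, 0, 4, 5],
        [5, 4, 3, 0],
    ], dtype=float)

    def cosine(u, v):
        # Cosine similarity between two users' listening histories.
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

    def recommend(user):
        # Suggest tracks the most similar other user played but 'user' has not heard.
        sims = [cosine(plays[user], plays[other]) if other != user else -1.0
                for other in range(len(plays))]
        neighbour = int(np.argmax(sims))
        unheard = (plays[user] == 0) & (plays[neighbour] > 0)
        return np.flatnonzero(unheard)

    print(recommend(0))  # -> [2]: user 3 is most similar and has played track 2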

Music source separation and instrument recognition

Music source separation is the task of separating the original signals from a mixed audio signal; instrument recognition is the task of identifying the instruments involved in a piece of music. Various MIR systems have been developed that can separate music into its component tracks without access to the master copy. In this way, for example, karaoke tracks can be created from ordinary recordings, though the process is not yet perfect, owing to vocals occupying some of the same frequency space as the other instruments.
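
One of the simplest separation techniques, sketched below under the assumption of a local mixture file, is librosa's harmonic/percussive separation, which splits a signal by median-filtering its spectrogram. Isolating vocals or individual instruments requires considerably more sophisticated, typically learned, models.

    import librosa
    import soundfile as sf

    # Load a mixture (hypothetical file name) and split it into harmonic and
    # percussive components by median-filtering the spectrogram.
    y, sr = librosa.load("mixture.wav")
    y_harmonic, y_percussive = librosa.effects.hpss(y)

    sf.write("harmonic.wav", y_harmonic, sr)      # sustained, pitched content
    sf.write("percussive.wav", y_percussive, sr)  # drums and transients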

Automatic music transcription

Automatic music transcription is the process of converting an audio recording into symbolic notation, such as a score or a MIDI file.[1] This process involves several audio analysis tasks, which may include multi-pitch detection, onset detection, duration estimation, instrument identification, and the extraction of harmonic, rhythmic or melodic information. The task becomes more difficult with more instruments and a higher level of polyphony.
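
For the monophonic case, two of these subtasks can be sketched with off-the-shelf tools: pYIN pitch tracking and onset detection. The example below assumes a local recording of a single melodic line; polyphonic transcription requires dedicated multi-pitch models.

    import librosa

    # Monophonic transcription sketch (hypothetical file name).
    y, sr = librosa.load("melody.wav")

    # Estimate a fundamental-frequency contour and detect note onsets.
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C7"))
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")

    # Report the estimated note name at each detected onset time.
    times = librosa.times_like(f0, sr=sr)
    for t in onsets:
        i = abs(times - t).argmin()
        if voiced[i]:
            print(f"{t:.2f}s  {librosa.hz_to_note(f0[i])}")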

Music generation

The automatic generation of music is a goal held by many MIR researchers. Attempts have been made with limited success in terms of human appreciation of the results.
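
At the simplest end of the spectrum, a melody can be generated by a first-order Markov chain learned from example material, as in the toy sketch below (the training melody is invented); serious systems use far more expressive models such as deep neural networks.

    import random

    # Invented training melody from which to learn note-to-note transitions.
    melody = ["C4", "D4", "E4", "C4", "E4", "G4", "E4", "D4", "C4", "D4"]

    # First-order Markov model: each note maps to its observed successors.
    transitions = {}
    for a, b in zip(melody, melody[1:]):
        transitions.setdefault(a, []).append(b)

    # Generate new material by a random walk through the model.
    note = random.choice(melody)
    generated = [note]
    for _ in range(15):
        note = random.choice(transitions.get(note, melody))
        generated.append(note)

    print(" ".join(generated))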

Methods used

Data source

Scores give a clear and logical description of music from which to work, but access to sheet music, whether digital or otherwise, is often impractical. MIDI music has also been used for similar reasons, but some data is lost in the conversion to MIDI from any other format, unless the music was written with the MIDI standard in mind, which is rare. Digital audio formats such as WAV, mp3, and ogg are used when the audio itself is part of the analysis. Lossy formats such as mp3 and ogg work well for the human ear but may be missing data crucial for study, and some encodings create artifacts which could mislead an automatic analyser. Despite this, the ubiquity of the mp3 has meant that much research in the field uses mp3 files as source material. Increasingly, metadata mined from the web is incorporated into MIR for a more rounded understanding of the music within its cultural context; recent work of this kind includes the analysis of social tags for music.
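
Whatever the container format, audio is usually decoded to a common in-memory representation before analysis. A minimal sketch, assuming local files with the hypothetical names below, normalises WAV, mp3, and ogg input to mono floating-point samples at a fixed rate:

    import librosa

    # Decode different container formats (hypothetical file names) into the
    # same representation: mono float samples at a fixed sample rate.
    for path in ["track.wav", "track.mp3", "track.ogg"]:
        y, sr = librosa.load(path, sr=22050, mono=True)
        print(path, round(len(y) / sr, 2), "seconds")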

Feature representation

Analysis can often require some summarising,[2] and for music (as with many other forms of data) this is achieved by feature extraction, especially when the audio content itself is analysed and machine learning is to be applied. The purpose is to reduce the sheer quantity of data down to a manageable set of values so that learning can be performed within a reasonable time frame. One commonly extracted feature is the Mel-frequency cepstral coefficient (MFCC), which is a measure of the timbre of a piece of music. Other features may be employed to represent the key, chords, harmonies, melody, main pitch, beats per minute or rhythm of the piece. A number of audio feature extraction toolboxes are available.[3]
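
A minimal sketch of such feature extraction, assuming a local file with the hypothetical name below, might use librosa to summarise timbre, pitch-class content, and tempo:

    import librosa

    # Extract a few common MIR features from one track (hypothetical file name).
    y, sr = librosa.load("track.wav")

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # timbre
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)      # pitch-class content
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)       # beats per minute

    print("MFCC frames:", mfcc.shape)      # (13, n_frames)
    print("chroma frames:", chroma.shape)  # (12, n_frames)
    print("estimated tempo:", tempo)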

Statistics and machine learning

Other issues

Academic activity

Related Research Articles

Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

MPEG-7 is a multimedia content description standard. It was standardized in ISO/IEC 15938. This description will be associated with the content itself, to allow fast and efficient searching for material that is of interest to the user. MPEG-7 is formally called Multimedia Content Description Interface. Thus, it is not a standard which deals with the actual encoding of moving pictures and audio, like MPEG-1, MPEG-2 and MPEG-4. It uses XML to store metadata, and can be attached to timecode in order to tag particular events, or synchronise lyrics to a song, for example.

An image retrieval system is a computer system used for browsing, searching and retrieving images from a large database of digital images. Most traditional and common methods of image retrieval utilize some method of adding metadata such as captioning, keywords, title or descriptions to the images so that retrieval can be performed over the annotation words. Manual image annotation is time-consuming, laborious and expensive; to address this, there has been a large amount of research done on automatic image annotation. Additionally, the increase in social web applications and the semantic web have inspired the development of several web-based image annotation tools.

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. In most cases this activity concerns processing human-language texts by means of natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, can also be seen as information extraction.

Content-based image retrieval, also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases. Content-based image retrieval is opposed to traditional concept-based approaches.

Automatic image annotation is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database.

In music, transcription is the practice of notating a piece or a sound which was previously unnotated and/or unpopular as written music, for example a jazz improvisation or a video game soundtrack. When a musician is tasked with creating sheet music from a recording and writes down the notes that make up the piece in music notation, it is said that they created a musical transcription of that recording. Transcription may also mean rewriting a piece of music, either solo or ensemble, for an instrument or instruments other than those for which it was originally intended; the Beethoven symphonies transcribed for solo piano by Franz Liszt are an example. Transcription in this sense is sometimes called arrangement, although strictly speaking transcriptions are faithful adaptations, whereas arrangements change significant aspects of the original piece.

Optical music recognition (OMR) is a field of research that investigates how to computationally read musical notation in documents. The goal of OMR is to teach the computer to read and interpret sheet music and produce a machine-readable version of the written music score. Once captured digitally, the music can be saved in commonly used file formats, e.g. MIDI and MusicXML. In the past it has, misleadingly, also been called "music optical character recognition"; due to significant differences, this term should no longer be used.

Computer audition (CA) or machine listening is the general field of study of algorithms and systems for audio interpretation by machines. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer Paris Smaragdis, interviewed in Technology Review, describes these systems as "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents."

The National Centre for Text Mining (NaCTeM) is a publicly funded text mining (TM) centre. It was established to provide support, advice, and information on TM technologies and to disseminate information from the larger TM community, while also providing tailored services and tools in response to the requirements of the United Kingdom academic community.

Audio mining is a technique by which the content of an audio signal can be automatically analyzed and searched. It is most commonly used in the field of automatic speech recognition, where the analysis tries to identify any speech within the audio. The term ‘audio mining’ is sometimes used interchangeably with audio indexing, phonetic searching, phonetic indexing, speech indexing, audio analytics, speech analytics, word spotting, and information retrieval. Audio indexing, however, is mostly used to describe the pre-process of audio mining, in which the audio file is broken down into a searchable index of words.

Music informatics is the study of music processing, in particular music representations, Fourier analysis of music, music synchronization, music structure analysis and chord recognition. Other music informatics research topics include computational music modeling, computational music analysis, optical music recognition, digital audio editors, online music search engines, music information retrieval and cognitive issues in music. Because music informatics is an emerging discipline, it is a very dynamic area of research with many diverse viewpoints, whose future is yet to be determined.

A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.

Semantic audio is the extraction of meaning from audio signals. The field of semantic audio is primarily based around the analysis of audio to create some meaningful metadata, which can then be used in a variety of different ways.

Harmonic pitch class profiles (HPCP) are a group of features that a computer program extracts from an audio signal, based on a pitch class profile, a descriptor proposed in the context of a chord recognition system. HPCP is an enhanced pitch-distribution feature: a sequence of feature vectors that, to a certain extent, describes tonality by measuring the relative intensity of each of the 12 pitch classes of the equal-tempered scale within an analysis frame. Often, the twelve pitch spelling attributes are also referred to as chroma, and HPCP features are closely related to what are called chroma features or chromagrams.
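
Closely related chroma features can be computed with standard tools. The sketch below, assuming a local file with a hypothetical name, uses librosa's constant-Q chromagram (related to, but not identical with, HPCP proper) to obtain a 12-bin pitch-class profile per frame:

    import numpy as np
    import librosa

    # Compute a 12-bin chromagram (closely related to HPCP) for a
    # hypothetical audio file; each column is one analysis frame.
    y, sr = librosa.load("track.wav")
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)

    # Average over time to obtain a global pitch-class profile.
    profile = chroma.mean(axis=1)
    names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    print(dict(zip(names, np.round(profile, 2))))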

Sound and music computing (SMC) is a research field that studies the whole sound and music communication chain from a multidisciplinary point of view. By combining scientific, technological and artistic methodologies it aims at understanding, modeling and generating sound and music through computational approaches.

The International Society for Music Information Retrieval (ISMIR) is an international forum for research on the organization of music-related data. It started as an informal group steered by an ad hoc committee in 2000, which established a yearly symposium; whence "ISMIR", which originally stood for International Symposium on Music Information Retrieval. It was turned into a conference in 2002 while retaining the acronym. ISMIR was incorporated in Canada on July 4, 2008.

Multimedia information retrieval is a research discipline of computer science that aims at extracting semantic information from multimedia data sources. Data sources include directly perceivable media such as audio, image and video; indirectly perceivable sources such as text, semantic descriptions and biosignals; and sources that are not perceivable at all, such as bioinformation, stock prices, etc. The methodology of MMIR can be organized into three groups:

  1. Methods for the summarization of media content (feature extraction); the result of feature extraction is a description.
  2. Methods for the filtering of media descriptions.
  3. Methods for the categorization of media descriptions into classes.

The outline of natural-language processing provides an overview of and topical guide to natural-language processing.

ScoreCloud is a software service and web application for creating, storing, and sharing music notation, created by Doremir for macOS, Microsoft Windows, iPhone and iPad.

References

  1. Klapuri, A.; Davy, M., eds. (2006). Signal Processing Methods for Music Transcription. New York: Springer-Verlag.
  2. Eidenberger, Horst (2011). Fundamental Media Understanding. atpress. ISBN 978-3-8423-7917-6.
  3. Moffat, David; Ronan, David; Reiss, Joshua D. (2016). "An Evaluation of Audio Feature Extraction Toolboxes". Proceedings of the International Conference on Digital Audio Effects (DAFx).
