Tunebot

Tunebot is a music search engine developed by the Interactive Audio Lab at Northwestern University. Users can search the database by humming or singing a melody into a microphone, by playing the melody on a virtual keyboard, or by typing some of the lyrics. This lets users identify a song that is stuck in their head.

Searching techniques

Tunebot is a query-by-humming system. It compares a sung query to a database of musical themes using the intervals between consecutive notes. This allows a user to sing in a different key than the target recording and still produce a match. The intervals are also left unquantized to accommodate tunings other than the standard A = 440 Hz, since few people have perfect pitch.
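
As a rough illustration of the interval representation, the sketch below converts a sequence of note frequencies into unquantized intervals measured in semitones. The function name `pitch_intervals` and the sample frequencies are assumptions made for this example; they are not taken from Tunebot itself.

```python
import math


def pitch_intervals(frequencies_hz):
    """Return the unquantized interval, in semitones, between consecutive notes."""
    return [
        12.0 * math.log2(later / earlier)
        for earlier, later in zip(frequencies_hz, frequencies_hz[1:])
    ]


# A short melody, and the same melody sung in a different key and slightly
# off standard tuning (every frequency scaled by the same factor).
melody = [220.0, 246.94, 277.18, 220.0]
transposed = [f * 1.3 for f in melody]

original_intervals = pitch_intervals(melody)
transposed_intervals = pitch_intervals(transposed)
```

Because each interval depends only on the ratio of two frequencies, scaling the whole melody leaves the interval sequence unchanged, which is why the key of the sung query does not matter. Keeping the values as real numbers rather than rounding them to whole semitones is what "unquantized" refers to here.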

In addition to note intervals, Tunebot compares a query with potential targets by using rhythmic ratios between notes. Since ratios between note lengths are used, the tempo of the performance does not affect the rhythmic similarity measure.
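
The tempo invariance can be shown with a small sketch. It assumes note durations are available in seconds and takes the ratio of each note's length to the length of the note before it; the name `rhythm_ratios` and the sample durations are illustrative only.

```python
import math


def rhythm_ratios(durations_s):
    """Return the ratio of each note's duration to the preceding note's duration."""
    return [
        later / earlier
        for earlier, later in zip(durations_s, durations_s[1:])
    ]


# The same rhythm performed at two tempos: the second is twice as fast.
slow = [0.5, 0.5, 1.0, 0.25]
fast = [d * 0.5 for d in slow]

slow_ratios = rhythm_ratios(slow)
fast_ratios = rhythm_ratios(fast)
```

Both performances produce the same ratio sequence, so a similarity measure built on these ratios is unaffected by how fast the user sings.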

Queries and targets are then matched by a weighted string alignment algorithm that operates on the note intervals and rhythmic ratios.
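
The source does not specify the alignment algorithm in detail, so the following is only a minimal sketch of one possible weighted alignment: a global, edit-distance-style dynamic program over sequences of (interval, rhythmic ratio) pairs. The weights `w_pitch` and `w_rhythm`, the `gap` penalty, and the substitution cost are all assumptions chosen for illustration.

```python
import math


def alignment_cost(query, target, w_pitch=1.0, w_rhythm=1.0, gap=2.0):
    """Global alignment cost between two sequences of (interval, ratio) pairs.

    Lower cost means the two melodies are more similar.
    """
    n, m = len(query), len(target)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap  # query notes with no counterpart in the target
    for j in range(1, m + 1):
        cost[0][j] = j * gap  # target notes the singer skipped
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (q_int, q_ratio), (t_int, t_ratio) = query[i - 1], target[j - 1]
            # Hypothetical substitution cost: weighted pitch error plus
            # weighted rhythm error (compared on a log scale).
            substitute = (
                w_pitch * abs(q_int - t_int)
                + w_rhythm * abs(math.log2(q_ratio / t_ratio))
            )
            cost[i][j] = min(
                cost[i - 1][j - 1] + substitute,
                cost[i - 1][j] + gap,
                cost[i][j - 1] + gap,
            )
    return cost[n][m]


target_a = [(2.0, 1.0), (2.0, 1.0), (-4.0, 2.0)]
target_b = [(-3.0, 0.5), (5.0, 2.0), (1.0, 1.0)]
sung = [(2.2, 1.0), (1.8, 1.1), (-4.1, 2.0)]  # slightly out of tune

ranked = sorted([("a", target_a), ("b", target_b)],
                key=lambda item: alignment_cost(sung, item[1]))
```

Sorting candidate targets by this cost yields a ranked result list; changing the two weights shifts how much pitch errors count relative to rhythm errors.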

Database

The database consists of unaccompanied melodies sung by contributors (a cappella). Contributors log into the website and sing their examples to the system. Each of these recordings is associated with a corresponding song on Amazon. A sung query is compared to these examples. A cappella sung examples are used as search keys because it is much easier to compare one unaccompanied vocal (the sung query) to another (an example search key) than it is to compare an unaccompanied vocal to a full band recording, which may contain guitar, drums, other singers, sound effects, etc.

Distinguishing features

Tunebot learns from user input, and it improves its results as each user submits more queries. Since no human can sing perfectly in tune every time, the search engine must take that into account. By choosing a song from a list of ranked results, users tell Tunebot which song was correct. Tunebot then pairs that song with the user's query, analyzes the differences, and runs a genetic algorithm. This process tweaks the parameters that control how the system compares the user's query to the targets. For instance, if a user has no sense of rhythm, the weight of the rhythm factor in the comparison is lowered for future queries.
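
The idea of tuning comparison parameters from confirmed matches can be sketched as follows. This is a simplified evolutionary search (selection and mutation only, with no crossover), not Tunebot's actual genetic algorithm; the single tunable parameter `w_pitch`, the distance function, the fitness definition, and the sample data are all assumptions for illustration.

```python
import random


def distance(query, target, w_pitch):
    """Weighted distance; the rhythm weight is 1 - w_pitch so the weights sum to 1."""
    pitch_err = sum(abs(q[0] - t[0]) for q, t in zip(query, target))
    rhythm_err = sum(abs(q[1] - t[1]) for q, t in zip(query, target))
    return w_pitch * pitch_err + (1.0 - w_pitch) * rhythm_err


def fitness(w_pitch, examples):
    """Total margin by which each user-confirmed target beats its closest decoy."""
    total = 0.0
    for query, correct, decoys in examples:
        best_decoy = min(distance(query, d, w_pitch) for d in decoys)
        total += best_decoy - distance(query, correct, w_pitch)
    return total


def evolve(examples, generations=30, size=20, sigma=0.1, seed=0):
    """Search for a good w_pitch in [0, 1] by repeated selection and mutation."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, examples), reverse=True)
        parents = population[: size // 2]  # keep the fitter half unchanged
        children = [
            min(1.0, max(0.0, rng.choice(parents) + rng.gauss(0.0, sigma)))
            for _ in parents
        ]
        population = parents + children
    return max(population, key=lambda w: fitness(w, examples))


# One confirmed match from a singer with accurate pitch but erratic rhythm.
query = [(2.1, 3.0), (1.9, 0.2), (-4.2, 1.5)]
correct = [(2.0, 1.0), (2.0, 1.0), (-4.0, 2.0)]
decoy = [(4.0, 3.0), (-1.0, 0.25), (-3.0, 1.5)]

best_w_pitch = evolve([(query, correct, [decoy])])
```

For this singer the search favors a high pitch weight, which is the same as lowering the rhythm weight: the behavior described above for a user with no sense of rhythm.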

Related Research Articles

Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

A search engine is an information retrieval system designed to help find information stored on a computer system. It is an information retrieval software program that discovers, crawls, transforms, and stores information for retrieval and presentation in response to user queries. The search results are usually presented in a list and are commonly called hits. A search engine normally consists of four components, as follows: a search interface, a crawler, an indexer, and a database. The crawler traverses a document collection, deconstructs document text, and assigns surrogates for storage in the search engine index. Online search engines store images, link data and metadata for the document as well.

Monophony: Musical texture

In music, monophony is the simplest of musical textures, consisting of a melody, typically sung by a single singer or played by a single instrument player without accompanying harmony or chords. Many folk songs and traditional songs are monophonic. A melody is also considered to be monophonic if a group of singers sings the same melody together at the unison or with the same melody notes duplicated at the octave. If an entire melody is played by two or more instruments or sung by a choir with a fixed interval, such as a perfect fifth, it is also said to be monophony. The musical texture of a song or musical piece is determined by assessing whether varying components are used, such as an accompaniment part or polyphonic melody lines.

Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music. Those involved in MIR may have a background in academic musicology, psychoacoustics, psychology, signal processing, informatics, machine learning, optical music recognition, computational intelligence or some combination of these.

Unison: Musical parts sounding at the same pitch

In music, unison is two or more musical parts that sound either the same pitch or pitches separated by intervals of one or more octaves, usually at the same time. Rhythmic unison is another term for homorhythm.

Metasearch engine: Online information retrieval tool

A metasearch engine is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. Sufficient data is gathered, ranked, and presented to the users.

Document retrieval is defined as the matching of some stated user query against a set of free-text records. These records could be any type of mainly unstructured text, such as newspaper articles, real estate records or paragraphs in a manual. User queries can range from multi-sentence full descriptions of an information need to a few words.

Content-based image retrieval: Method of image retrieval

Content-based image retrieval, also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases. Content-based image retrieval is opposed to traditional concept-based approaches.

Query by humming (QbH) is a music retrieval system that branches off the original classification systems of title, artist, composer, and genre. It normally applies to songs or other music with a distinct single theme or melody. The system involves taking a user-hummed melody and comparing it to an existing database. The system then returns a ranked list of music closest to the input query.

In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases.

Transcription (music)

In music, transcription is the practice of notating a piece or a sound that was previously unnotated or unpopular as written music, for example, a jazz improvisation or a video game soundtrack. When a musician is tasked with creating sheet music from a recording and writes down the notes that make up the piece in music notation, they are said to have created a musical transcription of that recording. Transcription may also mean rewriting a piece of music, either solo or ensemble, for an instrument or instruments other than those for which it was originally intended. The Beethoven symphonies transcribed for solo piano by Franz Liszt are an example. Transcription in this sense is sometimes called arrangement, although strictly speaking transcriptions are faithful adaptations, whereas arrangements change significant aspects of the original piece.

Musipedia is a search engine for identifying pieces of music. This can be done by whistling a theme, playing it on a virtual piano keyboard, tapping the rhythm on the computer keyboard, or entering the Parsons code. Anybody can modify the collection of melodies and enter MIDI files, bitmaps with sheet music, lyrics or some text about the piece, or the melodic contours as Parsons code. Certain features on the site may no longer work because they rely on Flash, which was discontinued in 2020.

Barbershop arranging

Barbershop arranging is the art of creating arrangements of barbershop music. The Barbershop Harmony Society (BHS) and Sweet Adelines International (SAI) have prescribed rules that dictate what is an acceptable arrangement, particularly with regard to singing in competition. This makes barbershop arranging a specialist form of arranging, rarely tackled by those outside barbershop; likewise, barbershop arrangers tend to be known only for their barbershop arrangements rather than for their work in any other musical form.

Multimedia search enables information search using queries in multiple data types including text and other multimedia formats. Multimedia search can be implemented through multimodal search interfaces, i.e., interfaces that allow users to submit search queries not only as textual requests, but also through other media. We can distinguish two methodologies in multimedia search:

An audio search engine is a web-based search engine which crawls the web for audio content. The information can consist of web pages, images, audio files, or another type of document. Various techniques exist for research on these engines.

Vocal harmony: Style of vocal music

Vocal harmony is a style of vocal music in which a consonant note or notes are sung simultaneously with a main melody in a predominantly homophonic texture. Vocal harmonies are used in many subgenres of European art music, including Classical choral music and opera, and in the popular styles of many Western cultures, ranging from folk songs and musical theater pieces to rock ballads. In the simplest style of vocal harmony, the main vocal melody is supported by a single backup vocal line, at a pitch either above or below the main vocal line, often in thirds or sixths that fit the chord progression used in the song. In more complex vocal harmony arrangements, different backup singers may sing two or even three other notes at the same time as each of the main melody notes, mostly in consonant, pleasing-sounding thirds, sixths, and fifths.

Karaoke Callout is a karaoke dueling game developed by David A. Shamma and Bryan Pardo at the Interactive Audio Lab at Northwestern University. It is an example of a game with a purpose: it helps train the Tunebot database by providing the system with more query-to-target matches.

Doreso is an automatic content recognition (ACR) company specializing in music discovery and social TV recognition services for the second screen. Its sound-to-sound music search engine allows users to obtain more detailed information about music and songs by singing, humming, or recording original music.

Search by sound is the retrieval of information based on audio input. A handful of applications, mainly for mobile devices, use search by sound. Shazam, SoundHound, Axwave, ACRCloud and others have seen considerable success by using a simple algorithm to match an acoustic fingerprint to a song in a library. These applications take a sample clip of a song, or a user-generated melody, and check a music library or database to see where the clip matches a song. From there, song information is retrieved and displayed to the user.

Evaluation measures for an information retrieval (IR) system assess how well an index, search engine or database returns results from a collection of resources that satisfy a user's query. They are therefore fundamental to the success of information systems and digital platforms. The success of an IR system may be judged by a range of criteria including relevance, speed, user satisfaction, usability, efficiency and reliability. However, the most important factor in determining a system's effectiveness for users is the overall relevance of results retrieved in response to a query. Evaluation measures may be categorised in various ways including offline or online, user-based or system-based and include methods such as observed user behaviour, test collections, precision and recall, and scores from prepared benchmark test sets.
