Momel

Momel (Modelling melody) is an algorithm developed by Daniel Hirst and Robert Espesser at the CNRS Laboratoire Parole et Langage, Aix-en-Provence, [1] for the analysis and synthesis of intonation patterns.

Purpose

The analysis of raw fundamental frequency curves for the study of intonation needs to take into account the fact that speakers are simultaneously producing an intonation pattern and a sequence of syllables made up of segmental phones. The raw fundamental frequency curves that can be analysed acoustically result from an interaction between these two components, which makes it difficult to compare intonation patterns produced with different segmental material. Compare, for example, the intonation patterns on the utterances "It's for papa" and "It's for mama".

Algorithm

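The Momel algorithm attempts to solve this problem by factoring the raw curves into two components: a macromelodic component, modelled as a quadratic spline function, which corresponds to the intonation pattern itself, and a micromelodic component (the micromelodic profile), which reflects the segmental material of the utterance.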

The quadratic spline function used to model the macromelodic component is defined by a sequence of target points (pairs <s, Hz>, that is, a time in seconds and a frequency in hertz). Each pair of adjacent targets is linked by two monotonic parabolic curves, with the spline knot occurring (by default) at the midpoint between the two targets. The first derivative of the curve thus defined is zero at each target point, and the two parabolas have the same value and the same derivative at the spline knot. This defines the simplest mathematical function for which the curves are both continuous and smooth.
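The constraints just described (zero slope at each target, matching value and slope at the knot) are enough to determine the two parabolas between any two targets. The following is a minimal sketch of such a curve, not the authors' implementation: the function name `quadratic_spline_f0`, the `knot_ratio` parameter, and the choice to hold the curve flat outside the first and last targets are all assumptions made for illustration.

```python
from bisect import bisect_right


def quadratic_spline_f0(targets, t, knot_ratio=0.5):
    """Evaluate a quadratic spline through (time_s, f0_hz) targets at time t.

    Illustrative sketch only. `targets` must be sorted by strictly
    increasing time. `knot_ratio` (hypothetical parameter, 0 < ratio < 1)
    places the knot between two targets; 0.5 is the midpoint default.
    """
    times = [time for time, _ in targets]
    # Assumption: the curve is held flat before the first and after the last target.
    if t <= times[0]:
        return targets[0][1]
    if t >= times[-1]:
        return targets[-1][1]

    # Locate the pair of targets that brackets t.
    i = bisect_right(times, t) - 1
    (t1, h1), (t2, h2) = targets[i], targets[i + 1]
    span = t2 - t1
    knot = t1 + knot_ratio * span

    if t <= knot:
        # First parabola: vertex (zero slope) at the left target.
        a = (h2 - h1) / ((knot - t1) * span)
        return h1 + a * (t - t1) ** 2
    # Second parabola: vertex (zero slope) at the right target.
    a = (h2 - h1) / ((t2 - knot) * span)
    return h2 - a * (t2 - t) ** 2


# Hypothetical targets: a rise from 120 Hz to 180 Hz, then a fall to 100 Hz.
example_targets = [(0.0, 120.0), (0.4, 180.0), (1.0, 100.0)]
for time_s in (0.0, 0.1, 0.2, 0.4, 0.7, 1.0):
    print(time_s, round(quadratic_spline_f0(example_targets, time_s), 2))
```

The coefficients follow from the two conditions at the knot: equating the values and the first derivatives of the two parabolas there gives a curvature of (h2 - h1) / (d * span) for each half, where d is the distance from the knot to that half's own target. With the default midpoint knot, the curve passes halfway between the two target frequencies at the knot.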

Implications

On the one hand, the two utterances "For Mama!" and "For Papa!" could thus be modelled with the same target points (hence the same macromelodic component), while "For Mama?" and "For Papa?" would also share a set of target points, though probably one different from that of the first pair.

On the other hand, the utterances "For Mama!" and "For Mama?" could be modelled with the same micromelodic profile but with different target points, while "For Papa!" and "For Papa?" would also share a micromelodic profile, though one different from that of the first pair.

The Momel algorithm derives what its authors refer to as a phonetic representation of an intonation pattern. This representation is neutral with respect to speech production and speech perception: while not explicitly derived from a model of either, it contains sufficient information to be used as input to models of either process. The relatively theory-neutral nature of the algorithm has allowed it to be used as a first step in deriving representations such as those of the Fujisaki model (Mixdorff 1999), ToBI (Maghbouleh 1999, Wightman et al. 2000) or INTSINT (Hirst & Espesser 1993, Hirst et al. 2000).

References

Momel automatic annotation can be performed by SPPAS