Fujisaki model

[Figure: An F0 contour is obtained by adding the phrase and accent components to the base frequency.]

The Fujisaki model is a superpositional model for representing the F0 (fundamental frequency) contour of speech.

According to the model, the F0 contour is generated by superposing the outputs of two second-order linear filters on a base frequency value: one filter generates the phrase components of speech and the other the accent components. The base frequency is the speaker's minimum frequency value. In other words, the F0 contour is obtained by adding the base frequency, the phrase components, and the accent components. The model was proposed by Hiroya Fujisaki.


In the model, the logarithm of the F0 contour is expressed as

$$\ln F_0(t) = \ln F_b + \sum_{i=1}^{I} A_{pi}\, G_{pi}(t - T_{0i}) + \sum_{j=1}^{J} A_{aj} \left\{ G_{aj}(t - T_{1j}) - G_{aj}(t - T_{2j}) \right\}$$

where $G_{pi}(t)$ is the impulse response of the phrase control mechanism and $G_{aj}(t)$ is the step response of the accent control mechanism:

$$G_{pi}(t) = \begin{cases} \alpha_i^2\, t\, e^{-\alpha_i t}, & t \ge 0 \\ 0, & t < 0 \end{cases} \qquad G_{aj}(t) = \begin{cases} \min\!\left[\,1 - (1 + \beta_j t)\, e^{-\beta_j t},\ \gamma\,\right], & t \ge 0 \\ 0, & t < 0 \end{cases}$$

where

$F_b$ : bias level (base frequency) upon which all the phrase and accent components are superposed to form an F0 contour,

$I$ : number of phrase commands,

$J$ : number of accent commands,

$A_{pi}$ : magnitude of the i-th phrase command,

$A_{aj}$ : amplitude of the j-th accent command,

$T_{0i}$ : instant of occurrence of the i-th phrase command,

$T_{1j}$ : onset of the j-th accent command,

$T_{2j}$ : end of the j-th accent command,

$\alpha_i$ : natural angular frequency of the phrase control mechanism to the i-th phrase command,

$\beta_j$ : natural angular frequency of the accent control mechanism to the j-th accent command, and

$\gamma$ : ceiling level of the accent component for the j-th accent command.
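As a concrete illustration, the superposition can be sketched in a few lines of NumPy. This is a minimal sketch, not a reference implementation: the function names and the command values used in the example (such as $F_b$ = 100 Hz, $\alpha$ = 3/s, $\beta$ = 20/s, $\gamma$ = 0.9) are illustrative assumptions, not values taken from a real utterance.

```python
import numpy as np

def phrase_component(t, alpha):
    # Gp(t): impulse response of the phrase control mechanism (0 for t < 0).
    return np.where(t >= 0, alpha**2 * t * np.exp(-alpha * t), 0.0)

def accent_component(t, beta, gamma=0.9):
    # Ga(t): step response of the accent control mechanism,
    # clipped at the ceiling level gamma (0 for t < 0).
    return np.where(t >= 0,
                    np.minimum(1.0 - (1.0 + beta * t) * np.exp(-beta * t), gamma),
                    0.0)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds):
    """Superpose base frequency, phrase and accent components on a log scale.

    phrase_cmds: list of (Ap, T0, alpha) tuples
    accent_cmds: list of (Aa, T1, T2, beta) tuples
    """
    log_f0 = np.full_like(t, np.log(fb))          # ln Fb: the bias level
    for ap, t0, alpha in phrase_cmds:             # add each phrase component
        log_f0 += ap * phrase_component(t - t0, alpha)
    for aa, t1, t2, beta in accent_cmds:          # add each accent component
        log_f0 += aa * (accent_component(t - t1, beta)
                        - accent_component(t - t2, beta))
    return np.exp(log_f0)                         # back to Hz

# Illustrative commands only: one phrase command at t = 0 s and
# one accent command spanning 0.3-0.7 s.
t = np.linspace(0.0, 2.0, 400)
f0 = fujisaki_f0(t, fb=100.0,
                 phrase_cmds=[(0.5, 0.0, 3.0)],
                 accent_cmds=[(0.4, 0.3, 0.7, 20.0)])
```

Plotting f0 against t reproduces the shape described in the figure caption above: a slowly decaying phrase rise with a plateau-shaped accent hump superposed on the base frequency.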


