Compressed sensing in speech signals

In communications technology, the technique of compressed sensing (CS) may be applied to the processing of speech signals under certain conditions. In particular, CS can be used to reconstruct a sparse vector from a smaller number of measurements, provided the signal can be represented in a sparse domain, that is, a domain in which only a few coefficients have non-zero values. [1]

Theory

Suppose a signal can be represented in a domain where only $k$ coefficients out of $N$ (where $k \ll N$) are non-zero; then the signal is said to be sparse in that domain. A sparse vector recovered in that domain can be used to reconstruct the original signal, provided the sparse domain of the signal is known. CS can therefore be applied to a speech signal only if a sparse domain for speech is known.

Consider a speech signal $x$, which can be represented in a domain $\Psi$ such that $x = \Psi\alpha$, where the speech signal $x \in \mathbb{R}^{N}$, the dictionary matrix $\Psi \in \mathbb{R}^{N \times N}$, and the sparse coefficient vector $\alpha \in \mathbb{R}^{N}$. This speech signal is said to be sparse in the domain $\Psi$ if the number of significant (non-zero) coefficients in the sparse vector $\alpha$ is $k$, where $k \ll N$.
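
As a toy illustration of such a representation, the sketch below (not taken from the cited works) builds a synthetic voiced-speech-like frame and counts its significant coefficients in the discrete cosine transform (DCT) domain, an assumed choice of dictionary $\Psi$; real speech frames are only approximately sparse.

    # A minimal sketch, assuming a DCT dictionary and a synthetic voiced-like frame;
    # it only illustrates the notion of k significant coefficients out of N.
    import numpy as np
    from scipy.fft import dct

    N = 256                              # frame length (samples)
    fs = 8000                            # assumed sampling rate (Hz)
    t = np.arange(N) / fs
    # crude stand-in for a voiced frame: a few harmonics of a 200 Hz pitch
    x = sum(a * np.sin(2 * np.pi * 200 * m * t)
            for m, a in enumerate([1.0, 0.6, 0.3], start=1))

    alpha = dct(x, norm='ortho')         # coefficients in the DCT domain (Psi)
    k = int(np.sum(np.abs(alpha) > 0.05 * np.abs(alpha).max()))
    print(f"significant coefficients: {k} out of {N}")   # k << N for such a frame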

The observed signal $x$ is of dimension $N$. To reduce the complexity of solving for $\alpha$ using CS, the speech signal is observed through a measurement matrix $\Phi$ such that

$$ y = \Phi x = \Phi \Psi \alpha \qquad (1) $$

where $y \in \mathbb{R}^{M}$, and the measurement matrix $\Phi \in \mathbb{R}^{M \times N}$ with $M \ll N$.
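
A minimal sketch of the measurement step in eq. 1 is given below; the Gaussian measurement matrix $\Phi$, the orthonormal DCT dictionary $\Psi$, and the dimensions are illustrative assumptions, not choices mandated by the references.

    # Sketch of eq. (1): y = Phi x = Phi Psi alpha, with assumed dimensions.
    import numpy as np
    from scipy.fft import idct

    rng = np.random.default_rng(0)
    N, M, k = 256, 80, 10                        # signal length, measurements (M << N), sparsity

    Psi = idct(np.eye(N), norm='ortho', axis=0)  # columns are DCT basis vectors (dictionary)
    alpha = np.zeros(N)
    alpha[rng.choice(N, k, replace=False)] = rng.standard_normal(k)  # k-sparse coefficients
    x = Psi @ alpha                              # signal that is exactly sparse in Psi

    Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random Gaussian measurement matrix
    y = Phi @ x                                  # M compressed measurements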

The sparse decomposition problem for eq. 1 can be solved as a standard $\ell_1$ minimization [2] as

$$ \hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_{1} \quad \text{subject to} \quad y = \Phi \Psi \alpha \qquad (2) $$

If the measurement matrix $\Phi$ satisfies the restricted isometry property (RIP) and is incoherent with the dictionary matrix $\Psi$, [3] then the reconstructed signal is much closer to the original speech signal.
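
Continuing the measurement sketch above, eq. 2 can be solved, for illustration, by recasting the $\ell_1$ problem as a linear program (basis pursuit), splitting $\alpha$ into its positive and negative parts; this generic solver choice is an assumption, not the method of any particular cited paper.

    # Sketch of eq. (2) as a linear program, reusing Phi, Psi, alpha, x, y from above.
    from scipy.optimize import linprog

    A = Phi @ Psi                                # effective sensing matrix
    c = np.ones(2 * N)                           # minimise sum(u) + sum(v), with alpha = u - v
    A_eq = np.hstack([A, -A])                    # equality constraint A (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method='highs')

    alpha_hat = res.x[:N] - res.x[N:]            # recovered sparse vector
    x_hat = Psi @ alpha_hat                      # reconstructed signal
    print("recovery error:", np.linalg.norm(x_hat - x))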

Different types of measurement matrices, such as random matrices, can be used for speech signals. [4] [5] Estimating the sparsity of a speech signal is a problem, since speech varies greatly over time and so its sparsity also varies greatly over time. Ideally, the sparsity of the speech signal would be tracked over time without much added complexity; if this is not possible, a worst-case sparsity can be assumed for a given speech signal.
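
A simple heuristic for the worst-case approach is sketched below; the frame length, hop, and relative threshold are assumed tuning parameters, and the DCT is again an assumed sparsifying domain.

    # Sketch: track frame-wise sparsity of a speech signal in the DCT domain and
    # take the worst case over the utterance.  Purely illustrative.
    import numpy as np
    from scipy.fft import dct

    def framewise_sparsity(speech, frame_len=256, hop=128, rel_thresh=0.05):
        """Per frame, count DCT coefficients whose magnitude exceeds
        rel_thresh times that frame's largest coefficient magnitude."""
        counts = []
        for start in range(0, len(speech) - frame_len + 1, hop):
            coeffs = np.abs(dct(speech[start:start + frame_len], norm='ortho'))
            counts.append(int(np.sum(coeffs > rel_thresh * coeffs.max())))
        return counts

    # worst-case sparsity, e.g. to size the number of measurements M:
    # k_worst = max(framewise_sparsity(speech))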

The sparse vector $\hat{\alpha}$ for a given speech signal is reconstructed from as small a number of measurements $y$ as possible using $\ell_1$ minimization. [2] The original speech signal is then reconstructed from the calculated sparse vector using the fixed dictionary matrix $\Psi$ as $\hat{x} = \Psi \hat{\alpha}$. [6]

Estimation of both the dictionary matrix and the sparse vector from random measurements alone has been carried out iteratively. [7] The speech signal reconstructed from the estimated sparse vector and dictionary matrix is much closer to the original signal. Further iterative approaches that compute both the dictionary matrix and the speech signal from only random measurements of the speech signal have also been developed. [8]
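
Reference [7] uses iterative hard thresholding (IHT) as the recovery step; the sketch below shows only the textbook IHT iteration for a fixed dictionary, not the time-varying, signal-adaptive transform estimation of the cited papers. The step size and iteration count are assumed values.

    # Sketch of plain iterative hard thresholding for y = A alpha with A = Phi Psi.
    import numpy as np

    def iht(A, y, k, n_iter=200):
        """Recover a k-sparse alpha from y = A @ alpha by gradient steps on
        ||y - A alpha||^2 followed by hard thresholding to the k largest entries."""
        step = 1.0 / np.linalg.norm(A, 2) ** 2        # conservative step size
        alpha = np.zeros(A.shape[1])
        for _ in range(n_iter):
            alpha = alpha + step * A.T @ (y - A @ alpha)   # gradient step
            keep = np.argsort(np.abs(alpha))[-k:]          # k largest-magnitude indices
            pruned = np.zeros_like(alpha)
            pruned[keep] = alpha[keep]
            alpha = pruned
        return alpha

    # e.g. alpha_hat = iht(Phi @ Psi, y, k); x_hat = Psi @ alpha_hat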

Applications

The application of structured sparsity for joint speech localization and separation in reverberant acoustics has been investigated for multiparty speech recognition. [9] Further applications of the concept of sparsity in speech processing remain to be studied. The idea behind applying CS to speech signals is to formulate algorithms or methods that use only the random measurements ($y$) to carry out various forms of application-based processing, such as speaker recognition and speech enhancement. [10]

References

  1. Vidyasagar, M. (2019-12-03). An Introduction to Compressed Sensing. SIAM. ISBN 978-1-61197-612-0.
  2. Donoho D. (2006). "Compressed sensing". IEEE Transactions on Information Theory. 52 (4): 1289–1306. CiteSeerX 10.1.1.212.6447. doi:10.1109/TIT.2006.871582. PMID 17969013. S2CID 206737254.
  3. Candes E.; Romberg J.; Tao T. (2006). "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information" (PDF). IEEE Transactions on Information Theory. 52 (2): 489. arXiv:math/0409186. doi:10.1109/TIT.2005.862083. S2CID 7033413.
  4. Zhang G.; Jiao S.; Xu X.; Wang L. (2010). "Compressed sensing and reconstruction with Bernoulli matrices". The 2010 IEEE International Conference on Information and Automation. pp. 455–460. doi:10.1109/ICINFA.2010.5512379. ISBN 978-1-4244-5701-4. S2CID 15886491.
  5. Li K.; Ling C.; Gan L. (2011). "Deterministic compressed-sensing matrices: Where Toeplitz meets Golay". 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 3748–3751. doi:10.1109/ICASSP.2011.5947166. ISBN 978-1-4577-0538-0. S2CID 12289159.
  6. Christensen M.; Østergaard J.; Jensen S. (2009). "On compressed sensing and its application to speech and audio signals". 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers. pp. 356–360. doi:10.1109/ACSSC.2009.5469828. ISBN 978-1-4244-5825-7. S2CID 15151303.
  7. Raj C. S.; Sreenivas T. V. (2011). "Time-varying signal adaptive transform and IHT recovery of compressive sensed speech". Interspeech: 73–76. doi:10.21437/Interspeech.2011-19. S2CID 35813887.
  8. Chetupally S.R.; Sreenivas T.V. (2012). "Joint pitch-analysis formant-synthesis framework for CS recovery of speech". Interspeech: 946–949.
  9. Asaei A.; Bourlard H.; Cevher V. (2011). "Model-based Compressive Sensing for Multiparty Distant Speech Recognition". ICASSP: 4600–4603.
  10. Abrol Vinayak; Sharma Pulkit (2013). "Speech Enhancement Using Compressed Sensing". Interspeech 2013: 3274–3278. doi:10.21437/Interspeech.2013-725.