Independent component analysis

Last updated August 19, 2025

In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that at most one subcomponent is Gaussian and that the subcomponents are statistically independent from each other.^[1] ICA was invented by Jeanny Hérault and Christian Jutten in 1985.^[2] ICA is a special case of blind source separation. A common example application of ICA is the "cocktail party problem" of listening in on one person's speech in a noisy room.^[3]

Introduction

ICA on four randomly mixed videos.^[4] Top row: The original source videos. Middle row: Four random mixtures used as input to the algorithm. Bottom row: The reconstructed videos.

Independent component analysis attempts to decompose a multivariate signal into independent non-Gaussian signals. As an example, sound is usually a signal that is composed of the numerical addition, at each time t, of signals from several sources. The question then is whether it is possible to separate these contributing sources from the observed total signal. When the statistical independence assumption is correct, blind ICA separation of a mixed signal gives very good results.^[5] ICA is also applied, for analysis purposes, to signals that are not assumed to have been generated by mixing.

A simple application of ICA is the "cocktail party problem", where the underlying speech signals are separated from sample data consisting of people talking simultaneously in a room. Usually the problem is simplified by assuming no time delays or echoes. Note that a filtered and delayed signal is a copy of a dependent component, and thus the statistical independence assumption is not violated.

Mixing weights for constructing the ${\textstyle M}$ observed signals from the ${\textstyle N}$ components can be placed in an ${\textstyle M\times N}$ matrix. If ${\textstyle N}$ sources are present, at least ${\textstyle N}$ observations (e.g. microphones if the observed signal is audio) are needed to recover the original signals. When there are equal numbers of observations and source signals, the mixing matrix is square ( ${\textstyle M=N}$ ). The underdetermined ( ${\textstyle M<N}$ ) and overdetermined ( ${\textstyle M>N}$ ) cases have also been investigated.

The success of ICA separation of mixed signals relies on two assumptions and three effects of mixing source signals. Two assumptions:

The source signals are independent of each other.
The values in each source signal have non-Gaussian distributions.

Three effects of mixing source signals:

Independence: As per assumption 1, the source signals are independent; however, their signal mixtures are not. This is because the signal mixtures share the same source signals.
Normality: According to the central limit theorem, the distribution of a sum of independent random variables with finite variance tends towards a Gaussian distribution. Loosely speaking, a sum of two independent random variables usually has a distribution that is closer to Gaussian than either of the two original variables. Here we consider the value of each signal as the random variable.
Complexity: The temporal complexity of any signal mixture is greater than that of its simplest constituent source signal.

Those principles contribute to the basic establishment of ICA. If the signals extracted from a set of mixtures are independent and have non-Gaussian distributions or have low complexity, then they must be source signals.^[6]^[7]
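The independence effect can be checked with a small simulation. The following is a minimal sketch, assuming two uniformly distributed sources and an arbitrarily chosen mixing matrix `A` (both are illustrative choices, not taken from the cited sources). Correlation is used here only as a simple indicator of dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20_000

# Two independent, non-Gaussian (uniform) source signals, one per row.
sources = rng.uniform(-1.0, 1.0, size=(2, n_samples))

# Hypothetical mixing matrix, chosen only for illustration.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
mixtures = A @ sources

# Independent sources are uncorrelated; mixtures that share sources are not.
corr_sources = np.corrcoef(sources)[0, 1]
corr_mixtures = np.corrcoef(mixtures)[0, 1]
print(f"correlation between sources:  {corr_sources:+.3f}")
print(f"correlation between mixtures: {corr_mixtures:+.3f}")
```

The source correlation should be close to zero, while the two mixtures are clearly correlated because each contains both sources.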

Another common example is image steganography, where ICA is used to embed one image within another. For instance, two grayscale images can be linearly combined to create mixed images in which the hidden content is visually imperceptible. ICA can then be used to recover the original source images from the mixtures. This technique underlies digital watermarking, which allows the embedding of ownership information into images, as well as more covert applications such as undetected information transmission. The method has even been linked to real-world cyberespionage cases. In such applications, ICA serves to unmix the data based on statistical independence, making it possible to extract hidden components that are not apparent in the observed data.

Steganographic techniques, including those potentially involving ICA-based analysis, have been used in real-world cyberespionage cases. In 2010, the FBI uncovered a Russian spy network known as the "Illegals Program" (Operation Ghost Stories), where agents used custom-built steganography tools to conceal encrypted text messages within image files shared online.^[8]

In another case, a former General Electric engineer, Xiaoqing Zheng, was convicted in 2022 for economic espionage. Zheng used steganography to exfiltrate sensitive turbine technology by embedding proprietary data within image files for transfer to entities in China.^[9]

Defining component independence

ICA finds the independent components (also called factors, latent variables or sources) by maximizing the statistical independence of the estimated components. We may choose one of many ways to define a proxy for independence, and this choice governs the form of the ICA algorithm. The two broadest definitions of independence for ICA are

Minimization of mutual information
Maximization of non-Gaussianity

The minimization-of-mutual-information (MMI) family of ICA algorithms uses measures like Kullback–Leibler divergence and maximum entropy. The non-Gaussianity family of ICA algorithms, motivated by the central limit theorem, uses kurtosis and negentropy.^[10]

Typical algorithms for ICA use centering (subtract the mean to create a zero mean signal), whitening (usually with the eigenvalue decomposition),^[11] and dimensionality reduction as preprocessing steps in order to simplify and reduce the complexity of the problem for the actual iterative algorithm.

Mathematical definitions

Linear independent component analysis can be divided into noiseless and noisy cases, where noiseless ICA is a special case of noisy ICA. Nonlinear ICA should be considered as a separate case.

General Derivation

In the classical ICA model, it is assumed that the observed data $\mathbf {x} _{i}\in \mathbb {R} ^{m}$ at time $t_{i}$ is generated from source signals $\mathbf {s} _{i}\in \mathbb {R} ^{m}$ via a linear transformation $\mathbf {x} _{i}=A\mathbf {s} _{i}$ , where $A$ is an unknown, invertible mixing matrix. To recover the source signals, the data is first centered (zero mean), and then whitened so that the transformed data has unit covariance. This whitening reduces the problem from estimating a general matrix $A$ to estimating an orthogonal matrix $V$ , significantly simplifying the search for independent components.

If the covariance matrix of the centered data is $\Sigma _{x}=AA^{\top }$ , then using the eigen-decomposition $\Sigma _{x}=QDQ^{\top }$ , the whitening transformation can be taken as $D^{-1/2}Q^{\top }$ . This step ensures that the recovered sources are uncorrelated and of unit variance, leaving only the task of rotating the whitened data to maximize statistical independence. This general derivation underlies many ICA algorithms and is foundational in understanding the ICA model.^[12]
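The centering and whitening steps can be sketched in a few lines of NumPy. This is an illustrative sketch only: the helper name `whiten`, the Laplace sources and the mixing matrix `A` are assumptions made for the example, and the covariance is estimated from the sample.

```python
import numpy as np

def whiten(X):
    """Center and whiten data X of shape (m, N), one row per mixture.

    Returns the whitened data and the whitening matrix D^{-1/2} Q^T.
    """
    X_centered = X - X.mean(axis=1, keepdims=True)
    cov = X_centered @ X_centered.T / X_centered.shape[1]
    eigvals, Q = np.linalg.eigh(cov)            # cov = Q diag(eigvals) Q^T
    whitening = np.diag(eigvals ** -0.5) @ Q.T
    return whitening @ X_centered, whitening

rng = np.random.default_rng(0)
# Illustrative unit-variance sources and a hypothetical mixing matrix.
S = rng.laplace(scale=1 / np.sqrt(2), size=(3, 50_000))
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0]])

Z, whitening = whiten(A @ S)
cov_Z = Z @ Z.T / Z.shape[1]                    # identity up to rounding
V = whitening @ A                               # remaining unknown transform
print(np.round(cov_Z, 6))
print(np.round(V @ V.T, 2))                     # approximately the identity
```

After whitening, the product of the whitening matrix and `A` is approximately orthogonal, which is why only a rotation remains to be estimated.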

Reduced Mixing Problem

Independent component analysis (ICA) addresses the problem of recovering a set of unobserved source signals $s_{i}=(s_{i1},s_{i2},\dots ,s_{im})^{T}$ from observed mixed signals $x_{i}=(x_{i1},x_{i2},\dots ,x_{im})^{T}$ , based on the linear mixing model:

$x_{i}=A\,s_{i},$

where the $A$ is an $m\times m$ invertible matrix called the mixing matrix, $s_{i}$ represents the m‑dimensional vector containing the values of the sources at time $t_{i}$ , and $x_{i}$ is the corresponding vector of observed values at time $t_{i}$ . The goal is to estimate both $A$ and the source signals $\{s_{i}\}$ solely from the observed data $\{x_{i}\}$ .

After centering, the Gram matrix is computed as: $(X^{*})^{T}X^{*}=Q\,D\,Q^{T},$ where D is a diagonal matrix with positive entries (assuming $X^{*}$ has maximum rank), and Q is an orthogonal matrix.^[11] Writing the SVD of the mixing matrix $A=U\Sigma V^{T}$ and comparing with $AA^{T}=U\Sigma ^{2}U^{T}$ the mixing A has the form $A=Q\,D^{1/2}\,V^{T}.$ So, the normalized source values satisfy $s_{i}^{*}=V\,y_{i}^{*}$ , where $y_{i}^{*}=D^{-{\tfrac {1}{2}}}Q^{T}x_{i}^{*}.$ Thus, ICA reduces to finding the orthogonal matrix $V$ . This matrix can be computed using optimization techniques via projection pursuit methods (see Projection Pursuit).^[11]

Well-known algorithms for ICA include infomax, FastICA, JADE, and kernel-independent component analysis, among others. In general, ICA cannot identify the actual number of source signals, a uniquely correct ordering of the source signals, nor the proper scaling (including sign) of the source signals.

ICA is important to blind signal separation and has many practical applications. It is closely related to (or even a special case of) the search for a factorial code of the data, i.e., a new vector-valued representation of each data vector such that it gets uniquely encoded by the resulting code vector (loss-free coding), but the code components are statistically independent.

Linear noiseless ICA

The components $x_{i}$ of the observed random vector ${\boldsymbol {x}}=(x_{1},\ldots ,x_{m})^{T}$ are generated as a sum of the independent components $s_{k}$ , $k=1,\ldots ,n$ :

$x_{i}=a_{i,1}s_{1}+\cdots +a_{i,k}s_{k}+\cdots +a_{i,n}s_{n}$

weighted by the mixing weights $a_{i,k}$ .

The same generative model can be written in vector form as ${\boldsymbol {x}}=\sum _{k=1}^{n}s_{k}{\boldsymbol {a}}_{k}$ , where the observed random vector ${\boldsymbol {x}}$ is represented by the basis vectors ${\boldsymbol {a}}_{k}=({\boldsymbol {a}}_{1,k},\ldots ,{\boldsymbol {a}}_{m,k})^{T}$ . The basis vectors ${\boldsymbol {a}}_{k}$ form the columns of the mixing matrix ${\boldsymbol {A}}=({\boldsymbol {a}}_{1},\ldots ,{\boldsymbol {a}}_{n})$ and the generative formula can be written as ${\boldsymbol {x}}={\boldsymbol {A}}{\boldsymbol {s}}$ , where ${\boldsymbol {s}}=(s_{1},\ldots ,s_{n})^{T}$ .

Given the model and realizations (samples) ${\boldsymbol {x}}_{1},\ldots ,{\boldsymbol {x}}_{N}$ of the random vector ${\boldsymbol {x}}$ , the task is to estimate both the mixing matrix ${\boldsymbol {A}}$ and the sources ${\boldsymbol {s}}$ . This is done by adaptively calculating the ${\boldsymbol {w}}$ vectors and setting up a cost function which either maximizes the non-gaussianity of the calculated $s_{k}={\boldsymbol {w}}^{T}{\boldsymbol {x}}$ or minimizes the mutual information. In some cases, a priori knowledge of the probability distributions of the sources can be used in the cost function.

The original sources ${\boldsymbol {s}}$ can be recovered by multiplying the observed signals ${\boldsymbol {x}}$ with the inverse of the mixing matrix ${\boldsymbol {W}}={\boldsymbol {A}}^{-1}$ , also known as the unmixing matrix. Here it is assumed that the mixing matrix is square ( $n=m$ ). If the number of basis vectors is greater than the dimensionality of the observed vectors, $n>m$ , the task is overcomplete but is still solvable with the pseudo inverse.
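As a small illustration of the generative model in the square case, the sketch below builds mixtures from known sources and recovers them with the inverse of the mixing matrix. The matrix `A` and the uniform sources are assumptions made for the example; in a real ICA problem `A` is unknown and has to be estimated.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sources, n_samples = 3, 1_000

# Independent non-Gaussian sources s, one row per source.
S = rng.uniform(-1.0, 1.0, size=(n_sources, n_samples))

# Hypothetical square mixing matrix (n = m); its columns are the basis vectors a_k.
A = np.array([[1.0, 0.4, 0.2],
              [0.3, 1.0, 0.5],
              [0.2, 0.1, 1.0]])

X = A @ S                                   # matrix form: x = A s

# Equivalent column form: x = sum_k s_k a_k.
X_sum = sum(np.outer(A[:, k], S[k]) for k in range(n_sources))

W = np.linalg.inv(A)                        # unmixing matrix, available here only because A is known
S_recovered = W @ X
print(np.allclose(X, X_sum), np.allclose(S_recovered, S))
```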

Linear noisy ICA

With the added assumption of zero-mean and uncorrelated Gaussian noise $n\sim N(0,\operatorname {diag} (\Sigma ))$ , the ICA model takes the form ${\boldsymbol {x}}={\boldsymbol {A}}{\boldsymbol {s}}+n$ .

Nonlinear ICA

The mixing of the sources does not need to be linear. Using a nonlinear mixing function $f(\cdot |\theta )$ with parameters $\theta$ the nonlinear ICA model is $x=f(s|\theta )+n$ .

Identifiability

The independent components are identifiable up to a permutation and scaling of the sources.^[13] This identifiability requires that:

At most one of the sources $s_{k}$ is Gaussian,
The number of observed mixtures, $m$ , must be at least as large as the number of estimated components $n$ : $m\geq n$ . It is equivalent to say that the mixing matrix ${\boldsymbol {A}}$ must be of full rank for its inverse to exist.
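The permutation and scaling ambiguity can be seen directly: permuting and rescaling the sources, while compensating in the mixing matrix, leaves the observations unchanged. The following sketch uses an arbitrary mixing matrix `A`, permutation `P` and scaling `D`, all chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.laplace(size=(2, 500))
A = np.array([[1.0, 0.5],
              [0.2, 1.0]])

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])               # swaps the two sources
D = np.diag([2.0, -0.5])                 # rescales the sources and flips one sign

S_alt = D @ P @ S                        # permuted, rescaled sources
A_alt = A @ np.linalg.inv(D @ P)         # compensating mixing matrix

X = A @ S
X_alt = A_alt @ S_alt
print(np.allclose(X, X_alt))             # same observations from different (A, s)
```

Since both pairs explain the data equally well, no algorithm that sees only the observations can prefer one over the other.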

Binary ICA

A special variant of ICA is binary ICA in which both signal sources and monitors are in binary form and observations from monitors are disjunctive mixtures of binary independent sources. The problem was shown to have applications in many domains including medical diagnosis, multi-cluster assignment, network tomography and internet resource management.

Let $\{x_{1},x_{2},\ldots ,x_{m}\}$ be the set of binary variables from $m$ monitors and $\{y_{1},y_{2},\ldots ,y_{n}\}$ be the set of binary variables from $n$ sources. Source-monitor connections are represented by the (unknown) mixing matrix ${\textstyle {\boldsymbol {G}}}$ , where $g_{ij}=1$ indicates that the signal from the j-th source can be observed by the i-th monitor. The system works as follows: at any time, if a source $j$ is active ( $y_{j}=1$ ) and it is connected to the monitor $i$ ( $g_{ij}=1$ ) then the monitor $i$ will observe some activity ( $x_{i}=1$ ). Formally we have:

$x_{i}=\bigvee _{j=1}^{n}(g_{ij}\wedge y_{j}),\quad i=1,2,\ldots ,m,$

where $\wedge$ is Boolean AND and $\vee$ is Boolean OR. Noise is not explicitly modelled; rather, it can be treated as independent sources.
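The disjunctive mixing rule can be written out directly. The sketch below follows the formula above, with rows of the matrix indexing monitors and columns indexing sources; the function name `observe` and the small matrix `G` are hypothetical and serve only as an example.

```python
def observe(G, y):
    """Disjunctive (OR) mixture: x_i = OR_j (g_ij AND y_j)."""
    return [int(any(g_ij and y_j for g_ij, y_j in zip(row, y))) for row in G]

# Hypothetical mixing matrix: 3 monitors (rows) and 2 sources (columns).
G = [
    [1, 0],   # monitor 1 is connected to source 1 only
    [0, 1],   # monitor 2 is connected to source 2 only
    [1, 1],   # monitor 3 is connected to both sources
]

for y in ([0, 0], [1, 0], [0, 1], [1, 1]):
    print(y, "->", observe(G, y))
```

The third monitor reports activity whenever either source is active, which is what makes the mixture non-additive and the inverse problem harder than in the linear case.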

The above problem can be heuristically solved^[14] by assuming the variables are continuous and running FastICA on the binary observation data to get the mixing matrix ${\textstyle {\boldsymbol {G}}}$ (real values), then applying rounding techniques to ${\textstyle {\boldsymbol {G}}}$ to obtain the binary values. This approach has been shown to produce a highly inaccurate result.^[citation needed]

Another method is to use dynamic programming: recursively breaking the observation matrix ${\textstyle {\boldsymbol {X}}}$ into its sub-matrices and running the inference algorithm on these sub-matrices. The key observation which leads to this algorithm is that the sub-matrix ${\textstyle {\boldsymbol {X}}^{0}}$ of ${\textstyle {\boldsymbol {X}}}$ where ${\textstyle x_{ij}=0,\forall j}$ corresponds to the unbiased observation matrix of hidden components that do not have a connection to the $i$ -th monitor. Experimental results^[15] show that this approach is accurate under moderate noise levels.

The Generalized Binary ICA framework^[16] introduces a broader problem formulation which does not necessitate any knowledge on the generative model. In other words, this method attempts to decompose a source into its independent components (as much as possible, and without losing any information) with no prior assumption on the way it was generated. Although this problem appears quite complex, it can be accurately solved with a branch and bound search tree algorithm or tightly upper bounded with a single multiplication of a matrix with a vector.

Methods for blind source separation

Projection pursuit

Signal mixtures tend to have Gaussian probability density functions, and source signals tend to have non-Gaussian probability density functions. Each source signal can be extracted from a set of signal mixtures by taking the inner product of a weight vector and those signal mixtures where this inner product provides an orthogonal projection of the signal mixtures. The remaining challenge is finding such a weight vector. One type of method for doing so is projection pursuit.^[17]^[18]

Projection pursuit seeks one projection at a time such that the extracted signal is as non-Gaussian as possible. This contrasts with ICA, which typically extracts M signals simultaneously from M signal mixtures, which requires estimating an M × M unmixing matrix. One practical advantage of projection pursuit over ICA is that fewer than M signals can be extracted if required, where each source signal is extracted from M signal mixtures using an M-element weight vector.

We can use kurtosis to recover multiple source signals by finding the correct weight vectors with the use of projection pursuit.

The kurtosis of the probability density function of a signal, for a finite sample, is computed as

$K={\frac {\operatorname {E} [(\mathbf {y} -\mathbf {\overline {y}} )^{4}]}{(\operatorname {E} [(\mathbf {y} -\mathbf {\overline {y}} )^{2}])^{2}}}-3$

where $\mathbf {\overline {y}}$ is the sample mean of $\mathbf {y}$ , the extracted signals. The constant 3 ensures that Gaussian signals have zero kurtosis, super-Gaussian signals have positive kurtosis, and sub-Gaussian signals have negative kurtosis. The denominator is the squared variance of $\mathbf {y}$ , and ensures that the measured kurtosis takes account of signal variance. The goal of projection pursuit is to maximize the kurtosis, and so make the extracted signal as non-normal as possible.
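The sign convention can be illustrated on samples from three familiar distributions. This is a minimal sketch: the function name `excess_kurtosis` and the choice of distributions are assumptions made for the example.

```python
import numpy as np

def excess_kurtosis(y):
    """Sample kurtosis with the constant 3 subtracted, as in the formula above."""
    y = np.asarray(y, dtype=float)
    centered = y - y.mean()
    return np.mean(centered**4) / np.mean(centered**2) ** 2 - 3.0

rng = np.random.default_rng(3)
n = 200_000
gauss = rng.normal(size=n)
laplace = rng.laplace(size=n)            # super-Gaussian (heavy tails)
uniform = rng.uniform(-1, 1, size=n)     # sub-Gaussian (light tails)

k_gauss = excess_kurtosis(gauss)
k_laplace = excess_kurtosis(laplace)
k_uniform = excess_kurtosis(uniform)
print(f"Gaussian: {k_gauss:+.2f}  Laplace: {k_laplace:+.2f}  uniform: {k_uniform:+.2f}")

# Dividing by the squared variance makes the measure independent of signal scale.
k_scaled = excess_kurtosis(10.0 * laplace)
```

The Gaussian sample gives a value near zero, the Laplace sample a positive value and the uniform sample a negative value; rescaling a signal leaves its kurtosis unchanged.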

Using kurtosis as a measure of non-normality, we can now examine how the kurtosis of a signal $\mathbf {y} =\mathbf {w} ^{T}\mathbf {x}$ extracted from a set of M mixtures $\mathbf {x} =(x_{1},x_{2},\ldots ,x_{M})^{T}$ varies as the weight vector $\mathbf {w}$ is rotated around the origin. Given our assumption that each source signal $\mathbf {s}$ is super-gaussian we would expect:

the kurtosis of the extracted signal $\mathbf {y}$ to be maximal precisely when $\mathbf {y} =\mathbf {s}$ .
the kurtosis of the extracted signal $\mathbf {y}$ to be maximal when $\mathbf {w}$ is orthogonal to the projected axes $S_{1}$ or $S_{2}$ , because we know the optimal weight vector should be orthogonal to a transformed axis $S_{1}$ or $S_{2}$ .

For multiple source mixture signals, we can use kurtosis and Gram-Schmidt orthogonalization (GSO) to recover the signals. Given M signal mixtures in an M-dimensional space, GSO projects these data points onto an (M-1)-dimensional space by using the weight vector. We can guarantee the independence of the extracted signals with the use of GSO.

In order to find the correct value of $\mathbf {w}$ , we can use a gradient-based method. We first whiten the data, transforming $\mathbf {x}$ into a new mixture $\mathbf {z} =(z_{1},z_{2},\ldots ,z_{M})^{T}$ , which has unit variance. This process can be achieved by applying singular value decomposition to $\mathbf {x}$ ,

$\mathbf {x} =\mathbf {U} \mathbf {D} \mathbf {V} ^{T}$

rescaling each vector $U_{i}=U_{i}/{\sqrt {\operatorname {E} (U_{i}^{2})}}$ so that it has unit variance, and letting $\mathbf {z} =\mathbf {U}$ . The signal extracted by a weight vector $\mathbf {w}$ is $\mathbf {y} =\mathbf {w} ^{T}\mathbf {z}$ . If the weight vector w has unit length, then the variance of y is also 1, that is $\operatorname {E} [(\mathbf {w} ^{T}\mathbf {z} )^{2}]=1$ . The kurtosis can thus be written as:

$K={\frac {\operatorname {E} [\mathbf {y} ^{4}]}{(\operatorname {E} [\mathbf {y} ^{2}])^{2}}}-3=\operatorname {E} [(\mathbf {w} ^{T}\mathbf {z} )^{4}]-3.$

The updating process for $\mathbf {w}$ is:

$\mathbf {w} _{new}=\mathbf {w} _{old}+\eta \operatorname {E} [\mathbf {z} (\mathbf {w} _{old}^{T}\mathbf {z} )^{3}].$

where $\eta$ is a small constant to guarantee that $\mathbf {w}$ converges to the optimal solution. After each update, we normalize $\mathbf {w} _{new}={\frac {\mathbf {w} _{new}}{|\mathbf {w} _{new}|}}$ , and set $\mathbf {w} _{old}=\mathbf {w} _{new}$ , and repeat the updating process until convergence. We can also use another algorithm to update the weight vector $\mathbf {w}$ .
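The update and normalization loop can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `whiten` and `extract_one_signal`, the Laplace sources, the mixing matrix `A`, the step size and the iteration count are all choices made for the example. The step is taken in the direction that increases kurtosis, in line with the stated goal of maximizing it for super-Gaussian sources.

```python
import numpy as np

def whiten(X):
    """Center X (M x N) and transform it to have identity sample covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    eigvals, Q = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    return np.diag(eigvals ** -0.5) @ Q.T @ Xc

def extract_one_signal(Z, eta=0.1, n_iter=500, seed=0):
    """Kurtosis-based projection pursuit for one signal from whitened data Z."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z
        grad = (Z * y**3).mean(axis=1)   # sample estimate of E[z (w^T z)^3]
        w = w + eta * grad               # step that increases E[(w^T z)^4]
        w /= np.linalg.norm(w)           # keep w at unit length
    return w, w @ Z

rng = np.random.default_rng(4)
S = rng.laplace(size=(2, 20_000))        # illustrative super-Gaussian sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])               # hypothetical mixing matrix
Z = whiten(A @ S)

w, y = extract_one_signal(Z)
corr = [abs(np.corrcoef(y, s)[0, 1]) for s in S]
print("absolute correlation with each source:", np.round(corr, 3))
```

The extracted signal should correlate strongly with exactly one of the two sources. Which source is found, and with which sign, depends on the random starting vector.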

Another approach is to use negentropy^[10]^[19] instead of kurtosis. Using negentropy is more robust than using kurtosis, as kurtosis is very sensitive to outliers. The negentropy methods are based on an important property of the Gaussian distribution: a Gaussian variable has the largest entropy among all continuous random variables of equal variance. This is also the reason why we want to find the most non-Gaussian variables. A simple proof can be found in Differential entropy.

$J(x)=S(y)-S(x)\,$

where $y$ is a Gaussian random variable with the same covariance matrix as $x$ , and

$S(x)=-\int p_{x}(u)\log p_{x}(u)\,du$

An approximation for negentropy is

$J(x)={\frac {1}{12}}(E(x^{3}))^{2}+{\frac {1}{48}}(\operatorname {kurt} (x))^{2}$

A proof can be found in the original papers of Comon;^[20]^[10] it has been reproduced in the book Independent Component Analysis by Aapo Hyvärinen, Juha Karhunen, and Erkki Oja.^[21] This approximation also suffers from the same problem as kurtosis (sensitivity to outliers). Other approaches have been developed.^[22]

$J(y)=k_{1}(E(G_{1}(y)))^{2}+k_{2}(E(G_{2}(y))-E(G_{2}(v)))^{2}$

where $v$ is a standard Gaussian variable. One choice of $G_{1}$ and $G_{2}$ is

$G_{1}(u)={\frac {1}{a_{1}}}\log(\cosh(a_{1}u))$

and

$G_{2}(u)=-\exp(-{\frac {u^{2}}{2}})$
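A contrast of this kind can be evaluated numerically. The sketch below is a simplified, single-function variant that is often used in practice: it keeps only a term of the form $(E(G(y))-E(G(v)))^{2}$ with the log-cosh function, drops the positive constant in front, and estimates the Gaussian reference value by sampling. The function name `negentropy_logcosh` and these simplifications are assumptions made for the example, not the exact expression above.

```python
import numpy as np

def negentropy_logcosh(y, a1=1.0, n_ref=200_000, seed=0):
    """Negentropy proxy (E[G(y)] - E[G(v)])^2 with G(u) = log(cosh(a1 u)) / a1.

    y is standardized first; v is a standard Gaussian reference sample.
    The result is defined only up to a positive constant factor.
    """
    y = np.asarray(y, dtype=float)
    y = (y - y.mean()) / y.std()
    v = np.random.default_rng(seed).normal(size=n_ref)

    def G(u):
        return np.log(np.cosh(a1 * u)) / a1

    return (G(y).mean() - G(v).mean()) ** 2

rng = np.random.default_rng(5)
J_gauss = negentropy_logcosh(rng.normal(size=100_000))
J_laplace = negentropy_logcosh(rng.laplace(size=100_000))
print(f"Gaussian sample: {J_gauss:.2e}   Laplace sample: {J_laplace:.2e}")
```

The value for the Gaussian sample should be close to zero, while the Laplace sample gives a clearly larger value. Because log-cosh grows only linearly for large arguments, single outliers influence it far less than they influence the fourth power used in kurtosis.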

Based on infomax

Infomax ICA^[23] is essentially a multivariate, parallel version of projection pursuit. Whereas projection pursuit extracts a series of signals one at a time from a set of M signal mixtures, ICA extracts M signals in parallel. This tends to make ICA more robust than projection pursuit.^[24]

The projection pursuit method uses Gram-Schmidt orthogonalization to ensure the independence of the extracted signal, while ICA uses infomax and maximum likelihood estimation to ensure the independence of the extracted signal. The non-normality of the extracted signal is achieved by assigning an appropriate model, or prior, for the signal.

In short, ICA based on infomax proceeds as follows: given a set of signal mixtures $\mathbf {x}$ and a set of identical independent model cumulative distribution functions (cdfs) $g$ , we seek the unmixing matrix $\mathbf {W}$ which maximizes the joint entropy of the signals $\mathbf {Y} =g(\mathbf {y} )$ , where $\mathbf {y} =\mathbf {Wx}$ are the signals extracted by $\mathbf {W}$ . Given the optimal $\mathbf {W}$ , the signals $\mathbf {Y}$ have maximum entropy and are therefore independent, which ensures that the extracted signals $\mathbf {y} =g^{-1}(\mathbf {Y} )$ are also independent. $g$ is an invertible function, and is the signal model. Note that if the source signal model probability density function $p_{s}$ matches the probability density function of the extracted signal $p_{\mathbf {y} }$ , then maximizing the joint entropy of $\mathbf {Y}$ also maximizes the amount of mutual information between $\mathbf {x}$ and $\mathbf {Y}$ . For this reason, using entropy to extract independent signals is known as infomax.

Consider the entropy of the vector variable $\mathbf {Y} =g(\mathbf {y} )$ , where $\mathbf {y} =\mathbf {Wx}$ is the set of signals extracted by the unmixing matrix $\mathbf {W}$ . For a finite set of values sampled from a distribution with pdf $p_{\mathbf {y} }$ , the entropy of $\mathbf {Y}$ can be estimated as:

$H(\mathbf {Y} )=-{\frac {1}{N}}\sum _{t=1}^{N}\ln p_{\mathbf {Y} }(\mathbf {Y} ^{t})$

The joint pdf $p_{\mathbf {Y} }$ can be shown to be related to the joint pdf $p_{\mathbf {y} }$ of the extracted signals by the multivariate form:

$p_{\mathbf {Y} }(Y)={\frac {p_{\mathbf {y} }(\mathbf {y} )}{|{\frac {\partial \mathbf {Y} }{\partial \mathbf {y} }}|}}$

where $\mathbf {J} ={\frac {\partial \mathbf {Y} }{\partial \mathbf {y} }}$ is the Jacobian matrix. We have $|\mathbf {J} |=g'(\mathbf {y} )$ , and $g'$ is the pdf assumed for source signals $g'=p_{s}$ , therefore,

$p_{\mathbf {Y} }(Y)={\frac {p_{\mathbf {y} }(\mathbf {y} )}{|{\frac {\partial \mathbf {Y} }{\partial \mathbf {y} }}|}}={\frac {p_{\mathbf {y} }(\mathbf {y} )}{p_{\mathbf {s} }(\mathbf {y} )}}$

and hence

$H(\mathbf {Y} )=-{\frac {1}{N}}\sum _{t=1}^{N}\ln {\frac {p_{\mathbf {y} }(\mathbf {y} ^{t})}{p_{\mathbf {s} }(\mathbf {y} ^{t})}}$

We know that when $p_{\mathbf {y} }=p_{s}$ , $p_{\mathbf {Y} }$ is uniform, and $H({\mathbf {Y} })$ is maximized. Since

$p_{\mathbf {y} }(\mathbf {y} )={\frac {p_{\mathbf {x} }(\mathbf {x} )}{|{\frac {\partial \mathbf {y} }{\partial \mathbf {x} }}|}}={\frac {p_{\mathbf {x} }(\mathbf {x} )}{|\mathbf {W} |}}$

where $|\mathbf {W} |$ is the absolute value of the determinant of the unmixing matrix $\mathbf {W}$ , we have

$H(\mathbf {Y} )=-{\frac {1}{N}}\sum _{t=1}^{N}\ln {\frac {p_{\mathbf {x} }(\mathbf {x} ^{t})}{|\mathbf {W} |p_{\mathbf {s} }(\mathbf {y} ^{t})}}$

so,

$H(\mathbf {Y} )={\frac {1}{N}}\sum _{t=1}^{N}\ln p_{\mathbf {s} }(\mathbf {y} ^{t})+\ln |\mathbf {W} |+H(\mathbf {x} )$

since $H(\mathbf {x} )=-{\frac {1}{N}}\sum _{t=1}^{N}\ln p_{\mathbf {x} }(\mathbf {x} ^{t})$ . Changing $\mathbf {W}$ does not affect $H(\mathbf {x} )$ , so we can maximize the function

$h(\mathbf {Y} )={\frac {1}{N}}\sum _{t=1}^{N}\ln p_{\mathbf {s} }(\mathbf {y} ^{t})+\ln |\mathbf {W} |$

to achieve the independence of the extracted signal.

If the M marginal pdfs of the model joint pdf $p_{\mathbf {s} }$ are independent, and we use the common super-Gaussian model pdf for the source signals $p_{\mathbf {s} }=(1-\tanh(\mathbf {s} )^{2})$ , then we have

$h(\mathbf {Y} )={\frac {1}{N}}\sum _{i=1}^{M}\sum _{t=1}^{N}\ln(1-\tanh(\mathbf {w} _{i}^{\mathsf {T}}\mathbf {x} ^{t})^{2})+\ln |\mathbf {W} |$

In summary, given an observed signal mixture $\mathbf {x}$ , the corresponding set of extracted signals $\mathbf {y}$ and source signal model $p_{\mathbf {s} }=g'$ , we can find the optimal unmixing matrix $\mathbf {W}$ , and make the extracted signals independent and non-Gaussian. As in the projection pursuit case, we can use a gradient method to find the optimal solution of the unmixing matrix.
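A gradient-based search for the unmixing matrix can be sketched as follows. This is a minimal illustration, not a reference implementation: it uses the natural-gradient form of the ascent step (a common variant that avoids inverting $\mathbf {W}$ ), and the function names, Laplace sources, mixing matrix `A`, step size and iteration count are assumptions made for the example.

```python
import numpy as np

def log_sech2(y):
    """Numerically stable ln(1 - tanh(y)^2), computed as -2 ln cosh(y)."""
    a = np.abs(y)
    return -2.0 * (a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0))

def objective(W, X):
    """h = (1/N) sum_i sum_t ln(1 - tanh(w_i^T x^t)^2) + ln|W|."""
    return log_sech2(W @ X).sum() / X.shape[1] + np.log(abs(np.linalg.det(W)))

def infomax(X, eta=0.05, n_iter=2000):
    """Gradient ascent on h using the natural-gradient update."""
    M, N = X.shape
    W = np.eye(M)
    for _ in range(n_iter):
        Y = W @ X
        # d/dy ln(1 - tanh(y)^2) = -2 tanh(y)
        W += eta * (np.eye(M) - 2.0 * np.tanh(Y) @ Y.T / N) @ W
    return W

rng = np.random.default_rng(6)
S = rng.laplace(scale=1 / np.sqrt(2), size=(2, 5_000))   # illustrative sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                               # hypothetical mixing matrix
X = A @ S
X = X - X.mean(axis=1, keepdims=True)

h_before = objective(np.eye(2), X)
W = infomax(X)
h_after = objective(W, X)
Y = W @ X

# Absolute correlation between each extracted signal (rows) and each source (columns).
match = np.abs(np.corrcoef(Y, S)[:2, 2:])
print(f"h before: {h_before:.3f}   h after: {h_after:.3f}")
print(np.round(match, 3))
```

Each row of `match` should contain one entry close to 1, showing that every extracted signal corresponds to one source up to order, sign and scale.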

Based on maximum likelihood estimation

Maximum likelihood estimation (MLE) is a standard statistical tool for finding parameter values (e.g. the unmixing matrix $\mathbf {W}$ ) that provide the best fit of some data (e.g., the extracted signals $y$ ) to a given model (e.g., the assumed joint probability density function (pdf) $p_{s}$ of source signals).^[24]

The ML "model" includes a specification of a pdf, which in this case is the pdf $p_{s}$ of the unknown source signals $s$ . Using ML ICA, the objective is to find an unmixing matrix that yields extracted signals $y=\mathbf {W} x$ with a joint pdf as similar as possible to the joint pdf $p_{s}$ of the unknown source signals $s$ .

MLE is thus based on the assumption that if the model pdf $p_{s}$ and the model parameters $\mathbf {A}$ are correct then a high probability should be obtained for the data $x$ that were actually observed. Conversely, if $\mathbf {A}$ is far from the correct parameter values then a low probability of the observed data would be expected.

Using MLE, we call the probability of the observed data for a given set of model parameter values (e.g., a pdf $p_{s}$ and a matrix $\mathbf {A}$ ) the likelihood of the model parameter values given the observed data.

We define a likelihood function $\mathbf {L(W)}$ of $\mathbf {W}$ :

$\mathbf {L(W)} =p_{s}(\mathbf {W} x)|\det \mathbf {W} |.$

This equals the probability density at $x$ , since $s=\mathbf {W} x$ .

Thus, if we wish to find a $\mathbf {W}$ that is most likely to have generated the observed mixtures $x$ from the unknown source signals $s$ with pdf $p_{s}$ then we need only find that $\mathbf {W}$ which maximizes the likelihood $\mathbf {L(W)}$ . The unmixing matrix that maximizes this likelihood is known as the MLE of the optimal unmixing matrix.

It is common practice to use the log likelihood, because this is easier to evaluate. As the logarithm is a monotonic function, the $\mathbf {W}$ that maximizes the function $\mathbf {L(W)}$ also maximizes its logarithm $\ln \mathbf {L(W)}$ . Taking the logarithm of the likelihood over $N$ observed samples $x_{t}$ , with independent source components, yields the log likelihood function

$\ln \mathbf {L(W)} =\sum _{i}\sum _{t}\ln p_{s}(w_{i}^{T}x_{t})+N\ln |\det \mathbf {W} |$

If we substitute a commonly used high-kurtosis model pdf for the source signals $p_{s}=(1-\tanh(s)^{2})$ then we have

$\ln \mathbf {L(W)} =\sum _{i}^{M}\sum _{t}^{N}\ln(1-\tanh(w_{i}^{T}x_{t})^{2})+N\ln |\det \mathbf {W} |$

The matrix $\mathbf {W}$ that maximizes this function is the maximum likelihood estimate.
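The log likelihood can be evaluated directly for candidate unmixing matrices. The sketch below compares the inverse of a known mixing matrix with a rotated version that has the same determinant but produces mixed outputs. The function name `log_likelihood`, the Laplace sources and the matrix `A` are assumptions made for the example; the log likelihood is written as a sum over samples, so it equals N times the per-sample infomax objective.

```python
import numpy as np

def log_likelihood(W, X):
    """ln L(W) = sum_i sum_t ln p_s(w_i^T x_t) + N ln|det W|, with p_s = 1 - tanh(s)^2."""
    Y = W @ X
    a = np.abs(Y)
    # Stable evaluation of ln(1 - tanh(y)^2) = -2 ln cosh(y).
    log_ps = -2.0 * (a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0))
    return log_ps.sum() + X.shape[1] * np.log(abs(np.linalg.det(W)))

rng = np.random.default_rng(7)
S = rng.laplace(size=(2, 20_000))        # illustrative super-Gaussian sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])               # hypothetical mixing matrix
X = A @ S

W_true = np.linalg.inv(A)                # separates the sources exactly
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W_rotated = R @ W_true                   # same |det W|, but outputs are mixtures again

ll_true = log_likelihood(W_true, X)
ll_rotated = log_likelihood(W_rotated, X)
print(f"ln L (separating W): {ll_true:.1f}")
print(f"ln L (rotated W):    {ll_rotated:.1f}")
```

The separating matrix should receive the higher log likelihood, because its outputs are heavier-tailed and therefore fit the super-Gaussian model pdf better than the re-mixed outputs do.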

History and background

The early general framework for independent component analysis was introduced by Jeanny Hérault and Bernard Ans in 1984,^[25] further developed by Christian Jutten in 1985 and 1986,^[2]^[26]^[27] refined by Pierre Comon in 1991,^[20] and popularized in his paper of 1994.^[10] In 1995, Tony Bell and Terry Sejnowski introduced a fast and efficient ICA algorithm based on infomax, a principle introduced by Ralph Linsker in 1987. A link exists between maximum-likelihood estimation and infomax approaches.^[28] A fairly comprehensive tutorial on the maximum-likelihood approach to ICA was published by J-F. Cardoso in 1998.^[29]

Many algorithms for ICA are available in the literature. A widely used one, including in industrial applications, is the FastICA algorithm, developed by Hyvärinen and Oja,^[30] which uses negentropy as its cost function, as already proposed seven years earlier by Pierre Comon in this context.^[10] Other examples relate more to blind source separation, where a more general approach is used. For example, one can drop the independence assumption and separate mutually correlated, and thus statistically "dependent", signals. Sepp Hochreiter and Jürgen Schmidhuber showed how to obtain non-linear ICA or source separation as a by-product of regularization (1999).^[31] Their method does not require a priori knowledge about the number of independent sources.

Applications

ICA can be extended to analyze non-physical signals. For instance, ICA has been applied to discover discussion topics on a bag of news list archives.

Some ICA applications are listed below:^[6]

image steganography^[32]
optical imaging of neurons^[33]
neuronal spike sorting^[34]
face recognition^[35]
modelling receptive fields of primary visual neurons^[36]
predicting stock market prices^[37]
mobile phone communications^[38]
colour-based detection of the ripeness of tomatoes^[39]
removing artifacts, such as eye blinks, from EEG data^[40]
predicting decision-making using EEG^[41]
analysis of changes in gene expression over time in single-cell RNA-sequencing experiments^[42]
studies of the resting state network of the brain^[43]
astronomy and cosmology^[44]
finance^[45]

Availability

ICA can be applied through the following software:

SAS PROC ICA
R ICA package
scikit-learn Python implementation sklearn.decomposition.FastICA
mlpack C++ implementation of RADICAL (the Robust, Accurate, Direct ICA aLgorithm)

Notes

[1] "Independent Component Analysis: A Demo".
[2] Ans, B., Hérault, J., & Jutten, C. (1985). Architectures neuromimétiques adaptatives : Détection de primitives. Cognitiva 85 (Vol. 2, pp. 593-597). Paris: CESTA.
[3] Hyvärinen, Aapo (2013). "Independent component analysis: recent advances". Philosophical Transactions: Mathematical, Physical and Engineering Sciences. 371 (1984): 20110534. Bibcode:2012RSPTA.37110534H. doi:10.1098/rsta.2011.0534. ISSN 1364-503X. JSTOR 41739975. PMC 3538438. PMID 23277597.
[4] Isomura, Takuya; Toyoizumi, Taro (2016). "A local learning rule for independent component analysis". Scientific Reports. 6: 28073. Bibcode:2016NatSR...628073I. doi:10.1038/srep28073. PMC 4914970. PMID 27323661.
[5] Comon, P.; Jutten, C. (2010): Handbook of Blind Source Separation, Independent Component Analysis and Applications. Academic Press, Oxford UK. ISBN 978-0-12-374726-6
[6] Stone, James V. (2004). Independent component analysis: a tutorial introduction. Cambridge, Massachusetts: MIT Press. ISBN 978-0-262-69315-8.
[7] Hyvärinen, Aapo; Karhunen, Juha; Oja, Erkki (2001). Independent component analysis (1st ed.). New York: John Wiley & Sons. ISBN 978-0-471-22131-9.
[8] "Operation Ghost Stories: Inside the Russian Spy Case". FBI.gov. Federal Bureau of Investigation. 28 June 2010.
[9] "Former GE Power Engineer Sentenced for Conspiracy to Commit Economic Espionage". Justice.gov. U.S. Department of Justice. 3 January 2022.
[10] Pierre Comon (1994) Independent component analysis, a new concept? http://www.ece.ucsb.edu/wcsl/courses/ECE594/594C_F10Madhow/comon94.pdf
[11] Holmes, M. (2023). Introduction to Scientific Computing and Data Analysis, 2nd Ed. Springer. ISBN 978-3-031-22429-4.
[12] Holmes, Mark (2023). Introduction to Scientific Computing and Data Analysis (2nd ed.). Springer. ISBN 978-3-031-22429-4.
[13] Theorem 11, Comon, Pierre. "Independent component analysis, a new concept?" Signal Processing 36.3 (1994): 287-314.
[14] Johan Himberg and Aapo Hyvärinen, Independent Component Analysis For Binary Data: An Experimental Study, Proc. Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA2001), San Diego, California, 2001.
[15] Huy Nguyen and Rong Zheng, Binary Independent Component Analysis With OR Mixtures, IEEE Transactions on Signal Processing, Vol. 59, Issue 7. (July 2011), pp. 3168–3181.
[16] Painsky, Amichai; Rosset, Saharon; Feder, Meir (2014). "Generalized binary independent component analysis". 2014 IEEE International Symposium on Information Theory. pp. 1326–1330. doi:10.1109/ISIT.2014.6875048. ISBN 978-1-4799-5186-4. S2CID 18579555.
[17] James V. Stone (2004); "Independent Component Analysis: A Tutorial Introduction", The MIT Press Cambridge, Massachusetts, London, England; ISBN 0-262-69315-1
[18] Kruskal, JB. 1969; "Toward a practical method which helps uncover the structure of a set of observations by finding the line transformation which optimizes a new 'index of condensation'", Pages 427–440 of: Milton, RC, & Nelder, JA (eds), Statistical computation; New York, Academic Press
[19] Hyvärinen, Aapo; Oja, Erkki (2000). "Independent Component Analysis: Algorithms and Applications". Neural Networks. 13 (4–5): 411–430. CiteSeerX 10.1.1.79.7003. doi:10.1016/s0893-6080(00)00026-5. PMID 10946390. S2CID 11959218.
[20] P. Comon, Independent Component Analysis, Workshop on Higher-Order Statistics, July 1991, republished in J-L. Lacoume, editor, Higher Order Statistics, pp. 29-38. Elsevier, Amsterdam, London, 1992. HAL link
[21] Hyvärinen, Aapo; Karhunen, Juha; Oja, Erkki (2001). Independent component analysis (Reprint ed.). New York, NY: Wiley. ISBN 978-0-471-40540-5.
[22] Hyvärinen, Aapo (1998). "New approximations of differential entropy for independent component analysis and projection pursuit". Advances in Neural Information Processing Systems. 10: 273–279.
[23] Bell, A. J.; Sejnowski, T. J. (1995). "An Information-Maximization Approach to Blind Separation and Blind Deconvolution", Neural Computation, 7, 1129-1159
[24] James V. Stone (2004). "Independent Component Analysis: A Tutorial Introduction", The MIT Press Cambridge, Massachusetts, London, England; ISBN 0-262-69315-1
[25] Hérault, J.; Ans, B. (1984). "Réseau de neurones à synapses modifiables : Décodage de messages sensoriels composites par apprentissage non supervisé et permanent". Comptes Rendus de l'Académie des Sciences, Série III. 299: 525–528.
[26] Hérault, J., Jutten, C., & Ans, B. (1985). Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétique en apprentissage non supervisé. Proceedings of the 10th Workshop Traitement du signal et ses applications (Vol. 2, pp. 1017-1022). Nice (France): GRETSI.
[27] Hérault, J., & Jutten, C. (1986). Space or time adaptive signal processing by neural networks models. Intern. Conf. on Neural Networks for Computing (pp. 206-211). Snowbird (Utah, USA).
[28] J-F. Cardoso, "Infomax and Maximum Likelihood for source separation", IEEE Sig. Proc. Letters, 1997, 4(4):112-114.
[29] J-F. Cardoso, "Blind signal separation: statistical principles", Proc. of the IEEE, 1998, 90(8):2009-2025.
[30] Hyvärinen, A.; Oja, E. (2000-06-01). "Independent component analysis: algorithms and applications" (PDF). Neural Networks. 13 (4): 411–430. doi:10.1016/S0893-6080(00)00026-5. ISSN 0893-6080. PMID 10946390. S2CID 11959218.
[31] Hochreiter, Sepp; Schmidhuber, Jürgen (1999). "Feature Extraction Through LOCOCODE" (PDF). Neural Computation. 11 (3): 679–714. doi:10.1162/089976699300016629. ISSN 0899-7667. PMID 10085426. S2CID 1642107. Archived from the original (PDF) on 2017-07-06. Retrieved 24 February 2018.
[32] Ferreira, Artur J.; Figueiredo, Mário A.T. (2006). "On the use of independent component analysis for image compression". Signal Processing: Image Communication. 21 (5): 378–389. doi:10.1016/j.image.2006.01.002. ISSN 0923-5965.
[33] Brown, GD; Yamada, S; Sejnowski, TJ (2001). "Independent components analysis at the neural cocktail party". Trends in Neurosciences. 24 (1): 54–63. doi:10.1016/s0166-2236(00)01683-0. PMID 11163888. S2CID 511254.
[34] Lewicki, MS (1998). "A review of methods for spike sorting: detection and classification of neural action potentials". Network: Computation in Neural Systems. 9 (4): 53–78. doi:10.1088/0954-898X_9_4_001. S2CID 10290908.
[35] Bartlett, MS (2001). Face image analysis by unsupervised learning. Boston: Kluwer International Series on Engineering and Computer Science.
[36] Bell, AJ; Sejnowski, TJ (1997). "The independent components of natural scenes are edge filters". Vision Research. 37 (23): 3327–3338. doi:10.1016/s0042-6989(97)00121-1. PMC 2882863. PMID 9425547.
[37] Back, AD; Weigend, AS (1997). "A first application of independent component analysis to extracting structure from stock returns". International Journal of Neural Systems. 8 (4): 473–484. doi:10.1142/s0129065797000458. PMID 9730022. S2CID 872703.
[38] Hyvarinen, A, Karhunen, J & Oja, E (2001a). Independent component analysis. New York: John Wiley and Sons.
[39] Polder, G; van der Heijen, FWAM (2003). "Estimation of compound distribution in spectral images of tomatoes using independent component analysis". In R. Leitner (ed.). Spectral Imaging. Proceedings of the International Workshop of the Carinthian Tech Research AG, Graz, Austria, 3 April 2003. Vienna, Austria: Austrian Computer Society. pp. 57–64.
[40] Delorme, A; Sejnowski, T; Makeig, S (2007). "Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis". NeuroImage. 34 (4): 1443–1449. doi:10.1016/j.neuroimage.2006.11.004. PMC 2895624. PMID 17188898.
[41] Douglas, P (2013). "Single trial decoding of belief decision making from EEG and fMRI data using independent components features". Frontiers in Human Neuroscience. 7: 392. doi:10.3389/fnhum.2013.00392. PMC 3728485. PMID 23914164.
[42] Trapnell, C; Cacchiarelli, D; Grimsby, J (2014). "The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells". Nature Biotechnology. 32 (4): 381–386. doi:10.1038/nbt.2859. PMC 4122333. PMID 24658644.
[43] Kiviniemi, Vesa J.; Kantola, Juha-Heikki; Jauhiainen, Jukka; Hyvärinen, Aapo; Tervonen, Osmo (2003). "Independent component analysis of nondeterministic fMRI signal sources". NeuroImage. 19 (2): 253–260. doi:10.1016/S1053-8119(03)00097-1. PMID 12814576. S2CID 17110486.
[44] Wang, Jingying; Xu, Haiguang; Gu, Junhua; An, Tao; Cui, Haijuan; Li, Jianxun; Zhang, Zhongli; Zheng, Qian; Wu, Xiang-Ping (2010-11-01). "How to Identify and Separate Bright Galaxy Clusters from the Low-frequency Radio Sky?". The Astrophysical Journal. 723 (1): 620–633. arXiv:1008.3391. Bibcode:2010ApJ...723..620W. doi:10.1088/0004-637X/723/1/620. ISSN 0004-637X.
[45] Moraux, Franck; Villa, Christophe (2003). "The dynamics of the term structure of interest rates: An Independent Component Analysis". Connectionist Approaches in Economics and Management Sciences. Advances in Computational Management Science. Vol. 6. pp. 215–232. doi:10.1007/978-1-4757-3722-6_11. ISBN 978-1-4757-3722-6.

References

Comon, Pierre (1994): "Independent Component Analysis: a new concept?", Signal Processing, 36(3):287–314 (The original paper describing the concept of ICA)
Hyvärinen, A.; Karhunen, J.; Oja, E. (2001): Independent Component Analysis, New York: Wiley, ISBN 978-0-471-40540-5 (Introductory chapter)
Hyvärinen, A.; Oja, E. (2000): "Independent Component Analysis: Algorithms and Applications", Neural Networks, 13(4-5):411-430. (Technical but pedagogical introduction)
Comon, P.; Jutten, C. (2010): Handbook of Blind Source Separation, Independent Component Analysis and Applications. Academic Press, Oxford UK. ISBN 978-0-12-374726-6
Lee, T.-W. (1998): Independent component analysis: Theory and applications, Boston, Mass: Kluwer Academic Publishers, ISBN 0-7923-8261-7
Acharyya, Ranjan (2008): A New Approach for Blind Source Separation of Convolutive Sources - Wavelet Based Separation Using Shrinkage Function, ISBN 3-639-07797-0, ISBN 978-3639077971 (this book focuses on unsupervised learning with Blind Source Separation)

External links

What is independent component analysis? by Aapo Hyvärinen
Independent Component Analysis: A Tutorial by Aapo Hyvärinen
A Tutorial on Independent Component Analysis
FastICA as a package for Matlab, in R language, C++
ICALAB Toolboxes for Matlab, developed at RIKEN
High Performance Signal Analysis Toolkit provides C++ implementations of FastICA and Infomax
ICA toolbox Matlab tools for ICA with Bell-Sejnowski, Molgedey-Schuster and mean field ICA. Developed at DTU.
Demonstration of the cocktail party problem
EEGLAB Toolbox ICA of EEG for Matlab, developed at UCSD.
FMRLAB Toolbox ICA of fMRI for Matlab, developed at UCSD
MELODIC, part of the FMRIB Software Library.
Discussion of ICA used in a biomedical shape-representation context
FastICA, CuBICA, JADE and TDSEP algorithm for Python and more...
Group ICA Toolbox and Fusion ICA Toolbox
Tutorial: Using ICA for cleaning EEG signals

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] "Independent Component Analysis: A Demo".

[jutten85-2] 1 2 Ans, B., Hérault, J., & Jutten, C. (1985). Architectures neuromimétiques adaptatives : Détection de primitives. Cognitiva 85 (Vol. 2, pp. 593-597). Paris: CESTA.

[3] Hyvärinen, Aapo (2013). "Independent component analysis: recent advances". Philosophical Transactions: Mathematical, Physical and Engineering Sciences. 371 (1984): 20110534. Bibcode:2012RSPTA.37110534H. doi:10.1098/rsta.2011.0534. ISSN 1364-503X. JSTOR 41739975. PMC 3538438 . PMID 23277597.

[4] Isomura, Takuya; Toyoizumi, Taro (2016). "A local learning rule for independent component analysis". Scientific Reports. 6: 28073. Bibcode:2016NatSR...628073I. doi:10.1038/srep28073. PMC 4914970 . PMID 27323661.

[ComoJ2010-5] Comon, P.; Jutten C., (2010): Handbook of Blind Source Separation, Independent Component Analysis and Applications. Academic Press, Oxford UK. ISBN 978-0-12-374726-6

[Stone_2004-6] 1 2 Stone, James V. (2004). Independent component analysis : a tutorial introduction. Cambridge, Massachusetts: MIT Press. ISBN 978-0-262-69315-8.

[7] Hyvärinen, Aapo; Karhunen, Juha; Oja, Erkki (2001). Independent component analysis (1st ed.). New York: John Wiley & Sons. ISBN 978-0-471-22131-9.

[8] "Operation Ghost Stories: Inside the Russian Spy Case". FBI.gov. Federal Bureau of Investigation. 28 June 2010.

[9] "Former GE Power Engineer Sentenced for Conspiracy to Commit Economic Espionage". Justice.gov. U.S. Department of Justice. 3 January 2022.

[comon94-10] 1 2 3 4 5 Pierre Comon (1994) Independent component analysis, a new concept? http://www.ece.ucsb.edu/wcsl/courses/ECE594/594C_F10Madhow/comon94.pdf

[Springer-11] 1 2 3 Holmes, M. (2023). Introduction to Scientific Computing and Data Analysis, 2nd Ed. Springer. ISBN 978-3-031-22429-4.

[12] Holmes, Mark (2023). Introduction to Scientific Computing and Data Analysis (2nd ed.). Springer. ISBN 978-3-031-22429-4.

[13] Theorem 11, Comon, Pierre. "Independent component analysis, a new concept?." Signal processing 36.3 (1994): 287-314.

[Hyvärinen-14] Johan Himbergand Aapo Hyvärinen, Independent Component Analysis For Binary Data: An Experimental Study , Proc. Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA2001), San Diego, California, 2001.

[Huyna-15] Huy Nguyen and Rong Zheng, Binary Independent Component Analysis With or Mixtures , IEEE Transactions on Signal Processing, Vol. 59, Issue 7. (July 2011), pp. 3168–3181.

[Generalized_Binary_ICA-16] Painsky, Amichai; Rosset, Saharon; Feder, Meir (2014). "Generalized binary independent component analysis". 2014 IEEE International Symposium on Information Theory. pp. 1326–1330. doi:10.1109/ISIT.2014.6875048. ISBN 978-1-4799-5186-4. S2CID 18579555.

[James_V._Stone_2004-17] James V. Stone(2004); "Independent Component Analysis: A Tutorial Introduction", The MIT Press Cambridge, Massachusetts, London, England; ISBN 0-262-69315-1

[18] Kruskal, JB. 1969; "Toward a practical method which helps uncover the structure of a set of observations by finding the line transformation which optimizes a new "index of condensation", Pages 427–440 of: Milton, RC, & Nelder, JA (eds), Statistical computation; New York, Academic Press

[19] Hyvärinen, Aapo; Erkki Oja (2000). "Independent Component Analysis:Algorithms and Applications". Neural Networks. 4-5. 13 (4–5): 411–430. CiteSeerX 10.1.1.79.7003 . doi:10.1016/s0893-6080(00)00026-5. PMID 10946390. S2CID 11959218.

[pc91-20] 1 2 P.Comon, Independent Component Analysis, Workshop on Higher-Order Statistics, July 1991, republished in J-L. Lacoume, editor, Higher Order Statistics, pp. 29-38. Elsevier, Amsterdam, London, 1992. HAL link

[21] Hyvärinen, Aapo; Karhunen, Juha; Oja, Erkki (2001). Independent component analysis (Reprint ed.). New York, NY: Wiley. ISBN 978-0-471-40540-5.

[22] Hyvärinen, Aapo (1998). "New approximations of differential entropy for independent component analysis and projection pursuit". Advances in Neural Information Processing Systems. 10: 273–279.

[Bell-Sejnowski-23] Bell, A. J.; Sejnowski, T. J. (1995). "An Information-Maximization Approach to Blind Separation and Blind Deconvolution", Neural Computation, 7, 1129-1159

[ReferenceA-24] 1 2 James V. Stone (2004). "Independent Component Analysis: A Tutorial Introduction", The MIT Press Cambridge, Massachusetts, London, England; ISBN 0-262-69315-1

[25] Hérault, J.; Ans, B. (1984). "Réseau de neurones à synapses modifiables : Décodage de messages sensoriels composites par apprentissage non supervisé et permanent". Comptes Rendus de l'Académie des Sciences, Série III. 299: 525–528.

[26] Hérault, J., Jutten, C., & Ans, B. (1985). Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétique en apprentissage non supervisé. Proceedings of the 10th Workshop Traitement du signal et ses applications (Vol. 2, pp. 1017-1022). Nice (France): GRETSI.

[27] Hérault, J., & Jutten, C. (1986). Space or time adaptive signal processing by neural networks models. Intern. Conf. on Neural Networks for Computing (pp. 206-211). Snowbird (Utah, USA).

[card97-28] J-F.Cardoso, "Infomax and Maximum Likelihood for source separation", IEEE Sig. Proc. Letters, 1997, 4(4):112-114.

[card98-29] J-F.Cardoso, "Blind signal separation: statistical principles", Proc. of the IEEE, 1998, 90(8):2009-2025.

[30] Hyvärinen, A.; Oja, E. (2000-06-01). "Independent component analysis: algorithms and applications" (PDF). Neural Networks. 13 (4): 411–430. doi:10.1016/S0893-6080(00)00026-5. ISSN 0893-6080. PMID 10946390. S2CID 11959218.

[HochreiterSchmidhuber1999-31] Hochreiter, Sepp; Schmidhuber, Jürgen (1999). "Feature Extraction Through LOCOCODE" (PDF). Neural Computation. 11 (3): 679–714. doi:10.1162/089976699300016629. ISSN 0899-7667. PMID 10085426. S2CID 1642107. Archived from the original (PDF) on 2017-07-06. Retrieved 24 February 2018.

[32] Ferreira, Artur J.; Figueiredo, Mário A.T. (2006). "On the use of independent component analysis for image compression" . Signal Processing: Image Communication. 21 (5): 378–389. doi:10.1016/j.image.2006.01.002. ISSN 0923-5965.

[33] Brown, GD; Yamada, S; Sejnowski, TJ (2001). "Independent components analysis at the neural cocktail party". Trends in Neurosciences. 24 (1): 54–63. doi:10.1016/s0166-2236(00)01683-0. PMID 11163888. S2CID 511254.

[34] Lewicki, MS (1998). "Areview of methods for spike sorting: detection and classification of neural action potentials". Network: Computation in Neural Systems. 9 (4): 53–78. doi:10.1088/0954-898X_9_4_001. S2CID 10290908.

[35] Barlett, MS (2001). Face image analysis by unsupervised learning. Boston: Kluwer International Series on Engineering and Computer Science.

[36] Bell, AJ; Sejnowski, TJ (1997). "The independent components of natural scenes are edge filters". Vision Research. 37 (23): 3327–3338. doi:10.1016/s0042-6989(97)00121-1. PMC 2882863 . PMID 9425547.

[37] Back, AD; Weigend, AS (1997). "A first application of independent component analysis to extracting structure from stock returns". International Journal of Neural Systems. 8 (4): 473–484. doi:10.1142/s0129065797000458. PMID 9730022. S2CID 872703.

[38] Hyvarinen, A, Karhunen, J & Oja, E (2001a). Independent component analysis. New York: John Wiley and Sons.{{cite book}}: CS1 maint: multiple names: authors list (link)

[39] Polder, G; van der Heijen, FWAM (2003). "Estimation of compound distribution in spectral images of tomatoes using independent component analysis". In R. Leitner (ed.). Spectral Imaging. Proceedings of the International Workshop of the Carinthian Tech Research AG, Graz, Austria, 3 April 2003. Vienna, Austria: Austrian Computer Society. pp. 57–64.

[40] Delorme, A; Sejnowski, T; Makeig, S (2007). "Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis". NeuroImage. 34 (4): 1443–1449. doi:10.1016/j.neuroimage.2006.11.004. PMC 2895624 . PMID 17188898.

[41] Douglas, P (2013). "Single trial decoding of belief decision making from EEG and fMRI data using independent components features". Frontiers in Human Neuroscience. 7: 392. doi: 10.3389/fnhum.2013.00392 . PMC 3728485 . PMID 23914164.

[42] Trapnell, C; Cacchiarelli, D; Grimsby, J (2014). "The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells". Nature Biotechnology. 32 (4): 381–386. doi:10.1038/nbt.2859. PMC 4122333 . PMID 24658644.

[Kiviniemi2003-43] Kiviniemi, Vesa J.; Kantola, Juha-Heikki; Jauhiainen, Jukka; Hyvärinen, Aapo; Tervonen, Osmo (2003). "Independent component analysis of nondeterministic fMRI signal sources". NeuroImage. 19 (2): 253–260. doi:10.1016/S1053-8119(03)00097-1. PMID 12814576. S2CID 17110486.

[44] Wang, Jingying; Xu, Haiguang; Gu, Junhua; An, Tao; Cui, Haijuan; Li, Jianxun; Zhang, Zhongli; Zheng, Qian; Wu, Xiang-Ping (2010-11-01). "How to Identify and Separate Bright Galaxy Clusters from the Low-frequency Radio Sky?". The Astrophysical Journal. 723 (1): 620–633. arXiv: 1008.3391 . Bibcode:2010ApJ...723..620W. doi: 10.1088/0004-637X/723/1/620 . ISSN 0004-637X.

[45] Moraux, Franck; Villa, Christophe (2003). "The dynamics of the term structure of interest rates: An Independent Component Analysis". Connectionist Approaches in Economics and Management Sciences. Advances in Computational Management Science. Vol. 6. pp. 215–232. doi:10.1007/978-1-4757-3722-6_11. ISBN 978-1-4757-3722-6.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]

[29]

[30]

[31]

[32]

[33]

[34]

[35]

[36]

[37]

[38]

[39]

[40]

[41]

[42]

[43]

[44]

[45]