Spike-triggered average

The spike-triggered average (STA) is a tool for characterizing the response properties of a neuron using the spikes emitted in response to a time-varying stimulus. The STA provides an estimate of a neuron's linear receptive field. It is a useful technique for the analysis of electrophysiological data.

Diagram showing how the STA is calculated. A stimulus (consisting here of a checkerboard with random pixels) is presented, and spikes from the neuron are recorded. The stimuli in some time window preceding each spike (here consisting of 3 time bins) are selected (colored boxes) and then averaged (here just summed, for clarity) to obtain the STA. The STA indicates that this neuron is selective for a bright spot of light just before the spike, located in the top-left corner of the checkerboard.

Mathematically, the STA is the average stimulus preceding a spike. [1] [2] [3] [4] To compute the STA, the stimulus in the time window preceding each spike is extracted, and the resulting (spike-triggered) stimuli are averaged (see diagram). The STA provides an unbiased estimate of a neuron's receptive field only if the stimulus distribution is spherically symmetric (e.g., Gaussian white noise). [3] [5] [6]
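To make the procedure concrete, here is a minimal sketch in Python/NumPy of the windowed extraction and averaging just described (the function name, the array shapes, and the binned spike-count representation are illustrative assumptions, not part of any standard library):

    import numpy as np

    def spike_triggered_average(stimulus, spike_counts, window):
        # stimulus:     array of shape (T, ...) -- one stimulus frame per time bin
        # spike_counts: array of shape (T,)     -- spike count in each time bin
        # window:       number of time bins of stimulus history to include
        T = stimulus.shape[0]
        sta = np.zeros((window,) + stimulus.shape[1:])
        n_spikes = 0
        # Start at `window` so every spike has a full stimulus history available.
        for t in range(window, T):
            if spike_counts[t] > 0:
                # Weight the preceding stimulus segment by the spike count in bin t.
                sta += spike_counts[t] * stimulus[t - window:t]
                n_spikes += spike_counts[t]
        return sta / n_spikes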

The STA has been used to characterize retinal ganglion cells, [7] [8] neurons in the lateral geniculate nucleus and simple cells in the striate cortex (V1) . [9] [10] It can be used to estimate the linear stage of the linear-nonlinear-Poisson (LNP) cascade model. [4] The approach has also been used to analyze how transcription factor dynamics control gene regulation within individual cells. [11]

Spike-triggered averaging is also commonly referred to as "reverse correlation" or "white-noise analysis". The STA is well known as the first term in the Volterra kernel or Wiener kernel series expansion. [12] It is closely related to linear regression, and identical to it in common circumstances.

Mathematical definition

Standard STA

Let $\mathbf{x}_i$ denote the spatio-temporal stimulus vector preceding the $i$'th time bin, and $y_i$ the spike count in that bin. The stimuli can be assumed to have zero mean (i.e., $E[\mathbf{x}] = 0$). If not, they can be transformed to have zero mean by subtracting the mean stimulus from each vector. The STA is given by

$$\mathrm{STA} = \frac{1}{n_{sp}} \sum_{i=1}^{T} y_i \mathbf{x}_i,$$

where $n_{sp} = \sum_i y_i$, the total number of spikes.

This equation is more easily expressed in matrix notation: let $X$ denote a matrix whose $i$'th row is the stimulus vector $\mathbf{x}_i^T$ and let $\mathbf{y}$ denote a column vector whose $i$'th element is $y_i$. Then the STA can be written

$$\mathrm{STA} = \frac{1}{n_{sp}} X^T \mathbf{y}.$$
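In code, the matrix form is a one-line computation. A brief sketch (assuming the design matrix X of spike-preceding stimulus vectors and the spike-count vector y have already been assembled, for example by flattening the windows extracted in the earlier sketch):

    import numpy as np

    def sta_matrix_form(X, y):
        # X: (T, d) matrix whose i'th row is the (zero-mean) stimulus vector x_i
        # y: length-T vector of spike counts per time bin
        n_sp = y.sum()            # total number of spikes
        return X.T @ y / n_sp     # STA = (1/n_sp) X^T y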

Whitened STA

If the stimulus is not white noise, but instead has non-zero correlation across space or time, the standard STA provides a biased estimate of the linear receptive field. [5] It may therefore be appropriate to whiten the STA by the inverse of the stimulus covariance matrix. This resolves the spatial dependency issue, although the stimulus is still assumed to be temporally independent. The resulting estimator is known as the whitened STA, which is given by

$$\mathrm{STA}_w = \left(\frac{1}{T} \sum_{i=1}^{T} \mathbf{x}_i \mathbf{x}_i^T\right)^{-1} \left(\frac{1}{n_{sp}} \sum_{i=1}^{T} y_i \mathbf{x}_i\right),$$

where the first term is the inverse covariance matrix of the raw stimuli and the second is the standard STA. In matrix notation, this can be written

$$\mathrm{STA}_w = \left(\frac{1}{T} X^T X\right)^{-1} \left(\frac{1}{n_{sp}} X^T \mathbf{y}\right).$$

The whitened STA is unbiased only if the stimulus distribution can be described by a correlated Gaussian distribution [6] (correlated Gaussian distributions are elliptically symmetric, i.e. can be made spherically symmetric by a linear transformation, but not all elliptically symmetric distributions are Gaussian). This is a weaker condition than spherical symmetry.

The whitened STA is equivalent to linear least-squares regression of the stimulus against the spike train.
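As an illustration of both the formula and its connection to regression, the following sketch computes the whitened STA and notes its relation to least squares (NumPy; the scaling convention follows the equations above, and the variable names are illustrative):

    import numpy as np

    def whitened_sta(X, y):
        T = X.shape[0]
        n_sp = y.sum()
        stim_cov = X.T @ X / T                 # (1/T) X^T X, raw stimulus covariance
        sta = X.T @ y / n_sp                   # standard STA
        return np.linalg.solve(stim_cov, sta)  # whitened STA

    # Up to the scale factor T / n_sp, this is the ordinary least-squares
    # regression of the spike counts on the stimulus:
    #   beta = np.linalg.lstsq(X, y, rcond=None)[0]
    #   np.allclose(whitened_sta(X, y), beta * T / y.sum())  # -> True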

Regularized STA

In practice, it may be necessary to regularize the whitened STA, since whitening amplifies noise along stimulus dimensions that are poorly explored by the stimulus (i.e., axes along which the stimulus has low variance). A common approach to this problem is ridge regression. The regularized STA, computed using ridge regression, can be written

$$\mathrm{STA}_{ridge} = \left(\frac{1}{T} X^T X + \lambda I\right)^{-1} \frac{1}{n_{sp}} X^T \mathbf{y},$$

where $I$ denotes the identity matrix and $\lambda$ is the ridge parameter controlling the amount of regularization. This procedure has a simple Bayesian interpretation: ridge regression is equivalent to placing a prior on the STA elements that says they are drawn i.i.d. from a zero-mean Gaussian prior with covariance proportional to the identity matrix. The ridge parameter sets the inverse variance of this prior, and it is usually fit by cross-validation or empirical Bayes.
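A corresponding sketch of the ridge-regularized estimator, with the ridge parameter treated as a free input (how it is chosen, e.g. by cross-validation, is left to the user; names and conventions are illustrative):

    import numpy as np

    def ridge_sta(X, y, lam):
        # Ridge-regularized STA: ((1/T) X^T X + lam * I)^(-1) (1/n_sp) X^T y
        T, d = X.shape
        n_sp = y.sum()
        A = X.T @ X / T + lam * np.eye(d)
        return np.linalg.solve(A, X.T @ y / n_sp)

    # lam is typically selected by scanning a grid of candidate values and
    # keeping the one that best predicts spike counts on held-out data.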

Statistical properties

For responses generated according to an LNP model, the whitened STA provides an estimate of the subspace spanned by the linear receptive field. The properties of this estimate are as follows:

Consistency

The whitened STA is a consistent estimator, i.e., it converges to the true linear subspace, if

  1. The stimulus distribution is elliptically symmetric, e.g., Gaussian. (Bussgang's theorem)
  2. The expected STA is not zero, i.e., the nonlinearity induces a shift in the spike-triggered stimuli. [5]

Optimality

The whitened STA is an asymptotically efficient estimator if

  1. The stimulus distribution is Gaussian
  2. The neuron's nonlinear response function is the exponential, $f(x) = \exp(x)$. [5]

For arbitrary stimuli, the STA is generally not consistent or efficient. For such cases, maximum likelihood and information-based estimators [5] [6] [13] have been developed that are both consistent and efficient.

See also

- Spike-triggered covariance
- Linear-nonlinear-Poisson cascade model
- Maximally informative dimensions
- Biological neuron model
- Multivariate normal distribution
- Covariance matrix
- Estimation of covariance matrices
- Gaussian process
- Ridge regression
- Linear least squares
- Ordinary least squares
- Generalized least squares
- Nonlinear regression
- Principal component analysis
- Principal component regression
- Kernel principal component analysis
- Complex normal distribution
- Heteroskedasticity-consistent standard errors
- Linear–quadratic–Gaussian control
- Bayesian interpretation of kernel regularization

References

  1. de Boer and Kuyper (1968). Triggered correlation. IEEE Transactions on Biomedical Engineering, 15:169-179
  2. Marmarelis, P. Z. and Naka, K. (1972). White-noise analysis of a neuron chain: an application of the Wiener theory. Science, 175:1276-1278
  3. Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12:199-213
  4. Simoncelli, E. P., Paninski, L., Pillow, J. & Schwartz, O. (2004). "Characterization of neural responses with stochastic stimuli". In M. Gazzaniga (Ed.), The Cognitive Neurosciences, III (pp. 327-338). MIT Press.
  5. Paninski, L. (2003). Convergence properties of some spike-triggered analysis techniques. Network: Computation in Neural Systems, 14:437-464
  6. Sharpee, T.O., Rust, N.C., & Bialek, W. (2004). Analyzing neural responses to natural signals: Maximally informative dimensions. Neural Computation, 16:223-250
  7. Sakai and Naka (1987).
  8. Meister, Pine, and Baylor (1994).
  9. Jones and Palmer (1987).
  10. McLean and Palmer (1989).
  11. Lin, Yihan (2015). "Combinatorial gene regulation by modulation of relative pulse timing". Nature, 527 (7576): 54-58. doi:10.1038/nature15710. PMC 4870307. PMID 26466562.
  12. Lee and Schetzen (1965). Measurement of the Wiener kernels of a non-linear system by cross-correlation. International Journal of Control, First Series, 2:237-254
  13. Kouh M. & Sharpee, T.O. (2009). Estimating linear-nonlinear models using Rényi divergences, Network: Computation in Neural Systems 20(2): 49–68