Spike-triggered average

The spike-triggered average (STA) is a tool for characterizing the response properties of a neuron using the spikes emitted in response to a time-varying stimulus. The STA provides an estimate of a neuron's linear receptive field. It is a useful technique for the analysis of electrophysiological data.

Diagram showing how the STA is calculated. A stimulus (consisting here of a checkerboard with random pixels) is presented, and spikes from the neuron are recorded. The stimuli in some time window preceding each spike (here consisting of 3 time bins) are selected (colored boxes) and then averaged (here just summed, for clarity) to obtain the STA. The STA indicates that this neuron is selective for a bright spot of light just before the spike, located in the top-left corner of the checkerboard.

Mathematically, the STA is the average stimulus preceding a spike. [1] [2] [3] [4] To compute the STA, the stimulus in the time window preceding each spike is extracted, and the resulting (spike-triggered) stimuli are averaged (see diagram). The STA provides an unbiased estimate of a neuron's receptive field only if the stimulus distribution is spherically symmetric (e.g., Gaussian white noise). [3] [5] [6]
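To make the procedure concrete, here is a minimal sketch in Python/NumPy of the windowed extraction and averaging just described (the function name, the array shapes, and the binned spike-count representation are illustrative assumptions, not part of any standard library):

    import numpy as np

    def spike_triggered_average(stimulus, spike_counts, window):
        # stimulus:     array of shape (T, ...) -- one stimulus frame per time bin
        # spike_counts: array of shape (T,)     -- spike count in each time bin
        # window:       number of time bins of stimulus history to include
        T = stimulus.shape[0]
        sta = np.zeros((window,) + stimulus.shape[1:])
        n_spikes = 0
        # Start at `window` so every spike has a full stimulus history available.
        for t in range(window, T):
            if spike_counts[t] > 0:
                # Weight the preceding stimulus segment by the spike count in bin t.
                sta += spike_counts[t] * stimulus[t - window:t]
                n_spikes += spike_counts[t]
        return sta / n_spikes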

The STA has been used to characterize retinal ganglion cells, [7] [8] neurons in the lateral geniculate nucleus and simple cells in the striate cortex (V1) . [9] [10] It can be used to estimate the linear stage of the linear-nonlinear-Poisson (LNP) cascade model. [4] The approach has also been used to analyze how transcription factor dynamics control gene regulation within individual cells. [11]

Spike-triggered averaging is also commonly referred to as "reverse correlation" or "white-noise analysis". The STA is well known as the first term in the Volterra kernel or Wiener kernel series expansion. [12] It is closely related to linear regression, and identical to it in common circumstances.

Mathematical definition

Standard STA

Let $\mathbf{x}_i$ denote the spatio-temporal stimulus vector preceding the $i$'th time bin, and $y_i$ the spike count in that bin. The stimuli can be assumed to have zero mean (i.e., $E[\mathbf{x}] = 0$). If not, they can be transformed to have zero mean by subtracting the mean stimulus from each vector. The STA is given by

$$\mathrm{STA} = \frac{1}{n_{sp}} \sum_{i=1}^{T} y_i \mathbf{x}_i,$$

where $n_{sp} = \sum_i y_i$, the total number of spikes.

This equation is more easily expressed in matrix notation: let $X$ denote a matrix whose $i$'th row is the stimulus vector $\mathbf{x}_i^T$ and let $\mathbf{y}$ denote a column vector whose $i$'th element is $y_i$. Then the STA can be written

$$\mathrm{STA} = \frac{1}{n_{sp}} X^T \mathbf{y}.$$
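In code, the matrix form is a one-line computation. A brief sketch (assuming the design matrix X of spike-preceding stimulus vectors and the spike-count vector y have already been assembled, for example by flattening the windows extracted in the earlier sketch):

    import numpy as np

    def sta_matrix_form(X, y):
        # X: (T, d) matrix whose i'th row is the (zero-mean) stimulus vector x_i
        # y: length-T vector of spike counts per time bin
        n_sp = y.sum()            # total number of spikes
        return X.T @ y / n_sp     # STA = (1/n_sp) X^T y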

Whitened STA

If the stimulus is not white noise, but instead has non-zero correlation across space or time, the standard STA provides a biased estimate of the linear receptive field. [5] It may therefore be appropriate to whiten the STA by the inverse of the stimulus covariance matrix. This resolves the spatial dependency issue, although the stimulus is still assumed to be temporally independent. The resulting estimator is known as the whitened STA, which is given by

$$\mathrm{STA}_w = \left(\frac{1}{T} \sum_{i=1}^{T} \mathbf{x}_i \mathbf{x}_i^T\right)^{-1} \left(\frac{1}{n_{sp}} \sum_{i=1}^{T} y_i \mathbf{x}_i\right),$$

where the first term is the inverse covariance matrix of the raw stimuli and the second is the standard STA. In matrix notation, this can be written

$$\mathrm{STA}_w = \left(\frac{1}{T} X^T X\right)^{-1} \left(\frac{1}{n_{sp}} X^T \mathbf{y}\right).$$

The whitened STA is unbiased only if the stimulus distribution can be described by a correlated Gaussian distribution [6] (correlated Gaussian distributions are elliptically symmetric, i.e. can be made spherically symmetric by a linear transformation, but not all elliptically symmetric distributions are Gaussian). This is a weaker condition than spherical symmetry.

The whitened STA is equivalent to linear least-squares regression of the stimulus against the spike train.
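As an illustration of both the formula and its connection to regression, the following sketch computes the whitened STA and notes its relation to least squares (NumPy; the scaling convention follows the equations above, and the variable names are illustrative):

    import numpy as np

    def whitened_sta(X, y):
        T = X.shape[0]
        n_sp = y.sum()
        stim_cov = X.T @ X / T                 # (1/T) X^T X, raw stimulus covariance
        sta = X.T @ y / n_sp                   # standard STA
        return np.linalg.solve(stim_cov, sta)  # whitened STA

    # Up to the scale factor T / n_sp, this is the ordinary least-squares
    # regression of the spike counts on the stimulus:
    #   beta = np.linalg.lstsq(X, y, rcond=None)[0]
    #   np.allclose(whitened_sta(X, y), beta * T / y.sum())  # -> True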

Regularized STA

In practice, it may be necessary to regularize the whitened STA, since whitening amplifies noise along stimulus dimensions that are poorly explored by the stimulus (i.e., axes along which the stimulus has low variance). A common approach to this problem is ridge regression. The regularized STA, computed using ridge regression, can be written

$$\mathrm{STA}_{ridge} = \left(\frac{1}{T} X^T X + \lambda I\right)^{-1} \frac{1}{n_{sp}} X^T \mathbf{y},$$

where $I$ denotes the identity matrix and $\lambda$ is the ridge parameter controlling the amount of regularization. This procedure has a simple Bayesian interpretation: ridge regression is equivalent to placing a prior on the STA elements that says they are drawn i.i.d. from a zero-mean Gaussian prior with covariance proportional to the identity matrix. The ridge parameter sets the inverse variance of this prior, and it is usually fit by cross-validation or empirical Bayes.
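A corresponding sketch of the ridge-regularized estimator, with the ridge parameter treated as a free input (how it is chosen, e.g. by cross-validation, is left to the user; names and conventions are illustrative):

    import numpy as np

    def ridge_sta(X, y, lam):
        # Ridge-regularized STA: ((1/T) X^T X + lam * I)^(-1) (1/n_sp) X^T y
        T, d = X.shape
        n_sp = y.sum()
        A = X.T @ X / T + lam * np.eye(d)
        return np.linalg.solve(A, X.T @ y / n_sp)

    # lam is typically selected by scanning a grid of candidate values and
    # keeping the one that best predicts spike counts on held-out data.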

Statistical properties

For responses generated according to an LNP model, the whitened STA provides an estimate of the subspace spanned by the linear receptive field. The properties of this estimate are as follows:

Consistency

The whitened STA is a consistent estimator, i.e., it converges to the true linear subspace, if

  1. The stimulus distribution is elliptically symmetric, e.g., Gaussian. (Bussgang's theorem)
  2. The expected STA is not zero, i.e., the nonlinearity induces a shift in the spike-triggered stimuli. [5]

Optimality

The whitened STA is an asymptotically efficient estimator if

  1. The stimulus distribution is Gaussian
  2. The neuron's nonlinear response function is the exponential, $f(x) = \exp(x)$. [5]

For arbitrary stimuli, the STA is generally not consistent or efficient. For such cases, maximum likelihood and information-based estimators [5] [6] [13] have been developed that are both consistent and efficient.

See also

- Spike-triggered covariance
- Linear-nonlinear-Poisson cascade model
- Maximally informative dimensions
- Biological neuron model
- Multivariate normal distribution
- Covariance matrix
- Estimation of covariance matrices
- Gaussian process
- Ridge regression
- Linear least squares
- Ordinary least squares
- Generalized least squares
- Nonlinear regression
- Principal component analysis
- Principal component regression
- Kernel principal component analysis
- Complex normal distribution
- Heteroskedasticity-consistent standard errors
- Linear–quadratic–Gaussian control
- Bayesian interpretation of kernel regularization

References

  1. de Boer and Kuyper (1968). Triggered correlation. IEEE Transactions on Biomedical Engineering, 15:169-179
  2. Marmarelis, P. Z. and Naka, K. (1972). White-noise analysis of a neuron chain: an application of the Wiener theory. Science, 175:1276-1278
  3. Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12:199-213
  4. Simoncelli, E. P., Paninski, L., Pillow, J. & Schwartz, O. (2004). "Characterization of neural responses with stochastic stimuli". In M. Gazzaniga (Ed.), The Cognitive Neurosciences, III (pp. 327-338). MIT Press.
  5. Paninski, L. (2003). Convergence properties of some spike-triggered analysis techniques. Network: Computation in Neural Systems, 14:437-464
  6. Sharpee, T.O., Rust, N.C., & Bialek, W. (2004). Analyzing neural responses to natural signals: Maximally informative dimensions. Neural Computation, 16:223-250
  7. Sakai and Naka (1987).
  8. Meister, Pine, and Baylor (1994).
  9. Jones and Palmer (1987).
  10. McLean and Palmer (1989).
  11. Lin, Yihan (2015). "Combinatorial gene regulation by modulation of relative pulse timing". Nature, 527 (7576): 54-58. doi:10.1038/nature15710. PMC 4870307. PMID 26466562.
  12. Lee and Schetzen (1965). Measurement of the Wiener kernels of a non-linear system by cross-correlation. International Journal of Control, First Series, 2:237-254
  13. Kouh M. & Sharpee, T.O. (2009). Estimating linear-nonlinear models using Rényi divergences, Network: Computation in Neural Systems 20(2): 49–68