Mercer's theorem


In mathematics, specifically functional analysis, Mercer's theorem is a representation of a symmetric positive-definite function on a square as a sum of a convergent sequence of product functions. This theorem, presented in (Mercer 1909), is one of the most notable results of the work of James Mercer (1883–1932). It is an important theoretical tool in the theory of integral equations; it is used in the Hilbert space theory of stochastic processes, for example the Karhunen–Loève theorem; and it is also used in reproducing kernel Hilbert space theory, where it characterizes a symmetric positive-definite kernel as a reproducing kernel. [1]


Introduction

To explain Mercer's theorem, we first consider an important special case; see below for a more general formulation. A kernel, in this context, is a symmetric continuous function

  K : [a, b] × [a, b] → ℝ,

where symmetric means that K(x, y) = K(y, x) for all x, y ∈ [a, b].

K is said to be a positive-definite kernel if and only if

  Σi Σj ci cj K(xi, xj) ≥ 0

for all finite sequences of points x1, ..., xn of [a, b] and all choices of real numbers c1, ..., cn. Note that the term "positive-definite" is well-established in the literature despite the weak inequality in the definition. [2] [3]
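The defining inequality says that every Gram matrix built from K is positive semidefinite, which can be checked numerically. A minimal sketch in NumPy, using the Gaussian (RBF) kernel as an example kernel of my choosing (it is not discussed in this article):

```python
import numpy as np

# The Gaussian (RBF) kernel, a standard example of a positive-definite kernel.
def K(x, y, sigma=0.5):
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

# Sample points x1, ..., xn in [a, b] = [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=20)

# Gram matrix G[i, j] = K(x_i, x_j).
G = K(x[:, None], x[None, :])

# The inequality sum_{i,j} c_i c_j K(x_i, x_j) >= 0 for all c is equivalent
# to G being positive semidefinite, i.e. all eigenvalues are >= 0.
eigvals = np.linalg.eigvalsh(G)
print(eigvals.min() >= -1e-10)  # True (up to floating-point rounding)
```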

Associated to K is a linear operator (more specifically a Hilbert–Schmidt integral operator) on functions defined by the integral

  [TK φ](x) = ∫ab K(x, s) φ(s) ds.

For technical considerations we assume φ can range through the space L2[a, b] (see Lp space) of square-integrable real-valued functions. Since TK is a linear operator, we can talk about eigenvalues and eigenfunctions of TK.

Theorem. Suppose K is a continuous symmetric positive-definite kernel. Then there is an orthonormal basis {ei}i of L2[a, b] consisting of eigenfunctions of TK such that the corresponding sequence of eigenvalues {λi}i is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on [a, b] and K has the representation

  K(s, t) = Σi λi ei(s) ei(t),

where the convergence is absolute and uniform.
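The expansion in the theorem can be approximated numerically by discretizing TK on a grid and eigendecomposing the resulting matrix (the Nyström method). A sketch in NumPy, assuming the Brownian-motion covariance kernel K(s, t) = min(s, t) on [0, 1] as an example kernel (my choice, not from this article):

```python
import numpy as np

# Discretize T_K for K(s, t) = min(s, t) on a uniform midpoint grid.
n = 200
h = 1.0 / n
t = (np.arange(n) + 0.5) * h          # midpoint grid on [0, 1]
Kmat = np.minimum.outer(t, t)         # K(t_i, t_j)

# Discretized operator: (T_K f)(t_i) ~ sum_j K(t_i, t_j) f(t_j) h.
w, v = np.linalg.eigh(Kmat * h)       # eigenvalues w (ascending), vectors v
w, v = w[::-1], v[:, ::-1]            # sort eigenvalues descending
e = v / np.sqrt(h)                    # normalize so sum_i e_k(t_i)^2 h = 1

# Truncated Mercer sum: sum_{k<m} lambda_k e_k(s) e_k(t) approximates K.
m = 50
Kapprox = (e[:, :m] * w[:m]) @ e[:, :m].T
print(np.max(np.abs(Kapprox - Kmat)))  # small: the truncation tail
```

For this kernel the exact eigenvalues decay like 1/((k − 1/2)²π²), so even 50 of the 200 discrete terms reconstruct K to within a few thousandths uniformly.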

Details

We now explain in greater detail the structure of the proof of Mercer's theorem, particularly how it relates to spectral theory of compact operators.

To show compactness, one shows that the image of the unit ball of L2[a, b] under TK is equicontinuous and applies Ascoli's theorem: the image of the unit ball is then relatively compact in C([a, b]) with the uniform norm, and a fortiori in L2[a, b].

Now apply the spectral theorem for compact operators on Hilbert spaces to TK to show the existence of the orthonormal basis {ei}i of L2[a, b].

If λi ≠ 0, the eigenvector (eigenfunction) ei = λi−1 TK ei is seen to be continuous on [a, b], since TK maps L2[a, b] into C([a, b]). Now

  Σi λi |ei(s) ei(t)| ≤ supx∈[a,b] |K(x, x)|,

which shows that the sequence of partial sums of

  Σi λi ei(s) ei(t)

converges absolutely and uniformly to a kernel K0 which is easily seen to define the same operator as the kernel K. Hence K = K0, from which Mercer's theorem follows.

Finally, to show non-negativity of the eigenvalues, one can write λ⟨f, f⟩ = ⟨TK f, f⟩ and express the right-hand side as an integral,

  ⟨TK f, f⟩ = ∫ab ∫ab K(s, t) f(s) f(t) ds dt,

which is well-approximated by its Riemann sums; these are non-negative by positive-definiteness of K, implying ⟨TK f, f⟩ ≥ 0 and hence λ ≥ 0.

Trace

The following is immediate:

Theorem. Suppose K is a continuous symmetric positive-definite kernel; TK has a sequence of nonnegative eigenvalues {λi}i. Then

  ∫ab K(t, t) dt = Σi λi.

This shows that the operator TK is a trace class operator and

  trace(TK) = Σi λi = ∫ab K(t, t) dt.
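The trace identity is easy to test numerically, since the trace of the discretized operator is both the sum of its eigenvalues and the Riemann sum of K(t, t). A sketch, again assuming the example kernel K(s, t) = min(s, t) on [0, 1], for which ∫01 K(t, t) dt = ∫01 t dt = 1/2:

```python
import numpy as np

# Discretize T_K for K(s, t) = min(s, t) on a uniform midpoint grid.
n = 400
h = 1.0 / n
t = (np.arange(n) + 0.5) * h               # midpoint grid on [0, 1]
Kmat = np.minimum.outer(t, t)              # K(t_i, t_j) = min(t_i, t_j)

# Sum of eigenvalues of the discretized operator vs. the exact integral
# of K(t, t) = t over [0, 1], which is 1/2.
trace = np.linalg.eigvalsh(Kmat * h).sum()
print(abs(trace - 0.5) < 1e-6)             # True
```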

Generalizations

Mercer's theorem itself is a generalization of the result that any symmetric positive-semidefinite matrix is the Gramian matrix of a set of vectors.

The first generalization [ citation needed ] replaces the interval [a, b] with any compact Hausdorff space X, and Lebesgue measure on [a, b] is replaced by a finite countably additive measure μ on the Borel algebra of X whose support is X. This means that μ(U) > 0 for any nonempty open subset U of X.

A recent generalization [ citation needed ] replaces these conditions by the following: the set X is a first-countable topological space endowed with a Borel (complete) measure μ. X is the support of μ and, for all x in X, there is an open set U containing x and having finite measure. Then essentially the same result holds:

Theorem. Suppose K is a continuous symmetric positive-definite kernel on X. If the function κ, defined by κ(x) = K(x, x) for all x in X, belongs to L1μ(X), then there is an orthonormal set {ei}i of L2μ(X) consisting of eigenfunctions of TK such that the corresponding sequence of eigenvalues {λi}i is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on X and K has the representation

  K(s, t) = Σi λi ei(s) ei(t),

where the convergence is absolute and uniform on compact subsets of X.

The next generalization [ citation needed ] deals with representations of measurable kernels.

Let (X, M, μ) be a σ-finite measure space. An L2 (or square-integrable) kernel on X is a function

  K ∈ L2μ⊗μ(X × X), that is, ∫X ∫X |K(x, y)|2 dμ(x) dμ(y) < ∞.

L2 kernels define a bounded operator TK by the formula

  [TK φ](x) = ∫X K(x, y) φ(y) dμ(y).

TK is a compact operator (actually it is even a Hilbert–Schmidt operator). If the kernel K is symmetric, by the spectral theorem, TK has an orthonormal basis of eigenvectors. Those eigenvectors that correspond to non-zero eigenvalues can be arranged in a sequence {ei}i (regardless of separability).

Theorem. If K is a symmetric positive-definite kernel on (X, M, μ), then

  K(s, t) = Σi λi ei(s) ei(t),

where the convergence is in the L2 norm. Note that when continuity of the kernel is not assumed, the expansion need no longer converge uniformly.

Mercer's condition

In mathematics, a real-valued function K(x, y) is said to fulfill Mercer's condition if for all square-integrable functions g(x) one has

  ∫∫ g(x) K(x, y) g(y) dx dy ≥ 0.

Discrete analog

This is analogous to the definition of a positive-semidefinite matrix. This is a matrix K of dimension n × n which satisfies, for all vectors g, the property

  gT K g = Σi,j gi Kij gj ≥ 0.
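The matrix property can be demonstrated directly by evaluating the quadratic form for many random vectors. A small sketch, using a matrix M = ATA that is positive semidefinite by construction (a standard device, not taken from this article):

```python
import numpy as np

# M = A^T A is positive semidefinite by construction, since
# g^T (A^T A) g = ||A g||^2 >= 0 for every vector g.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
M = A.T @ A

# Evaluate the quadratic form g^T M g for many random vectors g.
for _ in range(1000):
    g = rng.normal(size=5)
    assert g @ M @ g >= -1e-12    # non-negative up to rounding
print("all quadratic forms non-negative")
```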

Examples

A positive constant function

  K(x, y) = c > 0

satisfies Mercer's condition, since by Fubini's theorem the integral becomes

  ∫∫ g(x) c g(y) dx dy = c ∫ g(x) dx ∫ g(y) dy = c (∫ g(x) dx)2,

which is indeed non-negative.
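The factorization above can be verified numerically. A sketch in NumPy with an arbitrary test function g of my choosing, on the interval [0, 1]:

```python
import numpy as np

# Constant kernel K(x, y) = c > 0; check that the double integral of
# g(x) * c * g(y) collapses to c * (integral of g)^2 and is non-negative.
c = 2.5
def g(x):
    return np.sin(3 * x) - 0.4       # an arbitrary square-integrable g

n = 2000
x = (np.arange(n) + 0.5) / n         # midpoint grid on [0, 1]
dx = 1.0 / n

double_integral = (c * np.outer(g(x), g(x))).sum() * dx * dx
single_integral = g(x).sum() * dx
print(np.isclose(double_integral, c * single_integral ** 2))  # True
print(double_integral >= 0)                                   # True
```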


Notes

  1. Bartlett, Peter (2008). "Reproducing Kernel Hilbert Spaces" (PDF). Lecture notes of CS281B/Stat241B Statistical Learning Theory. University of California at Berkeley.
  2. Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2018). Foundations of Machine Learning (Second ed.). Cambridge, Massachusetts: MIT Press. ISBN 978-0-262-03940-6.
  3. Berlinet, A.; Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. New York: Springer Science+Business Media. ISBN 1-4419-9096-8.

