Lebesgue constant

In mathematics, the Lebesgue constants (depending on a set of nodes and on its size) give an idea of how good the interpolant of a function (at the given nodes) is in comparison with the best polynomial approximation of the function of a fixed degree. The Lebesgue constant for polynomials of degree at most n and for the set of n + 1 nodes T is generally denoted by Λn(T). These constants are named after Henri Lebesgue.

Definition

We fix the interpolation nodes t0, …, tn and an interval [a, b] containing all the interpolation nodes. The process of interpolation maps the function f to the polynomial X(f) of degree n or less that interpolates f at the given nodes. This defines a mapping X from the space C([a, b]) of all continuous functions on [a, b] to itself. The map X is linear and it is a projection on the subspace Πn of polynomials of degree n or less.

The Lebesgue constant is defined as the operator norm of X. This definition requires us to specify a norm on C([a, b]). The uniform norm is usually the most convenient.

Properties

The Lebesgue constant bounds the interpolation error: let p denote the best approximation of f among the polynomials of degree n or less, i.e. the polynomial that minimizes ||f − p|| over Πn. Then

||f − X(f)|| ≤ (Λn(T) + 1) ||f − p||.

We prove this statement here for the maximum norm. By the triangle inequality,

||f − X(f)|| ≤ ||f − p|| + ||p − X(f)||.

But X is a projection on Πn, so X(p) = p and

p − X(f) = X(p) − X(f) = X(p − f).

This finishes the proof, since ||X(p − f)|| ≤ Λn(T) ||p − f||. Note that this relation also arises as a special case of Lebesgue's lemma.

In other words, the interpolation polynomial is at most a factor Λn(T) + 1 worse than the best possible approximation. This suggests that we look for a set of interpolation nodes with a small Lebesgue constant.

The Lebesgue constant can be expressed in terms of the Lagrange basis polynomials:

lj(x) = ∏k≠j (x − tk) / (tj − tk),   j = 0, …, n.

In fact, we have the Lebesgue function

λn(T; x) = |l0(x)| + |l1(x)| + ⋯ + |ln(x)|,

and the Lebesgue constant (or Lebesgue number) for the grid is its maximum value

Λn(T) = max { λn(T; x) : a ≤ x ≤ b }.

Nevertheless, it is not easy to find an explicit expression for Λn(T).
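In practice, Λn(T) is therefore estimated numerically by evaluating the Lebesgue function on a fine grid and taking the maximum. Below is a minimal Python sketch of this approach; the helper name lebesgue_constant and the grid resolution are illustrative choices, not a standard API.

```python
import numpy as np

def lebesgue_constant(nodes, a=-1.0, b=1.0, num=10001):
    """Approximate Lambda_n(T) = max_x sum_j |l_j(x)| by evaluating the
    Lebesgue function on a fine uniform grid over [a, b]."""
    nodes = np.asarray(nodes, dtype=float)
    x = np.linspace(a, b, num)
    lam = np.zeros_like(x)
    for j, tj in enumerate(nodes):
        others = np.delete(nodes, j)
        # Lagrange basis polynomial l_j evaluated at all grid points
        lj = np.prod((x[:, None] - others) / (tj - others), axis=1)
        lam += np.abs(lj)
    return lam.max()
```

For example, lebesgue_constant(np.linspace(-1, 1, 6)) returns roughly 3.1 for six equidistant nodes; since a grid maximum can only underestimate the true supremum, a fine grid is needed for accurate values.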

Minimal Lebesgue constants

In the case of equidistant nodes, the Lebesgue constant grows exponentially. More precisely, we have the following asymptotic estimate:

Λn(T) ~ 2^(n+1) / (e · n · log n)   as n → ∞.

On the other hand, the Lebesgue constant grows only logarithmically if Chebyshev nodes are used, since we have

Λn(T) < (2/π) log(n + 1) + 1.

We conclude again that Chebyshev nodes are a very good choice for polynomial interpolation. However, there is an easy (linear) transformation of Chebyshev nodes that gives an even better Lebesgue constant. Let ti = cos((2i + 1)π / (2(n + 1))) denote the i-th Chebyshev node. Then, define the stretched nodes

ŝi = ti / cos(π / (2(n + 1))),

so that the outermost nodes land exactly on ±1. For such nodes the Lebesgue constant again grows like (2/π) log(n + 1), but with a smaller additive constant than for the unstretched Chebyshev nodes.
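These growth rates are easy to observe numerically. The following self-contained sketch (the helper leb, the grid size, and the sampled values of n are our own illustrative choices) compares the three node families on [−1, 1]:

```python
import numpy as np

def leb(nodes, grid=np.linspace(-1.0, 1.0, 20001)):
    # Maximum of the Lebesgue function sum_j |l_j(x)| over a fine grid
    lam = sum(np.abs(np.prod([(grid - tk) / (tj - tk)
                              for tk in nodes if tk != tj], axis=0))
              for tj in nodes)
    return lam.max()

for n in (5, 10, 20):
    i = np.arange(n + 1)
    equi = -1 + 2 * i / n                                # equidistant nodes
    cheb = np.cos((2 * i + 1) * np.pi / (2 * (n + 1)))   # Chebyshev nodes
    stretched = cheb / np.cos(np.pi / (2 * (n + 1)))     # stretched Chebyshev
    print(f"n={n:2d}  equidistant={leb(equi):10.1f}  "
          f"chebyshev={leb(cheb):.3f}  stretched={leb(stretched):.3f}")
```

Already at n = 20 the equidistant Lebesgue constant is in the thousands, while both Chebyshev variants stay below 3, with the stretched nodes consistently slightly smaller.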

Those nodes are, however, not optimal (i.e. they do not minimize the Lebesgue constant), and the search for an optimal set of nodes (which has already been proved to be unique under some assumptions) is still an intriguing topic in mathematics today. However, this set of nodes is optimal for interpolation over the set of n times differentiable functions whose n-th derivatives are bounded in absolute value by a constant M, as shown by N. S. Hoang. Using a computer, one can approximate the values of the minimal Lebesgue constants, here for the canonical interval [−1, 1]:

n        1        2        3        4        5        6        7        8        9
Λn(T)    1.0000   1.2500   1.4229   1.5595   1.6722   1.7681   1.8516   1.9255   1.9917

There are uncountably many sets of nodes in [−1, 1] that minimize, for fixed n > 1, the Lebesgue constant. However, if we assume that we always take −1 and 1 as nodes for interpolation (which is called a canonical node configuration), then such a set is unique and zero-symmetric. To illustrate this property, we shall see what happens when n = 2 (i.e. we consider 3 interpolation nodes, in which case the property is not trivial). One can check that each set of (zero-symmetric) nodes of type (−a, 0, a) is optimal when √8/3 (= 2√2/3 ≈ 0.9428) ≤ a ≤ 1 (we consider only nodes in [−1, 1]). If we force the set of nodes to be of the type (−1, b, 1), then b must equal 0 (look at the Lebesgue function, whose maximum is the Lebesgue constant). All optimal sets of nodes in [−1, 1] when n = 2, zero-symmetric or not, have been determined by F. Schurer, and in an alternative fashion by H.-J. Rack and R. Vajda (2014).
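The n = 2 claim is straightforward to verify numerically. In the sketch below (the helper leb3 and the sampled values of a are our own choices), the Lebesgue constant of the nodes (−a, 0, a) stays at 1.25 as long as a ≥ √8/3 ≈ 0.9428 and grows once a drops below that threshold:

```python
import numpy as np

def leb3(a, x=np.linspace(-1.0, 1.0, 200001)):
    # Lebesgue function for the three nodes (-a, 0, a), maximized over [-1, 1]
    l0 = x * (x - a) / (2 * a * a)          # Lagrange basis polynomial at node -a
    l1 = 1 - x * x / (a * a)                # Lagrange basis polynomial at node 0
    l2 = x * (x + a) / (2 * a * a)          # Lagrange basis polynomial at node a
    return np.max(np.abs(l0) + np.abs(l1) + np.abs(l2))

for a in (0.80, 0.90, np.sqrt(8) / 3, 0.97, 1.00):
    print(f"a = {a:.4f}  ->  Lebesgue constant ~ {leb3(a):.4f}")
```

A similar scan over b for nodes of the type (−1, b, 1) shows that the maximum of the Lebesgue function is smallest at b = 0.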

If we assume that we take −1 and 1 as nodes for interpolation, then, as shown by H.-J. Rack (1984 and 2013), for the case n = 3 the explicit values of the optimal (unique and zero-symmetric) 4 interpolation nodes and the explicit value of the minimal Lebesgue constant are known. All arbitrary optimal sets of 4 interpolation nodes in [−1, 1] when n = 3 have been explicitly determined, in two different but equivalent fashions, by H.-J. Rack and R. Vajda (2015).

The Padua points provide another set of nodes with slow growth (although not as slow as the Chebyshev nodes) and with the additional property of being a unisolvent point set.

Sensitivity of the values of a polynomial

The Lebesgue constants also arise in another problem. Let p(x) be a polynomial of degree n expressed in the Lagrangian form associated with the points in the vector t (i.e. the vector u of its coefficients is the vector containing the values p(ti)). Let p̂(x) be a polynomial obtained by slightly changing the coefficients u of the original polynomial p(x) to û. Then

||p − p̂|| / ||p|| ≤ Λn(T) · ||u − û|| / ||u||,

where the norm on the coefficient vectors is the maximum norm. This means that the (relative) error in the values of p̂ will not be larger than the appropriate Lebesgue constant times the relative error in the coefficients. In this sense, the Lebesgue constant can be viewed as the relative condition number of the operator mapping each coefficient vector u to the set of the values of the polynomial with coefficients u in the Lagrange form. We can actually define such an operator for each polynomial basis, but its condition number is greater than the optimal Lebesgue constant for most convenient bases.
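As a hedged illustration of this conditioning statement (the node family, perturbation size, and all variable names below are our own choices), one can perturb the coefficient vector u and compare the resulting relative error with the bound:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
i = np.arange(n + 1)
nodes = np.cos((2 * i + 1) * np.pi / (2 * (n + 1)))   # Chebyshev nodes
x = np.linspace(-1.0, 1.0, 5001)

# Lagrange basis evaluated on the grid: basis[j] approximates l_j(x)
basis = np.array([np.prod([(x - tk) / (tj - tk) for tk in nodes if tk != tj],
                          axis=0)
                  for tj in nodes])
lam_n = np.abs(basis).sum(axis=0).max()               # Lebesgue constant

u = rng.standard_normal(n + 1)                        # coefficients = values at nodes
u_hat = u + 1e-6 * rng.standard_normal(n + 1)         # slightly perturbed coefficients

p, p_hat = u @ basis, u_hat @ basis                   # both polynomials on the grid
rel_err = np.abs(p - p_hat).max() / np.abs(p).max()
bound = lam_n * np.abs(u - u_hat).max() / np.abs(u).max()
print(f"relative error {rel_err:.3e}  <=  bound {bound:.3e}")
```

The printed relative error stays below the bound because ||p − p̂|| ≤ Λn(T) maxi |ui − ûi| while ||p|| ≥ maxi |ui|.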
