Inverse function theorem


In mathematics, specifically differential calculus, the inverse function theorem gives a sufficient condition for a function to be invertible in a neighborhood of a point in its domain: namely, that its derivative is continuous and non-zero at the point. The theorem also gives a formula for the derivative of the inverse function. In multivariable calculus, this theorem can be generalized to any continuously differentiable, vector-valued function whose Jacobian determinant is nonzero at a point in its domain, giving a formula for the Jacobian matrix of the inverse. There are also versions of the inverse function theorem for complex holomorphic functions, for differentiable maps between manifolds, for differentiable functions between Banach spaces, and so forth.


The theorem was first established by Picard and Goursat using an iterative scheme: the basic idea is to prove a fixed point theorem using the contraction mapping theorem.

Statements

For functions of a single variable, the theorem states that if $f$ is a continuously differentiable function with nonzero derivative at the point $a$, then $f$ is injective (or bijective onto the image) in a neighborhood of $a$, the inverse is continuously differentiable near $b = f(a)$, and the derivative of the inverse function at $b$ is the reciprocal of the derivative of $f$ at $a$:

$$(f^{-1})'(b) = \frac{1}{f'(a)} = \frac{1}{f'(f^{-1}(b))}.$$

It can happen that a function $f$ may be injective near a point $a$ while $f'(a) = 0$. An example is $f(x) = (x - a)^3$. In fact, for such a function, the inverse cannot be differentiable at $b = f(a)$, since if $f^{-1}$ were differentiable at $b$, then, by the chain rule, $1 = (f^{-1} \circ f)'(a) = (f^{-1})'(b) f'(a)$, which implies $f'(a) \neq 0$. (The situation is different for holomorphic functions; see Holomorphic inverse function theorem below.)

For functions of more than one variable, the theorem states that if $f$ is a continuously differentiable function from an open subset $A$ of $\mathbb{R}^n$ into $\mathbb{R}^n$, and the derivative $f'(a)$ is invertible at a point $a$ (that is, the determinant of the Jacobian matrix of $f$ at $a$ is non-zero), then there exist neighborhoods $U$ of $a$ in $A$ and $V$ of $b = f(a)$ such that $f(U) \subset V$ and $f : U \to V$ is bijective. [1] Writing $f = (f_1, \ldots, f_n)$, this means that the system of $n$ equations $y_i = f_i(x_1, \ldots, x_n)$ has a unique solution for $x_1, \ldots, x_n$ in terms of $y_1, \ldots, y_n$ when $x \in U$, $y \in V$. Note that the theorem does not say $f$ is bijective onto the image where $f'$ is invertible but that it is locally bijective where $f'$ is invertible.

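Moreover, the theorem says that the inverse function $f^{-1} : V \to U$ is continuously differentiable, and its derivative at $b = f(a)$ is the inverse map of $f'(a)$; i.e.,

$$(f^{-1})'(b) = f'(a)^{-1}.$$

In other words, if $Jf^{-1}(b)$, $Jf(a)$ are the Jacobian matrices representing $(f^{-1})'(b)$, $f'(a)$, this means:

$$Jf^{-1}(b) = Jf(a)^{-1}.$$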

The hard part of the theorem is the existence and differentiability of $f^{-1}$. Assuming this, the inverse derivative formula follows from the chain rule applied to $f^{-1} \circ f = I$. (Indeed, $I = I'(a) = (f^{-1} \circ f)'(a) = (f^{-1})'(b) \circ f'(a)$.) Since taking the inverse is infinitely differentiable, the formula for the derivative of the inverse shows that if $f$ is continuously $k$ times differentiable, with invertible derivative at the point $a$, then the inverse is also continuously $k$ times differentiable. Here $k$ is a positive integer or $\infty$.
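The single-variable derivative formula can be checked numerically. The sketch below is only an illustration under stated assumptions: the function $f(x) = x^3 + x$, the point $a = 1$, and the helper names (`f_inverse`, `numeric`, `predicted`) are all hypothetical choices, and the inverse is computed by bisection rather than by any method from the text.

```python
def f(x):
    # Hypothetical example: strictly increasing, so globally invertible.
    return x**3 + x


def f_prime(x):
    return 3 * x**2 + 1


def f_inverse(y, lo=-10.0, hi=10.0, iters=200):
    # Bisection: f is increasing, so f(mid) < y means the root lies to the right.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


a = 1.0
b = f(a)

# Central finite difference of the inverse at b.
h = 1e-5
numeric = (f_inverse(b + h) - f_inverse(b - h)) / (2 * h)

# Value predicted by the theorem: reciprocal of f'(a).
predicted = 1 / f_prime(a)

print(numeric, predicted)
```

The two printed numbers should agree to several digits; the finite difference is only an approximation, so exact equality is not expected.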

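There are two variants of the inverse function theorem. [1] Given a continuously differentiable map $f : U \to \mathbb{R}^m$, the first is

The derivative $f'(a)$ is surjective (i.e., the Jacobian matrix representing it has rank $m$) if and only if there exists a continuously differentiable function $g$ on a neighborhood $V$ of $b = f(a)$ such that $f \circ g = I$ near $b$,

and the second is

The derivative $f'(a)$ is injective if and only if there exists a continuously differentiable function $g$ on a neighborhood $V$ of $b = f(a)$ such that $g \circ f = I$ near $a$.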

In the first case (when is surjective), the point is called a regular value. Since , the first case is equivalent to saying is not in the image of critical points (a critical point is a point such that the kernel of is nonzero). The statement in the first case is a special case of the submersion theorem.

These variants are restatements of the inverse function theorem. Indeed, in the first case when $f'(a)$ is surjective, we can find an (injective) linear map $T$ such that $f'(a) \circ T = I$. Define $h(x) = a + Tx$ so that we have:

$$(f \circ h)'(0) = f'(a) \circ T = I.$$

Thus, by the inverse function theorem, $f \circ h$ has an inverse near $0$; i.e., $f \circ h \circ (f \circ h)^{-1} = I$ near $b$. The second case ($f'(a)$ is injective) is seen in a similar way.

Example

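Consider the vector-valued function $F : \mathbb{R}^2 \to \mathbb{R}^2$ defined by:

$$F(x, y) = \begin{bmatrix} e^x \cos y \\ e^x \sin y \end{bmatrix}.$$

The Jacobian matrix is:

$$J_F(x, y) = \begin{bmatrix} e^x \cos y & -e^x \sin y \\ e^x \sin y & e^x \cos y \end{bmatrix}$$

with Jacobian determinant:

$$\det J_F(x, y) = e^{2x} \cos^2 y + e^{2x} \sin^2 y = e^{2x}.$$

The determinant $e^{2x}$ is nonzero everywhere. Thus the theorem guarantees that, for every point $p$ in $\mathbb{R}^2$, there exists a neighborhood about $p$ over which $F$ is invertible. This does not mean $F$ is invertible over its entire domain: in this case $F$ is not even injective since it is periodic: $F(x, y) = F(x, y + 2\pi)$.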
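A short numerical check of this example, assuming the map is $F(x, y) = (e^x \cos y, e^x \sin y)$ (the standard function fitting the description above). The helper `local_inverse` is a hypothetical name; it picks one branch via `atan2`, so it inverts $F$ only on a strip where $-\pi < y \le \pi$.

```python
import math


def F(x, y):
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))


def jacobian_det(x, y):
    # Entries of the 2x2 Jacobian of F, then ad - bc.
    ex = math.exp(x)
    a, b = ex * math.cos(y), -ex * math.sin(y)
    c, d = ex * math.sin(y), ex * math.cos(y)
    return a * d - b * c


def local_inverse(u, v):
    # One branch of the inverse: x = log|(u, v)|, y = angle in (-pi, pi].
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))


p = (0.3, 1.1)
det_at_p = jacobian_det(*p)           # should equal e^{2x} at x = 0.3
image = F(*p)
shifted_image = F(p[0], p[1] + 2 * math.pi)  # same image: F is not injective
round_trip = local_inverse(*image)    # recovers p on this branch

print(det_at_p, image, shifted_image, round_trip)
```

The determinant is positive at every point, yet the two distinct inputs `p` and `p` shifted by $2\pi$ in $y$ share an image, which is the local-versus-global distinction the example is making.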

Counter-example

Figure (Inv-Fun-Thm-3.png): The function $f(x) = x + 2x^2 \sin(\tfrac{1}{x})$ is bounded inside a quadratic envelope near the line $y = x$, so $f'(0) = 1$. Nevertheless, it has local max/min points accumulating at $x = 0$, so it is not one-to-one on any surrounding interval.

If one drops the assumption that the derivative is continuous, the function no longer need be invertible. For example, $f(x) = x + 2x^2 \sin(\tfrac{1}{x})$ for $x \neq 0$ and $f(0) = 0$ has discontinuous derivative $f'(x) = 1 - 2\cos(\tfrac{1}{x}) + 4x \sin(\tfrac{1}{x})$ for $x \neq 0$ and $f'(0) = 1$, which vanishes arbitrarily close to $x = 0$. These critical points are local max/min points of $f$, so $f$ is not one-to-one (and not invertible) on any interval containing $x = 0$. Intuitively, the slope $f'(0) = 1$ does not propagate to nearby points, where the slopes are governed by a weak but rapid oscillation.
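The failure of injectivity can be seen numerically. The sketch below assumes the counter-example is $f(x) = x + 2x^2 \sin(1/x)$ with $f(0) = 0$, as in the figure caption. The sample points $x_k = 1/(2\pi k)$ and the helper `decreasing_pair` are illustrative choices: near each $x_k$ the derivative is close to $-1$, so $f$ decreases there even though $f'(0) = 1$.

```python
import math


def f(x):
    return 0.0 if x == 0 else x + 2 * x * x * math.sin(1 / x)


def f_prime(x):
    # Valid for x != 0; at x = 0 the derivative is 1 by the limit definition.
    return 1 - 2 * math.cos(1 / x) + 4 * x * math.sin(1 / x)


def decreasing_pair(k):
    # A small interval around x_k = 1/(2*pi*k) on which cos(1/x) stays near 1.
    x_k = 1 / (2 * math.pi * k)
    h = 0.1 * x_k * x_k
    return x_k - h, x_k + h


results = []
for k in (1, 10, 100, 1000):
    left, right = decreasing_pair(k)
    # left < right but f(left) > f(right): f is not monotone near 0.
    results.append((left, right, f(left) > f(right), f_prime(1 / (2 * math.pi * k))))

for row in results:
    print(row)
```

Since $f$ is continuous and increases overall from $0$, a stretch where it decreases forces some value to be taken at least twice, arbitrarily close to $0$ as $k$ grows.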

Methods of proof

As an important result, the inverse function theorem has been given numerous proofs. The proof most commonly seen in textbooks relies on the contraction mapping principle, also known as the Banach fixed-point theorem (which can also be used as the key step in the proof of existence and uniqueness of solutions to ordinary differential equations). [2] [3]

Since the fixed point theorem applies in infinite-dimensional (Banach space) settings, this proof generalizes immediately to the infinite-dimensional version of the inverse function theorem [4] (see Generalizations below).

An alternate proof in finite dimensions hinges on the extreme value theorem for functions on a compact set. [5]

Yet another proof uses Newton's method, which has the advantage of providing an effective version of the theorem: bounds on the derivative of the function imply an estimate of the size of the neighborhood on which the function is invertible. [6]

A proof using successive approximation

To prove existence, it can be assumed after an affine transformation that $f(0) = 0$ and $f'(0) = I$, so that $a = b = 0$.

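By the mean value theorem for vector-valued functions, for a differentiable function $u : [0, 1] \to \mathbb{R}^m$, $\|u(1) - u(0)\| \le \sup_{0 \le t \le 1} \|u'(t)\|$. Setting $u(t) = f(x + t(x' - x)) - x - t(x' - x)$, it follows that

$$\|f(x) - f(x') - x + x'\| \le \|x - x'\| \, \sup_{0 \le t \le 1} \|f'(x + t(x' - x)) - I\|.$$

Now choose $\delta > 0$ so that $\|f'(x) - I\| < \tfrac{1}{2}$ for $\|x\| < \delta$. Suppose that $\|y\| < \delta/2$ and define $x_n$ inductively by $x_0 = 0$ and $x_{n+1} = x_n + y - f(x_n)$. The assumptions show that if $\|x\|, \|x'\| < \delta$ then

$$\|f(x) - f(x') - x + x'\| \le \|x - x'\| / 2.$$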

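In particular $f(x) = f(x')$ implies $x = x'$. In the inductive scheme $\|x_n\| < \delta$ and $\|x_{n+1} - x_n\| < \delta / 2^{n+1}$. Thus $(x_n)$ is a Cauchy sequence tending to some $x$. By construction $f(x) = y$ as required.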

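To check that $g = f^{-1}$ is $C^1$, write $g(y + k) = x + h$ so that $f(x + h) = f(x) + k$. By the inequalities above, $\|h - k\| < \|h\|/2$ so that $\|h\|/2 < \|k\| < 2\|h\|$. On the other hand if $A = f'(x)$, then $\|A - I\| < 1/2$. Using the geometric series for $B = I - A$, it follows that $\|A^{-1}\| < 2$. But then

$$\frac{\|g(y + k) - g(y) - f'(g(y))^{-1} k\|}{\|k\|} = \frac{\|h - f'(x)^{-1}[f(x + h) - f(x)]\|}{\|k\|} \le 4 \, \frac{\|f(x + h) - f(x) - f'(x) h\|}{\|h\|}$$

tends to 0 as $k$ and $h$ tend to 0, proving that $g$ is $C^1$ with $g'(y) = f'(g(y))^{-1}$.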

The proof above is presented for a finite-dimensional space, but applies equally well for Banach spaces. If an invertible function $f$ is $C^k$ with $k > 1$, then so too is its inverse. This follows by induction using the fact that the map $F(A) = A^{-1}$ on operators is $C^k$ for any $k$ (in the finite-dimensional case this is an elementary fact because the inverse of a matrix is given as the adjugate matrix divided by its determinant). [1] [7] The method of proof here can be found in the books of Henri Cartan, Jean Dieudonné, Serge Lang, Roger Godement and Lars Hörmander.
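The inductive scheme in the existence part of this proof can be run directly. The sketch below takes the update to be $x_{n+1} = x_n + y - f(x_n)$ starting from $x_0 = 0$, and uses a hypothetical map $f(x_1, x_2) = (x_1 + x_2^2, \; x_2 + x_1 x_2)$, chosen only because $f(0) = 0$, $f'(0) = I$, and $\|f'(x) - I\| < 1/2$ for $\|x\| < 0.2$. The names `solve`, `DELTA` and the target `y` are assumptions for illustration.

```python
import math

DELTA = 0.2  # radius on which ||f'(x) - I|| < 1/2 for this particular f


def f(x):
    x1, x2 = x
    return (x1 + x2 * x2, x2 + x1 * x2)


def solve(y, steps=60):
    # Successive approximation: x_{n+1} = x_n + y - f(x_n), x_0 = 0.
    x = (0.0, 0.0)
    for _ in range(steps):
        fx = f(x)
        x = (x[0] + y[0] - fx[0], x[1] + y[1] - fx[1])
    return x


y = (0.05, -0.03)          # satisfies ||y|| < DELTA / 2
x = solve(y)
residual = math.hypot(f(x)[0] - y[0], f(x)[1] - y[1])

print(x, residual)
```

Each step at least halves the distance between consecutive iterates, so 60 steps are far more than needed here; the fixed step count just keeps the sketch free of a stopping rule.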

A proof using the contraction mapping principle

Here is a proof based on the contraction mapping theorem. Specifically, following T. Tao, [8] it uses the following consequence of the contraction mapping theorem.

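Lemma  Let $B(0, r)$ denote an open ball of radius $r$ in $\mathbb{R}^n$ with center 0. If $g : B(0, r) \to \mathbb{R}^n$ is a map such that $g(0) = 0$ and there exists a constant $0 < c < 1$ such that

$$|g(y) - g(x)| \le c |y - x|$$

for all $x, y$ in $B(0, r)$, then $f = I + g$ is injective on $B(0, r)$ and $B(0, (1 - c) r) \subset f(B(0, r)) \subset B(0, (1 + c) r)$.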

(More generally, the statement remains true if $\mathbb{R}^n$ is replaced by a Banach space.)

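Basically, the lemma says that a small perturbation of the identity map by a contraction map is injective and preserves a ball in some sense. Assuming the lemma for a moment, we prove the theorem first. As in the above proof, it is enough to prove the special case when $a = 0$, $b = f(a) = 0$ and $f'(0) = I$. Let $g = f - I$. The mean value inequality applied to $t \mapsto g(x + t(y - x))$ says:

$$|g(y) - g(x)| \le |y - x| \sup_{0 < t < 1} |g'(x + t(y - x))|.$$

Since $g'(0) = I - I = 0$ and $g'$ is continuous, we can find an $r > 0$ such that

$$|g(y) - g(x)| \le 2^{-1} |y - x|$$

for all $x, y$ in $B(0, r)$. Then the lemma above says that $f = g + I$ is injective on $B(0, r)$ and $B(0, r/2) \subset f(B(0, r))$. Then

$$f : U = B(0, r) \cap f^{-1}(B(0, r/2)) \to V = B(0, r/2)$$

is bijective and thus has an inverse. Next, we show the inverse $f^{-1}$ is continuously differentiable (this part of the argument is the same as that in the previous proof). This time, let $g = f^{-1}$ denote the inverse of $f$ and $A = f'(x)$. For $x = g(y)$, we write $g(y + k) = x + h$ or $y + k = f(x + h)$. Now, by the earlier estimate, we have

$$|h - k| = |f(x + h) - f(x) - h| \le |h| / 2$$

and so $|h|/2 \le |k|$. Writing $\|\cdot\|$ for the operator norm,

$$|g(y + k) - g(y) - A^{-1} k| = |h - A^{-1}(f(x + h) - f(x))| \le \|A^{-1}\| \, |Ah - f(x + h) + f(x)|.$$

As $k \to 0$, we have $h \to 0$ and $|h|/|k|$ is bounded. Hence, $g$ is differentiable at $y$ with the derivative $g'(y) = f'(g(y))^{-1}$. Also, $g'$ is the same as the composition $\iota \circ f' \circ g$ where $\iota : T \mapsto T^{-1}$; so $g'$ is continuous.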

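It remains to show the lemma. First, the map $f$ is injective on $B(0, r)$ since if $f(x) = f(y)$, then $g(y) - g(x) = x - y$ and so

$$|g(y) - g(x)| = |y - x|,$$

which is a contradiction unless $y = x$. (This part does not need the assumption $g(0) = 0$.) Next we show $f(B(0, r)) \supset B(0, (1 - c) r)$. The idea is to note that this is equivalent to, given a point $y$ in $B(0, (1 - c) r)$, finding a fixed point of the map

$$F : \overline{B}(0, r') \to \overline{B}(0, r'), \quad x \mapsto y - g(x)$$

where $0 < r' < r$ is such that $|y| \le (1 - c) r'$ and the bar means a closed ball. To find a fixed point, we use the contraction mapping theorem; checking that $F$ is a well-defined strict-contraction mapping is straightforward. Finally, we have $f(B(0, r)) \subset B(0, (1 + c) r)$ since

$$|f(x)| = |x + g(x) - g(0)| \le (1 + c) |x|.$$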

As might be clear, this proof is not substantially different from the previous one, as the proof of the contraction mapping theorem is by successive approximation.
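The lemma can be illustrated in one dimension. The sketch reads the lemma as: if $g(0) = 0$ and $|g(y) - g(x)| \le c|y - x|$ with $0 < c < 1$ on $B(0, r)$, then $f = I + g$ satisfies $B(0, (1-c)r) \subset f(B(0, r)) \subset B(0, (1+c)r)$, with preimages found as fixed points of $x \mapsto y - g(x)$. The choice $g(x) = c \sin x$ with $c = 0.5$, $r = 1$ and the helper `preimage` are hypothetical.

```python
import math

C = 0.5   # Lipschitz constant of g
R = 1.0   # radius of the ball B(0, R)


def g(x):
    # g(0) = 0 and |g'(x)| = C*|cos x| <= C, so g is C-Lipschitz.
    return C * math.sin(x)


def f(x):
    return x + g(x)


def preimage(y, steps=80):
    # Fixed-point iteration for x = y - g(x), i.e. f(x) = y.
    x = 0.0
    for _ in range(steps):
        x = y - g(x)
    return x


# Targets inside the smaller ball B(0, (1 - C) * R).
targets = [-0.49, -0.2, 0.0, 0.3, 0.49]
preimages = [preimage(y) for y in targets]

# Sample points of B(0, R) to probe the outer inclusion.
samples = [-0.99, -0.5, 0.5, 0.99]
images = [f(x) for x in samples]

print(preimages, images)
```

Every target in the smaller ball gets a preimage inside $B(0, R)$, and every sampled image stays inside $B(0, (1 + C)R)$, matching the two inclusions on these samples only; it is a spot check, not a proof.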

Applications

Implicit function theorem

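The inverse function theorem can be used to solve a system of equations

$$f_1(x) = y_1, \quad \ldots, \quad f_n(x) = y_n,$$

i.e., expressing $x_1, \ldots, x_n$ as functions of $y = (y_1, \ldots, y_n)$, provided the Jacobian matrix is invertible. The implicit function theorem allows one to solve a more general system of equations:

$$f_1(x, y) = 0, \quad \ldots, \quad f_n(x, y) = 0$$

for $y$ in terms of $x$. Though more general, the theorem is actually a consequence of the inverse function theorem. First, the precise statement of the implicit function theorem is as follows: [9]

Given a map $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$, if $f(a, b) = 0$, $f$ is continuously differentiable in a neighborhood of $(a, b)$ and the derivative of $y \mapsto f(a, y)$ at $b$ is invertible, then there exists a differentiable map $g : U \to V$ for some neighborhoods $U, V$ of $a, b$ such that $f(x, g(x)) = 0$. Moreover, if $f(x, y) = 0$, $x \in U$, $y \in V$, then $y = g(x)$; i.e., $g(x)$ is the unique solution.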

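To see this, consider the map $F(x, y) = (x, f(x, y))$. By the inverse function theorem, $F : U \times V \to W$ has the inverse $G$ for some neighborhoods $U, V, W$. We then have:

$$(x, y) = F(G_1(x, y), G_2(x, y)) = (G_1(x, y), f(G_1(x, y), G_2(x, y))),$$

implying $x = G_1(x, y)$ and $y = f(x, G_2(x, y))$. Thus $g(x) = G_2(x, 0)$ has the required property.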
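As a concrete illustration of solving $f(x, y) = 0$ for $y$ in terms of $x$, the sketch below uses the hypothetical equation $f(x, y) = x^2 + y^2 - 1$ near $(a, b) = (0, 1)$, where $\partial f / \partial y = 2y$ is nonzero. It computes $g(x)$ by Newton's method in the variable $y$, which is a swapped-in numerical technique, not the construction via $F(x, y) = (x, f(x, y))$ used in the argument above.

```python
import math


def f(x, y):
    # Hypothetical example: the unit circle.
    return x * x + y * y - 1.0


def df_dy(x, y):
    # Partial derivative in y; nonzero near (0, 1).
    return 2.0 * y


def g(x, y0=1.0, steps=50):
    # Newton's method in y for fixed x, started at the known solution b = 1.
    y = y0
    for _ in range(steps):
        y -= f(x, y) / df_dy(x, y)
    return y


xs = [-0.5, -0.1, 0.0, 0.2, 0.5]
values = [g(x) for x in xs]

print(values)
```

On this example the implicit function is known in closed form, $g(x) = \sqrt{1 - x^2}$, which gives an independent check of the computed values. Starting Newton's method at $y_0 = 1$ selects the upper branch; the lower branch $-\sqrt{1 - x^2}$ also solves the equation but lies outside a small neighborhood of $b = 1$.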

Giving a manifold structure

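In differential geometry, the inverse function theorem is used to show that the pre-image of a regular value under a smooth map is a manifold. [10] Indeed, let $f : U \to \mathbb{R}^r$ be such a smooth map from an open subset $U$ of $\mathbb{R}^n$ (since the result is local, there is no loss of generality in considering such a map). Fix a point $a$ in $f^{-1}(b)$ and then, by permuting the coordinates on $\mathbb{R}^n$, assume the matrix $\left[ \frac{\partial f_i}{\partial x_j}(a) \right]_{1 \le i, j \le r}$ has rank $r$. Then the map $F : U \to \mathbb{R}^r \times \mathbb{R}^{n-r} = \mathbb{R}^n$, $x \mapsto (f(x), x_{r+1}, \ldots, x_n)$ is such that $F'(a)$ has rank $n$. Hence, by the inverse function theorem, we find the smooth inverse $G$ of $F$ defined in a neighborhood $V \times W$ of $F(a) = (b, a_{r+1}, \ldots, a_n)$. We then have

$$x = (F \circ G)(x) = (f(G(x)), G_{r+1}(x), \ldots, G_n(x)),$$

which implies

$$(f \circ G)(x_1, \ldots, x_n) = (x_1, \ldots, x_r).$$

That is, after the change of coordinates by $G$, $f$ is a coordinate projection (this fact is known as the submersion theorem). Moreover, since $G : V \times W \to U' = G(V \times W)$ is bijective, the map

$$g = G(b, \cdot) : W \to f^{-1}(b) \cap U', \quad (x_{r+1}, \ldots, x_n) \mapsto G(b, x_{r+1}, \ldots, x_n)$$

is bijective with a smooth inverse. That is to say, $g$ gives a local parametrization of $f^{-1}(b)$ around $a$. Hence, $f^{-1}(b)$ is a manifold. (Note the proof is quite similar to the proof of the implicit function theorem and, in fact, the implicit function theorem can also be used instead.)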

More generally, the theorem shows that if a smooth map $f : P \to E$ is transversal to a submanifold $M \subset E$, then the pre-image $f^{-1}(M) \hookrightarrow P$ is a submanifold. [11]

Global version

The inverse function theorem is a local result; it applies to each point. A priori, the theorem thus only shows the function is locally bijective (or locally diffeomorphic of some class). The next topological lemma can be used to upgrade local injectivity to injectivity that is global to some extent.

Lemma   [12] [13] If $A$ is a closed subset of a (second-countable) topological manifold $X$ (or, more generally, a topological space admitting an exhaustion by compact subsets) and $f : X \to Z$, $Z$ some topological space, is a local homeomorphism that is injective on $A$, then $f$ is injective on some neighborhood of $A$.

Proof: [14] First assume $X$ is compact. If the conclusion of the lemma is false, we can find two sequences $x_i \neq y_i$ such that $f(x_i) = f(y_i)$ and $x_i, y_i$ each converge to some points $x, y$ in $A$. Since $f$ is injective on $A$, $x = y$. Now, if $i$ is large enough, $x_i, y_i$ are in a neighborhood of $x = y$ where $f$ is injective; thus, $x_i = y_i$, a contradiction.

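In general, consider the set $E = \{ (x, y) \in X^2 \mid x \neq y, f(x) = f(y) \}$. It is disjoint from $S \times S$ for any subset $S \subset X$ where $f$ is injective. Let $X_1 \subset X_2 \subset \cdots$ be an increasing sequence of compact subsets with union $X$ and with $X_i$ contained in the interior of $X_{i+1}$. Then, by the first part of the proof, for each $i$, we can find a neighborhood $U_i$ of $A \cap X_i$ such that $U_i^2 \cap E = \emptyset$. Then $U = \bigcup_i U_i$ has the required property. (See also [15] for an alternative approach.)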

The lemma implies the following (a sort of) global version of the inverse function theorem:

Inverse function theorem   [16] Let $f : U \to V$ be a map between open subsets of $\mathbb{R}^n$ or more generally of manifolds. Assume $f$ is continuously differentiable (or is $C^k$). If $f$ is injective on a closed subset $A \subset U$ and if the Jacobian matrix of $f$ is invertible at each point of $A$, then $f$ is injective in a neighborhood $A'$ of $A$ and $f^{-1} : f(A') \to A'$ is continuously differentiable (or is $C^k$).

Note that if $A$ is a point, then the above is the usual inverse function theorem.

Holomorphic inverse function theorem

There is a version of the inverse function theorem for holomorphic maps.

Theorem   [17] [18] Let $U, V \subset \mathbb{C}^n$ be open subsets such that $0 \in U$ and $f : U \to V$ a holomorphic map whose Jacobian matrix in variables $z_i, \bar{z}_i$ is invertible (the determinant is nonzero) at $0$. Then $f$ is injective in some neighborhood $W$ of $0$ and the inverse $f^{-1} : f(W) \to W$ is holomorphic.

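The theorem follows from the usual inverse function theorem. Indeed, let $J_{\mathbb{R}}(f)$ denote the Jacobian matrix of $f$ in variables $x_i, y_i$ and $J(f)$ for that in $z_j, \bar{z}_j$. Then we have $\det J_{\mathbb{R}}(f) = |\det J(f)|^2$, which is nonzero by assumption. Hence, by the usual inverse function theorem, $f$ is injective near $0$ with continuously differentiable inverse. By the chain rule, with $w = f(z)$,

$$\frac{\partial}{\partial \bar{z}_l} (f_j^{-1} \circ f)(z) = \sum_k \frac{\partial f_j^{-1}}{\partial w_k}(w) \frac{\partial f_k}{\partial \bar{z}_l}(z) + \sum_k \frac{\partial f_j^{-1}}{\partial \bar{w}_k}(w) \frac{\partial \bar{f}_k}{\partial \bar{z}_l}(z)$$

where the left-hand side and the first term on the right vanish since $f_j^{-1} \circ f$ and $f_k$ are holomorphic. Thus, because the matrix $\left[ \frac{\partial \bar{f}_k}{\partial \bar{z}_l} \right]$ is invertible, $\frac{\partial f_j^{-1}}{\partial \bar{w}_k}(w) = 0$ for each $k$.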
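The relation between the real and complex Jacobians used in this argument can be checked numerically in one complex variable, where it reduces to the statement that the real $2 \times 2$ Jacobian determinant of a holomorphic $f$ equals $|f'(z)|^2$ (a consequence of the Cauchy–Riemann equations). The function $f(z) = z^2 + z$, the point `z0`, and the helper `real_jacobian_det` are hypothetical choices for illustration.

```python
def f(z):
    # Hypothetical holomorphic example.
    return z * z + z


def f_prime(z):
    return 2 * z + 1


def real_jacobian_det(z, h=1e-6):
    # Central differences along the real directions x and y, with f = u + i v.
    dfdx = (f(z + h) - f(z - h)) / (2 * h)            # u_x + i v_x
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # u_y + i v_y
    # det [[u_x, u_y], [v_x, v_y]] = u_x * v_y - u_y * v_x
    return dfdx.real * dfdy.imag - dfdy.real * dfdx.imag


z0 = 0.4 - 0.7j
real_det = real_jacobian_det(z0)
complex_side = abs(f_prime(z0)) ** 2

print(real_det, complex_side)
```

Both numbers should agree up to finite-difference rounding, so the real Jacobian is invertible exactly where $f'(z) \neq 0$.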

Similarly, there is the implicit function theorem for holomorphic functions. [19]

As already noted earlier, it can happen that an injective smooth function has an inverse that is not smooth (e.g., $f(x) = x^3$ in a real variable). This is not the case for holomorphic functions because of:

Proposition   [19] If $f : U \to V$ is an injective holomorphic map between open subsets of $\mathbb{C}^n$, then $f^{-1} : f(U) \to U$ is holomorphic.

Formulations for manifolds

The inverse function theorem can be rephrased in terms of differentiable maps between differentiable manifolds. In this context the theorem states that for a differentiable map $F : M \to N$ (of class $C^1$), if the differential of $F$,

$$dF_p : T_p M \to T_{F(p)} N$$

is a linear isomorphism at a point $p$ in $M$ then there exists an open neighborhood $U$ of $p$ such that

$$F|_U : U \to F(U)$$

is a diffeomorphism. Note that this implies that the connected components of $M$ and $N$ containing $p$ and $F(p)$ have the same dimension, as is already directly implied from the assumption that $dF_p$ is an isomorphism. If the derivative of $F$ is an isomorphism at all points $p$ in $M$ then the map $F$ is a local diffeomorphism.

Generalizations

Banach spaces

The inverse function theorem can also be generalized to differentiable maps between Banach spaces $X$ and $Y$. [20] Let $U$ be an open neighbourhood of the origin in $X$ and $F : U \to Y$ a continuously differentiable function, and assume that the Fréchet derivative $dF_0 : X \to Y$ of $F$ at 0 is a bounded linear isomorphism of $X$ onto $Y$. Then there exists an open neighbourhood $V$ of $F(0)$ in $Y$ and a continuously differentiable map $G : V \to X$ such that $F(G(y)) = y$ for all $y$ in $V$. Moreover, $G(y)$ is the only sufficiently small solution $x$ of the equation $F(x) = y$.

There is also the inverse function theorem for Banach manifolds. [21]

Constant rank theorem

The inverse function theorem (and the implicit function theorem) can be seen as a special case of the constant rank theorem, which states that a smooth map with constant rank near a point can be put in a particular normal form near that point. [22] Specifically, if $F : M \to N$ has constant rank near a point $p \in M$, then there are open neighborhoods $U$ of $p$ and $V$ of $F(p)$ and there are diffeomorphisms $u : T_p M \to U$ and $v : T_{F(p)} N \to V$ such that $F(U) \subseteq V$ and such that the derivative $dF_p : T_p M \to T_{F(p)} N$ is equal to $v^{-1} \circ F \circ u$. That is, $F$ "looks like" its derivative near $p$. The set of points $p \in M$ such that the rank is constant in a neighborhood of $p$ is an open dense subset of $M$; this is a consequence of semicontinuity of the rank function. Thus the constant rank theorem applies to a generic point of the domain.

When the derivative of F is injective (resp. surjective) at a point p, it is also injective (resp. surjective) in a neighborhood of p, and hence the rank of F is constant on that neighborhood, and the constant rank theorem applies.

Polynomial functions

If it is true, the Jacobian conjecture would be a variant of the inverse function theorem for polynomials. It states that if a vector-valued polynomial function has a Jacobian determinant that is an invertible polynomial (that is a nonzero constant), then it has an inverse that is also a polynomial function. It is unknown whether this is true or false, even in the case of two variables. This is a major open problem in the theory of polynomials.

Selections

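When $f : \mathbb{R}^n \to \mathbb{R}^m$ with $m \le n$, $f$ is $k$ times continuously differentiable, and the Jacobian $A = \nabla f(\bar{x})$ at a point $\bar{x}$ is of rank $m$, the inverse of $f$ may not be unique. However, there exists a local selection function $s$ such that $f(s(y)) = y$ for all $y$ in a neighborhood of $\bar{y} = f(\bar{x})$, $s(\bar{y}) = \bar{x}$, $s$ is $k$ times continuously differentiable in this neighborhood, and $\nabla s(\bar{y}) = A^T (A A^T)^{-1}$ ($\nabla s(\bar{y})$ is the Moore–Penrose pseudoinverse of $A$). [23]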
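A small worked instance with $n = 2$, $m = 1$ may make the selection statement concrete. Everything in the sketch is an assumption chosen for illustration: the function $f(x_1, x_2) = x_1 + x_2^2$, the base point $\bar{x} = (1, 1)$ with $A = (1, 2)$, and a selection $s$ that moves from $\bar{x}$ along the direction $A^T$. For a row vector $A$, the Moore–Penrose pseudoinverse $A^T (A A^T)^{-1}$ is simply $A^T / \|A\|^2$.

```python
import math

X_BAR = (1.0, 1.0)
A = (1.0, 2.0)  # gradient of f at X_BAR: (1, 2 * x2) with x2 = 1


def f(x):
    return x[0] + x[1] ** 2


def s(y):
    # Solve f(X_BAR + t * A) = y for t: 4 t^2 + 5 t + 2 = y, root with t = 0 at y = 2.
    t = (-5.0 + math.sqrt(16.0 * y - 7.0)) / 8.0
    return (X_BAR[0] + t * A[0], X_BAR[1] + t * A[1])


y_bar = f(X_BAR)

# Pseudoinverse of the 1 x 2 matrix A: A^T / (A A^T).
norm_sq = sum(c * c for c in A)
pinv = tuple(c / norm_sq for c in A)

# Central finite difference of s at y_bar.
h = 1e-6
grad_s = tuple((p - m) / (2 * h) for p, m in zip(s(y_bar + h), s(y_bar - h)))

print(pinv, grad_s, f(s(2.3)))
```

Other selections exist (for example, varying only $x_1$), and they generally have a different derivative at $\bar{y}$; moving along $A^T$ is what produces the pseudoinverse here.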


Notes

  1. Theorem 1.1.7. in Hörmander, Lars (2015). The Analysis of Linear Partial Differential Operators I: Distribution Theory and Fourier Analysis. Classics in Mathematics (2nd ed.). Springer. ISBN 978-3-642-61497-2.
  2. McOwen, Robert C. (1996). "Calculus of Maps between Banach Spaces". Partial Differential Equations: Methods and Applications. Upper Saddle River, NJ: Prentice Hall. pp. 218–224. ISBN   0-13-121880-8.
  3. Tao, Terence (12 September 2011). "The inverse function theorem for everywhere differentiable maps" . Retrieved 26 July 2019.
  4. Jaffe, Ethan. "Inverse Function Theorem" (PDF).
  5. Spivak 1965 , pages 31–35
  6. Hubbard, John H.; Hubbard, Barbara Burke (2001). Vector Analysis, Linear Algebra, and Differential Forms: A Unified Approach (Matrix ed.).
  7. Cartan, Henri (1971). Calcul Differentiel (in French). Hermann. pp. 55–61. ISBN   978-0-395-12033-0.
  8. Theorem 17.7.2 in Tao, Terence (2014). Analysis. II. Texts and Readings in Mathematics. Vol. 38 (Third edition of 2006 original ed.). New Delhi: Hindustan Book Agency. ISBN   978-93-80250-65-6. MR   3310023. Zbl   1300.26003.
  9. Spivak 1965 , Theorem 2-12.
  10. Spivak 1965 , Theorem 5-1. and Theorem 2-13.
  11. "Transversality" (PDF). northwestern.edu.
  12. One of Spivak's books (Editorial note: give the exact location).
  13. Hirsch 1976, Ch. 2, § 1., Exercise 7. NB: This one is for a -immersion.
  14. Lemma 13.3.3. of Lectures on differential topology utoronto.ca
  15. Dan Ramras (https://mathoverflow.net/users/4042/dan-ramras), On a proof of the existence of tubular neighborhoods., URL (version: 2017-04-13): https://mathoverflow.net/q/58124
  16. Ch. I., § 3, Exercise 10. and § 8, Exercise 14. in V. Guillemin, A. Pollack. "Differential Topology". Prentice-Hall Inc., 1974. ISBN 0-13-212605-2.
  17. Griffiths & Harris 1978, p. 18.
  18. Fritzsche, K.; Grauert, H. (2002). From Holomorphic Functions to Complex Manifolds. Springer. pp. 33–36. ISBN   978-0-387-95395-3.
  19. Griffiths & Harris 1978, p. 19.
  20. Luenberger, David G. (1969). Optimization by Vector Space Methods. New York: John Wiley & Sons. pp. 240–242. ISBN   0-471-55359-X.
  21. Lang, Serge (1985). Differential Manifolds. New York: Springer. pp. 13–19. ISBN   0-387-96113-5.
  22. Boothby, William M. (1986). An Introduction to Differentiable Manifolds and Riemannian Geometry (Second ed.). Orlando: Academic Press. pp.  46–50. ISBN   0-12-116052-1.
  23. Dontchev, Asen L.; Rockafellar, R. Tyrrell (2014). Implicit Functions and Solution Mappings: A View from Variational Analysis (Second ed.). New York: Springer-Verlag. p. 54. ISBN   978-1-4939-1036-6.


References