In the unconstrained minimization problem, the Wolfe conditions are a set of inequalities for performing inexact line search, especially in quasi-Newton methods, first published by Philip Wolfe in 1969. [1] [2]
In these methods the idea is to find for some smooth . Each step often involves approximately solving the subproblem where is the current best guess, is a search direction, and is the step length.
The inexact line searches provide an efficient way of computing an acceptable step length that reduces the objective function 'sufficiently', rather than minimizing the objective function over exactly. A line search algorithm can use Wolfe conditions as a requirement for any guessed , before finding a new search direction .
A step length is said to satisfy the Wolfe conditions, restricted to the direction , if the following two inequalities hold:
with . (In examining condition (ii), recall that to ensure that is a descent direction, we have , as in the case of gradient descent, where , or Newton–Raphson, where with positive definite.)
is usually chosen to be quite small while is much larger; Nocedal and Wright give example values of and for Newton or quasi-Newton methods and for the nonlinear conjugate gradient method. [3] Inequality i) is known as the Armijo rule [4] and ii) as the curvature condition; i) ensures that the step length decreases 'sufficiently', and ii) ensures that the slope has been reduced sufficiently. Conditions i) and ii) can be interpreted as respectively providing an upper and lower bound on the admissible step length values.
Denote a univariate function restricted to the direction as . The Wolfe conditions can result in a value for the step length that is not close to a minimizer of . If we modify the curvature condition to the following,
then i) and iii) together form the so-called strong Wolfe conditions, and force to lie close to a critical point of .
The principal reason for imposing the Wolfe conditions in an optimization algorithm where is to ensure convergence of the gradient to zero. In particular, if the cosine of the angle between and the gradient, is bounded away from zero and the i) and ii) conditions hold, then .
An additional motivation, in the case of a quasi-Newton method, is that if , where the matrix is updated by the BFGS or DFP formula, then if is positive definite ii) implies is also positive definite.
Wolfe's conditions are more complicated than Armijo's condition, and a gradient descent algorithm based on Armijo's condition has a better theoretical guarantee than one based on Wolfe conditions (see the sections on "Upper bound for learning rates" and "Theoretical guarantee" in the Backtracking line search article).
In mathematics, any vector space has a corresponding dual vector space consisting of all linear forms on together with the vector space structure of pointwise addition and scalar multiplication by constants.
In vector calculus, the gradient of a scalar-valued differentiable function of several variables is the vector field whose value at a point gives the direction and the rate of fastest increase. The gradient transforms like a vector under change of basis of the space of variables of . If the gradient of a function is non-zero at a point , the direction of the gradient is the direction in which the function increases most quickly from , and the magnitude of the gradient is the rate of increase in that direction, the greatest absolute directional derivative. Further, a point where the gradient is the zero vector is known as a stationary point. The gradient thus plays a fundamental role in optimization theory, where it is used to minimize a function by gradient descent. In coordinate-free terms, the gradient of a function may be defined by:
The Navier–Stokes equations are partial differential equations which describe the motion of viscous fluid substances. They were named after French engineer and physicist Claude-Louis Navier and the Irish physicist and mathematician George Gabriel Stokes. They were developed over several decades of progressively building the theories, from 1822 (Navier) to 1842–1850 (Stokes).
Poisson's equation is an elliptic partial differential equation of broad utility in theoretical physics. For example, the solution to Poisson's equation is the potential field caused by a given electric charge or mass density distribution; with the potential field known, one can then calculate the corresponding electrostatic or gravitational (force) field. It is a generalization of Laplace's equation, which is also frequently seen in physics. The equation is named after French mathematician and physicist Siméon Denis Poisson who published it in 1823.
In the calculus of variations, a field of mathematical analysis, the functional derivative relates a change in a functional to a change in a function on which the functional depends.
In mathematics, a Green's function is the impulse response of an inhomogeneous linear differential operator defined on a domain with specified initial conditions or boundary conditions.
In vector calculus, a conservative vector field is a vector field that is the gradient of some function. A conservative vector field has the property that its line integral is path independent; the choice of path between two points does not change the value of the line integral. Path independence of the line integral is equivalent to the vector field under the line integral being conservative. A conservative vector field is also irrotational; in three dimensions, this means that it has vanishing curl. An irrotational vector field is necessarily conservative provided that the domain is simply connected.
A Halbach array is a special arrangement of permanent magnets that augments the magnetic field on one side of the array while cancelling the field to near zero on the other side. This is achieved by having a spatially rotating pattern of magnetisation.
In mathematics, the total variation identifies several slightly different concepts, related to the (local or global) structure of the codomain of a function or a measure. For a real-valued continuous function f, defined on an interval [a, b] ⊂ R, its total variation on the interval of definition is a measure of the one-dimensional arclength of the curve with parametric equation x ↦ f(x), for x ∈ [a, b]. Functions whose total variation is finite are called functions of bounded variation.
In physics and mathematics, the Helmholtz decomposition theorem or the fundamental theorem of vector calculus states that certain differentiable vector fields can be resolved into the sum of an irrotational (curl-free) vector field and a solenoidal (divergence-free) vector field. In physics, often only the decomposition of sufficiently smooth, rapidly decaying vector fields in three dimensions is discussed. It is named after Hermann von Helmholtz.
In optimization, line search is a basic iterative approach to find a local minimum of an objective function . It first finds a descent direction along which the objective function will be reduced, and then computes a step size that determines how far should move along that direction. The descent direction can be computed by various methods, such as gradient descent or quasi-Newton method. The step size can be determined either exactly or inexactly.
In (unconstrained) mathematical optimization, a backtracking line search is a line search method to determine the amount to move along a given search direction. Its use requires that the objective function is differentiable and that its gradient is known.
In numerical optimization, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm is an iterative method for solving unconstrained nonlinear optimization problems. Like the related Davidon–Fletcher–Powell method, BFGS determines the descent direction by preconditioning the gradient with curvature information. It does so by gradually improving an approximation to the Hessian matrix of the loss function, obtained only from gradient evaluations via a generalized secant method.
The diabatic representation as a mathematical tool for theoretical calculations of atomic collisions and of molecular interactions.
The Frank–Wolfe algorithm is an iterative first-order optimization algorithm for constrained convex optimization. Also known as the conditional gradient method, reduced gradient algorithm and the convex combination algorithm, the method was originally proposed by Marguerite Frank and Philip Wolfe in 1956. In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves towards a minimizer of this linear function.
In electromagnetism and applications, an inhomogeneous electromagnetic wave equation, or nonhomogeneous electromagnetic wave equation, is one of a set of wave equations describing the propagation of electromagnetic waves generated by nonzero source charges and currents. The source terms in the wave equations make the partial differential equations inhomogeneous, if the source terms are zero the equations reduce to the homogeneous electromagnetic wave equations, which follow from Maxwell's equations.
The gradient theorem, also known as the fundamental theorem of calculus for line integrals, says that a line integral through a gradient field can be evaluated by evaluating the original scalar field at the endpoints of the curve. The theorem is a generalization of the second fundamental theorem of calculus to any curve in a plane or space rather than just the real line.
The Navier–Stokes existence and smoothness problem concerns the mathematical properties of solutions to the Navier–Stokes equations, a system of partial differential equations that describe the motion of a fluid in space. Solutions to the Navier–Stokes equations are used in many practical applications. However, theoretical understanding of the solutions to these equations is incomplete. In particular, solutions of the Navier–Stokes equations often include turbulence, which remains one of the greatest unsolved problems in physics, despite its immense importance in science and engineering.
There are various mathematical descriptions of the electromagnetic field that are used in the study of electromagnetism, one of the four fundamental interactions of nature. In this article, several approaches are discussed, although the equations are in terms of electric and magnetic fields, potentials, and charges with currents, generally speaking.
In differential calculus, there is no single uniform notation for differentiation. Instead, various notations for the derivative of a function or variable have been proposed by various mathematicians. The usefulness of each notation varies with the context, and it is sometimes advantageous to use more than one notation in a given context. The most common notations for differentiation are listed below.