Automatic basis function construction


In machine learning, automatic basis function construction (or basis discovery) is the mathematical method of searching for a set of task-independent basis functions that map the state space to a lower-dimensional embedding while still representing the value function accurately. Automatic basis construction does not rely on prior knowledge of the domain, which allows it to perform well where expert-constructed basis functions are difficult or impossible to create.


Motivation

In reinforcement learning (RL), most real-world Markov Decision Process (MDP) problems have large or continuous state spaces, which typically require some sort of approximation to be represented efficiently.

Linear function approximators (LFAs) [1] are widely adopted for their low theoretical complexity. Two sub-problems need to be solved for a good approximation: weight optimization and basis construction. One way to solve the second problem is to hand-design special basis functions; such basis functions can work well in specific tasks but are significantly restricted to their domains. Constructing basis functions automatically is therefore preferred for broader applications.[citation needed]

Problem definition

A Markov decision process with finite state space and fixed policy is defined by a 5-tuple $(S, A, R, \gamma, P)$, which includes the finite state space $S$, the finite action space $A$, the reward function $R$, the discount factor $\gamma \in [0, 1)$, and the transition model $P$.

The Bellman equation for the fixed policy is defined as:

$V = R + \gamma P V.$

When the number of elements in $S$ is small, $V$ is usually maintained in tabular form; when $S$ grows too large for this kind of representation, $V$ is commonly approximated via a linear combination of basis functions $\phi_1, \ldots, \phi_n$, [2] so that we have:

$V \approx \hat{V} = \Phi w.$

Here $\Phi$ is a matrix in which every row contains the feature vector $\phi(s)$ of the corresponding state $s$, $w$ is a weight vector with $n$ parameters, and usually $n \ll |S|$.
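To make the notation concrete, the following Python sketch builds a small random MDP under a fixed policy, computes the exact value function, and fits the weight vector $w$ by least squares for a hand-picked feature matrix $\Phi$. The toy MDP, the features, and all variable names are illustrative assumptions, not part of any published construction method.

import numpy as np

# Hypothetical 6-state MDP under a fixed policy: P is the induced transition
# matrix, R the reward vector, gamma the discount factor.
n_states, gamma = 6, 0.9
rng = np.random.default_rng(0)
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)        # rows of P sum to 1
R = rng.random(n_states)

# Exact value function V = (I - gamma*P)^{-1} R (only feasible for small S).
V = np.linalg.solve(np.eye(n_states) - gamma * P, R)

# Hand-picked feature matrix Phi (|S| x n, with n << |S| in real problems).
Phi = np.column_stack([np.ones(n_states), np.arange(n_states)])

# Least-squares weights w so that Phi @ w approximates V.
w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
print("approximation error:", np.linalg.norm(Phi @ w - V))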

Basis construction looks for ways to automatically construct better basis functions which can represent the value function well.

A good construction method should have the following characteristics:

Proto-value basis

In this approach, Mahadevan analyzes the connectivity graph between states to determine a set of basis functions. [3]

The normalized graph Laplacian is defined as:

$L = D^{-1/2}(D - W)D^{-1/2} = I - D^{-1/2} W D^{-1/2}.$

Here $W$ is the adjacency matrix of the undirected graph $(N, E)$ formed by the states of the fixed-policy MDP, and $D$ is the diagonal matrix of the nodes' degrees.

In a discrete state space, the adjacency matrix $W$ can be constructed by simply checking whether two states are connected, and $D$ can be calculated by summing up every row of $W$. In a continuous state space, the random walk Laplacian of $W$ can be used.
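A minimal sketch of the discrete-state construction, assuming an adjacency matrix W is already available, is given below. It returns the smoothest eigenvectors of the normalized Laplacian as proto-value basis functions; the function and variable names are illustrative, and the continuous-state and diffusion-wavelet variants are not shown.

import numpy as np

def proto_value_basis(W, k):
    # Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)                        # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    # Eigenvectors with the smallest eigenvalues are the smoothest functions
    # on the graph; use the first k of them as basis functions (columns).
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    return eigvecs[:, :k]

# Example: a 5-state chain in which neighbouring states are connected.
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
Phi = proto_value_basis(W, k=3)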

This spectral framework can be used for value function approximation (VFA). Given the fixed policy, the edge weights are determined by the corresponding states' transition probabilities. To obtain smooth value approximations, diffusion wavelets are used. [3]

Krylov basis

Krylov basis construction uses the actual transition matrix instead of the random walk Laplacian. This method assumes that the transition model $P$ and the reward $r$ are available.

The vectors in the Neumann series $V = \sum_{i=0}^{\infty} (\gamma P)^{i} r$ are denoted $y_i = (\gamma P)^{i} r$ for all $i = 0, 1, 2, \ldots$

It can be shown that the Krylov space spanned by $\{y_0, y_1, \ldots, y_{m-1}\}$ is enough to represent any value function, [4] where $m$ is the degree of the minimal polynomial of $I - \gamma P$.

Suppose the minimal polynomial is $p(A) = \sum_{i=0}^{m} \alpha_i A^{i}$ with $A = I - \gamma P$; since $V = A^{-1} r$, the value function can be written as:

$V = -\frac{1}{\alpha_0} \sum_{i=1}^{m} \alpha_i A^{i-1} r.$ [5]
Algorithm Augmented Krylov Method [5]
$z_1, \ldots, z_k$ ← top $k$ real eigenvectors of $P$
$z_{k+1} \leftarrow r$
for $i = k+1, \ldots, l$ do
    if $i > k+1$ then
        $z_i \leftarrow P z_{i-1}$;
    end if
    for $j = 1, \ldots, i-1$ do
        $z_i \leftarrow z_i - \langle z_i, z_j \rangle z_j$
    end for
    if $z_i = 0$ then
        break;
    end if
    $z_i \leftarrow z_i / \lVert z_i \rVert$
end for
  • k: number of eigenvectors in the basis
  • l: total number of basis vectors
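The sketch below mirrors the augmented Krylov construction under the stated assumption that $P$ and $r$ are known. The QR orthonormalization of the eigenvector block, the tolerance, and the variable names are implementation choices made for this illustration rather than details of the published method.

import numpy as np

def augmented_krylov_basis(P, r, k, l, tol=1e-10):
    # Start with the top k eigenvectors of P (largest real parts),
    # orthonormalized so the Gram-Schmidt step below stays simple.
    eigvals, eigvecs = np.linalg.eig(P)
    order = np.argsort(-eigvals.real)
    Z, _ = np.linalg.qr(eigvecs[:, order[:k]].real)
    basis = [Z[:, i] for i in range(Z.shape[1])]

    z = r.astype(float)
    for _ in range(k, l):
        for b in basis:                      # orthogonalize against the basis
            z = z - (b @ z) * b
        if np.linalg.norm(z) < tol:          # Krylov space is exhausted
            break
        basis.append(z / np.linalg.norm(z))
        z = P @ basis[-1]                    # next Krylov vector P^i r
    return np.column_stack(basis)

# Example usage: Phi = augmented_krylov_basis(P, R, k=2, l=5)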

Bellman error basis

The Bellman error, from which Bellman error basis functions (BEBFs) are generated, is defined as:

$\epsilon = r + \gamma P \hat{V} - \hat{V},$

where $\hat{V} = \Phi w$ is the current approximation of the value function.

Loosely speaking, the Bellman error points towards the optimal value function. [6] The sequence of BEBFs forms a basis in which each new basis function is orthogonal to the space spanned by the previous ones; thus, with a sufficient number of BEBFs, any value function can be represented exactly.

Algorithm BEBF
stage $i = 1$: $\phi_1 = r$;
stage $i = 2, \ldots, N$:
    compute the weight vector $w_i$ according to the current basis functions $\Phi_i$;
    compute the new Bellman error by $\epsilon_i = r + \gamma P \Phi_i w_i - \Phi_i w_i$;
    add the Bellman error to form the new basis functions: $\Phi_{i+1} = [\Phi_i : \epsilon_i]$;
  • N represents the number of iterations until convergence.
  • ":" means juxtaposing matrices or vectors.

Bellman average reward bases

Bellman average reward bases (BARBs) [7] are similar to Krylov bases, but the reward function is dilated by the average-adjusted transition matrix $P - P^{*}$, where $P^{*}$ is the limiting (average) transition matrix. $P^{*}$ can be calculated by many methods in [8].

BARBs converge faster than BEBFs and Krylov bases when $\gamma$ is close to 1.

Algorithm BARBs
stage $i = 1$: $\phi_1 = P^{*} r$, $\phi_2 = r$;
stage $i = 2, \ldots, N$:
    compute the weight vector $w_i$ according to the current basis functions $\Phi_i$;
    compute the new basis: $\phi_{i+1} = (P - P^{*}) \phi_i$, and add it to form the new bases matrix $\Phi_{i+1} = [\Phi_i : \phi_{i+1}]$;
  • N represents the number of iterations until convergence.
  • ":" means juxtaposing matrices or vectors.

Discussion and analysis

There are two principal types of basis construction methods.

The first type of method is reward-sensitive, like the Krylov bases and BEBFs; these dilate the reward function geometrically through the transition matrix. However, when the discount factor $\gamma$ approaches 1, Krylov bases and BEBFs converge slowly. This is because the error of Krylov-based methods is restricted by a Chebyshev polynomial bound. [5] To solve this problem, methods such as BARBs have been proposed. BARBs are an incremental variant of the Drazin bases, and converge faster than Krylov bases and BEBFs when $\gamma$ becomes large.

The other type is reward-insensitive: the proto-value basis functions derived from the graph Laplacian. This method uses graph information, but the construction of the adjacency matrix makes it hard to analyze. [5]


References

  1. Keller, Philipp; Mannor, Shie; Precup, Doina (2006). "Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning". Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA.
  2. Sutton, Richard S.; Barto, Andrew G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, chapter 8.
  3. Mahadevan, Sridhar; Maggioni, Mauro (2005). "Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions". Proceedings of Advances in Neural Information Processing Systems.
  4. Ipsen, Ilse C. F.; Meyer, Carl D. (1998). "The Idea Behind Krylov Methods". American Mathematical Monthly, 105(10):889–899.
  5. Petrik, M. (2007). "An Analysis of Laplacian Methods for Value Function Approximation in MDPs". Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2574–2579.
  6. Parr, R.; Painter-Wakefield, C.; Li, L.; Littman, M. (2007). "Analyzing Feature Generation for Value-Function Approximation". Proceedings of the 24th International Conference on Machine Learning (ICML).
  7. Mahadevan, S.; Liu, B. (2010). "Basis Construction from Power Series Expansions of Value Functions". Advances in Neural Information Processing Systems (NIPS).
  8. Stewart, William J. (1997). "Numerical Methods for Computing Stationary Distributions of Finite Irreducible Markov Chains". In Advances in Computational Probability. Kluwer Academic Publishers.