Proto-value function

Last updated December 14, 2021

In applied mathematics, proto-value functions (PVFs) are automatically learned basis functions that are useful in approximating task-specific value functions, providing a compact representation of the powers of transition matrices. They provide a novel framework for solving the credit assignment problem. The framework introduces a novel approach to solving Markov decision processes (MDP) and reinforcement learning problems, using multiscale spectral and manifold learning methods. Proto-value functions are generated by spectral analysis of a graph, using the graph Laplacian.

Motivation

Value function approximation is a critical component to solving Markov decision processes (MDPs) defined over a continuous state space. A good function approximator allows a reinforcement learning (RL) agent to accurately represent the value of any state it has experienced, without explicitly storing its value. Linear function approximation using basis functions is a common way of constructing a value function approximation, like radial basis functions, polynomial state encodings, and CMACs. However, parameters associated with these basis functions often require significant domain-specific hand-engineering.^[2] Proto-value functions attempts to solve this required hand-engineering by accounting for the underlying manifold structure of the problem domain.^[1]

Overview

Proto-value functions are task-independent global basis functions that collectively span the entire space of possible value functions for a given state space.^[1] They incorporate geometric constraints intrinsic to the environment. For example, states close in Euclidean distance (such as states on opposite sides of a wall) may be far apart in manifold space. Previous approaches to this nonlinearity problem lacked a broad theoretical framework, and consequently have only been explored in the context of discrete MDPs.

Proto-value functions arise from reformulating the problem of value function approximation as real-valued function approximation on a graph or manifold. This results in broader applicability of the learned bases and enables a new class of learning algorithms, which learn representations and policies at the same time.^[3]

Basis functions from graph Laplacian

This approach constructs the basis functions by spectral analysis of the graph Laplacian, a self-adjoint (or symmetric) operator on the space of functions on the graph, closely related to the random walk operator.

For the sake of simplicity, assume that the underlying state space can be represented as an undirected unweighted graph $G=(V,E)$ The combinatorial Laplacian $L$ is defined as the operator $L=D-A$ , where $D$ is a diagonal matrix called the degree matrix and $A$ is the adjacency matrix.^[1]

The spectral analysis of the Laplace operator on a graph consists of finding the eigenvalues and eigenfunctions which solve the equation

L\varphi _{\lambda }=\lambda \varphi _{\lambda },

where $L$ is the combinatorial Laplacian, $\varphi _{\lambda }$ is an eigenfunction associated with the eigenvalue $\lambda$ . Here the term "eigenfunction" is used to denote what is traditionally referred to as eigenvector in linear algebra, because the Laplacian eigenvectors can naturally be viewed as functions that map each vertex to a real number.^[3]

The combinatorial Laplacian is not the only operator on graphs to select from. Other possible graph operators include:

Normalized Laplacian $L_{\text{normalized}}=I-D^{-1/2}AD^{-1/2}$ ^[4]
Random Walk $P=D^{-1}A$ ^[4]

Graph construction on discrete state space

For a finite state space the graph $G$ mentioned above can be simply constructed by examining the connections between states. Let $S_{i}$ and $S_{j}$ be any two states. Then

G_{i,j}={\begin{cases}1&{\text{if }}S_{i}\leftrightarrow S_{j}\\0&{\text{otherwise}}\end{cases}}

It is important to note that this can only be done when the state space is finite and of reasonable size.

Graph construction on continuous or large state space

For a continuous state space or simply a very large discrete state space, it is necessary to sample from the manifold in state space. Then constructing the Graph $G$ based on the samples. There are a few issues to consider here:^[4]

How to sample the manifold
- Random walk or guided exploration
How to determine if two sample should be connected

Application

Once the PVFs are generated, they can be plugged into a traditional function approximation framework. One such method is least-squares approximation.

Least-squares approximation using proto-value functions

Let $\Phi _{G}=\left\{V_{1}^{G},\dots ,V_{k}^{G}\right\}$ be the basis set of PVFs, where each $V_{i}^{G}$ is the eigenfunction defined over all states in the graph $G$ . Let ${\widehat {V}}^{\pi }$ be the target value function that is only known for a subset of states $S_{M}^{G}=\left\{s_{1},\dots ,s_{m}\right\}$ .

Define the gram matrix

K_{G}=\left(\Phi _{m}^{G}\right)^{T}\Phi _{m}^{G}.

Here $S_{m}^{G}$ is the component wise projection of the PVFs onto the states in $S_{G}^{m}$ . Hence, each entry of the gram matrix is

K_{G}(i,j)=\sum _{k}V_{i}^{G}(k)V_{j}^{G}(k).

The coefficients that minimize the least squares error are then described by the equation

\alpha =K_{G}^{-1}\left(\Phi _{M}^{G}\right)^{T}{\widehat {V}}^{\pi }.

A nonlinear least-squares approach is possible by using the k PVFs with the largest absolute coefficients to compute the approximation.^[1]

Related Research Articles

In mathematics, a self-adjoint operator on an infinite-dimensional complex vector space V with inner product $is a linear map A that is its own adjoint. If V is finite-dimensional with a given orthonormal basis, this is equivalent to the condition that the matrix of A is a Hermitian matrix, i.e., equal to its conjugate transpose A * . By the finite-dimensional spectral theorem, V has an orthonormal basis such that the matrix of A relative to this basis is a diagonal matrix with entries in the real numbers. In this article, we consider generalizations of this concept to operators on Hilbert spaces of arbitrary dimension.$

Eigenfunction Mathematical function of a linear operator

In mathematics, an eigenfunction of a linear operator D defined on some function space is any non-zero function f in that space that, when acted upon by D, is only multiplied by some scaling factor called an eigenvalue. As an equation, this condition can be written as

In mathematics, specifically functional analysis, Mercer's theorem is a representation of a symmetric positive-definite function on a square as a sum of a convergent sequence of product functions. This theorem, presented in, is one of the most notable results of the work of James Mercer (1883–1932). It is an important theoretical tool in the theory of integral equations; it is used in the Hilbert space theory of stochastic processes, for example the Karhunen–Loève theorem; and it is also used to characterize a symmetric positive semi-definite kernel.

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lies within lower-dimensional space. If the data of interest is of low enough dimension, the data can be visualised in the low-dimensional space.

In mathematics, spectral theory is an inclusive term for theories extending the eigenvector and eigenvalue theory of a single square matrix to a much broader theory of the structure of operators in a variety of mathematical spaces. It is a result of studies of linear algebra and the solutions of systems of linear equations and their generalizations. The theory is connected to that of analytic functions because the spectral properties of an operator are related to analytic functions of the spectral parameter.

In mathematics, spectral graph theory is the study of the properties of a graph in relationship to the characteristic polynomial, eigenvalues, and eigenvectors of matrices associated with the graph, such as its adjacency matrix or Laplacian matrix.

In the theory of stochastic processes, the Kosambi–Karhunen–Loève theorem is a representation of a stochastic process as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded interval. The transformation is also known as Hotelling transform and eigenvector transform, and is closely related to principal component analysis (PCA) technique widely used in image processing and in data analysis in many fields.

In mathematics, the discrete Laplace operator is an analog of the continuous Laplace operator, defined so that it has meaning on a graph or a discrete grid. For the case of a finite-dimensional graph, the discrete Laplace operator is more commonly called the Laplacian matrix.

In the mathematical field of graph theory, the Laplacian matrix, also called the graph Laplacian, admittance matrix, Kirchhoff matrix or discrete Laplacian, is a matrix representation of a graph. The Laplacian matrix can be used to find many useful properties of a graph. Together with Kirchhoff's theorem, it can be used to calculate the number of spanning trees for a given graph. The sparsest cut of a graph can be approximated through the second smallest eigenvalue of its Laplacian by Cheeger's inequality. It can also be used to construct low dimensional embeddings, which can be useful for a variety of machine learning applications.

In differential geometry, the Laplace–Beltrami operator is a generalization of the Laplace operator to functions defined on submanifolds in Euclidean space and, even more generally, on Riemannian and pseudo-Riemannian manifolds. It is named after Pierre-Simon Laplace and Eugenio Beltrami.

In linear algebra, an eigenvector or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted by $, is the factor by which the eigenvector is scaled.$

In the mathematical study of heat conduction and diffusion, a heat kernel is the fundamental solution to the heat equation on a specified domain with appropriate boundary conditions. It is also one of the main tools in the study of the spectrum of the Laplace operator, and is thus of some auxiliary importance throughout mathematical physics. The heat kernel represents the evolution of temperature in a region whose boundary is held fixed at a particular temperature, such that an initial unit of heat energy is placed at a point at time t = 0.

In mathematics, the spectral theory of ordinary differential equations is the part of spectral theory concerned with the determination of the spectrum and eigenfunction expansion associated with a linear ordinary differential equation. In his dissertation Hermann Weyl generalized the classical Sturm–Liouville theory on a finite closed interval to second order differential operators with singularities at the endpoints of the interval, possibly semi-infinite or infinite. Unlike the classical case, the spectrum may no longer consist of just a countable set of eigenvalues, but may also contain a continuous part. In this case the eigenfunction expansion involves an integral over the continuous part with respect to a spectral measure, given by the Titchmarsh–Kodaira formula. The theory was put in its final simplified form for singular differential equations of even degree by Kodaira and others, using von Neumann's spectral theorem. It has had important applications in quantum mechanics, operator theory and harmonic analysis on semisimple Lie groups.

In mathematics, the Plancherel theorem for spherical functions is an important result in the representation theory of semisimple Lie groups, due in its final form to Harish-Chandra. It is a natural generalisation in non-commutative harmonic analysis of the Plancherel formula and Fourier inversion formula in the representation theory of the group of real numbers in classical harmonic analysis and has a similarly close interconnection with the theory of differential equations. It is the special case for zonal spherical functions of the general Plancherel theorem for semisimple Lie groups, also proved by Harish-Chandra. The Plancherel theorem gives the eigenfunction expansion of radial functions for the Laplacian operator on the associated symmetric space X; it also gives the direct integral decomposition into irreducible representations of the regular representation on L²(X). In the case of hyperbolic space, these expansions were known from prior results of Mehler, Weyl and Fock.

Manifold alignment is a class of machine learning algorithms that produce projections between sets of data, given that the original data sets lie on a common manifold. The concept was first introduced as such by Ham, Lee, and Saul in 2003, adding a manifold constraint to the general problem of correlating sets of high-dimensional vectors.

A heat kernel signature (HKS) is a feature descriptor for use in deformable shape analysis and belongs to the group of spectral shape analysis methods. For each point in the shape, HKS defines its feature vector representing the point's local and global geometric properties. Applications include segmentation, classification, structure discovery, shape matching and shape retrieval.

Diffusion maps is a dimensionality reduction or feature extraction algorithm introduced by Coifman and Lafon which computes a family of embeddings of a data set into Euclidean space whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data. The Euclidean distance between points in the embedded space is equal to the "diffusion distance" between probability distributions centered at those points. Different from linear dimensionality reduction methods such as principal component analysis (PCA), diffusion maps is part of the family of nonlinear dimensionality reduction methods which focus on discovering the underlying manifold that the data has been sampled from. By integrating local similarities at different scales, diffusion maps give a global description of the data-set. Compared with other methods, the diffusion map algorithm is robust to noise perturbation and computationally inexpensive.

Automatic basis function construction is the mathematical method of looking for a set of task-independent basis functions that map the state space to a lower-dimensional embedding, while still representing the value function accurately. Automatic basis construction is independent of prior knowledge of the domain, which allows it to perform well where expert-constructed basis functions are difficult or impossible to create.

Diffusion wavelets are a fast multiscale framework for the analysis of functions on discrete structures like graphs, manifolds, and point clouds in Euclidean space. Diffusion wavelets are an extension of classical wavelet theory from harmonic analysis. Unlike classical wavelets whose basis functions are predetermined, diffusion wavelets are adapted to the geometry of a given diffusion operator $. Moreover, the diffusion wavelet basis functions are constructed by dilation using the dyadic powers of . These dyadic powers of diffusion over the space and propagate local relationships in the function throughout the space until they become global. And if the rank of higher powers of decrease, then these higher powers become compressible. From these decaying dyadic powers of comes a chain of decreasing subspaces. These subspaces are the scaling function approximation subspaces, and the differences in the subspace chain are the wavelet subspaces.$

Spectral shape analysis relies on the spectrum of the Laplace–Beltrami operator to compare and analyze geometric shapes. Since the spectrum of the Laplace–Beltrami operator is invariant under isometries, it is well suited for the analysis or retrieval of non-rigid shapes, i.e. bendable objects such as humans, animals, plants, etc.

References

1 2 3 4 5 Mahadevan, S. Proto-Value Functions: Developmental Reinforcement Learning. Proceedings of the International Conference on Machine Learning ICML 2005
↑ Johns, J. and Mahadevan, S., Constructing Basis Functions from Directed Graphs for Value Function Approximation, International Conference on Machine Learning (ICML), 2007
1 2 Mahadevan, S. and Maggiono, M., Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes, University of Massachusetts, Department of Computer Science Technical Report TR-2006-35, 2006
1 2 3 Mahadevan, S. and Maggiono, M., ICML 2006 tutorial.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[key1-1] 1 2 3 4 5 Mahadevan, S. Proto-Value Functions: Developmental Reinforcement Learning. Proceedings of the International Conference on Machine Learning ICML 2005

[key2-2] Johns, J. and Mahadevan, S., Constructing Basis Functions from Directed Graphs for Value Function Approximation, International Conference on Machine Learning (ICML), 2007

[key3-3] 1 2 Mahadevan, S. and Maggiono, M., Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes, University of Massachusetts, Department of Computer Science Technical Report TR-2006-35, 2006

[key4-4] 1 2 3 Mahadevan, S. and Maggiono, M., ICML 2006 tutorial.

[1]

[2]

[3]

[4]