Constructing skill trees

Constructing skill trees (CST) is a hierarchical reinforcement learning algorithm that builds skill trees from a set of sample solution trajectories obtained from demonstration. CST uses an incremental MAP (maximum a posteriori) change-point detection algorithm to segment each demonstration trajectory into skills, and integrates the results into a skill tree. CST was introduced by George Konidaris, Scott Kuindersma, Andrew Barto and Roderic Grupen in 2010. [1]

Algorithm

CST consists of three main parts: change-point detection, alignment, and merging. The main focus of CST is online change-point detection. The change-point detection algorithm is used to segment data into skills, using the discounted sum of future rewards as the target regression variable. Each skill is assigned an appropriate abstraction. A particle filter is used to control the computational complexity of CST. A sketch of the regression target follows.
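A minimal Python sketch of the regression target, assuming only a reward sequence and a discount factor γ (the function name is illustrative, not from the paper):

import numpy as np

def discounted_returns(rewards, gamma):
    # R_i = r_i + gamma * r_{i+1} + gamma^2 * r_{i+2} + ...,
    # computed in one backward pass over the demonstration.
    R = np.zeros(len(rewards))
    running = 0.0
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        R[i] = running
    return R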

The change-point detection algorithm is implemented as follows. The data for times \(t \in \{1, \dots, T\}\) and a set of models \(Q\) with prior \(p(q \in Q)\) are given. The algorithm is assumed to be able to fit a segment from time \(j+1\) to \(t\) using model \(q\), with fit probability \(P(j,t,q)\). A linear regression model with Gaussian noise is used to compute \(P(j,t,q)\). The Gaussian noise prior has mean zero, and a variance that follows \(\mathrm{InverseGamma}\left(\frac{v}{2}, \frac{u}{2}\right)\). The prior for each weight follows \(\mathrm{Normal}(0, \sigma^2 \delta)\).

The fit probability \(P(j,t,q)\) is computed by the following equation:

\[ P(j,t,q) = \frac{\pi^{-n/2}}{\delta^{m/2}} \, \frac{\left|(A+D)^{-1}\right|^{1/2} \, u^{v/2}}{(y+u)^{(n+v)/2}} \, \frac{\Gamma\left(\frac{n+v}{2}\right)}{\Gamma\left(\frac{v}{2}\right)}, \]

where \(n = t - j\) and

\[ y = \left(\sum_{i=j+1}^{t} R_i^2\right) - b^{T}(A+D)^{-1}b, \qquad A = \sum_{i=j+1}^{t} \Phi(x_i)\Phi(x_i)^{T}, \qquad b = \sum_{i=j+1}^{t} R_i \Phi(x_i), \]

with \(R_i = \sum_{l=i}^{T} \gamma^{l-i} r_l\) the discounted sum of future rewards from time \(i\).
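A minimal Python sketch of this computation, evaluated in log space for numerical stability. The segment statistics \(A\), \(b\), the sum of squared returns, and the segment length \(n\) are passed in directly, and SciPy's gammaln supplies the log-Gamma terms. This illustrates the equation above; it is not the authors' code:

import numpy as np
from scipy.special import gammaln

def log_fit_prob(A, b, sum_R2, n, delta, v, u):
    # log P(j, t, q) for one segment under the linear-Gaussian model.
    m = len(b)
    D = np.eye(m) / delta                      # delta^{-1} on the diagonal
    AD = A + D
    y = sum_R2 - b @ np.linalg.solve(AD, b)    # residual term y
    _, logdet = np.linalg.slogdet(AD)          # log|A + D|
    return (-0.5 * n * np.log(np.pi)
            - 0.5 * m * np.log(delta)
            - 0.5 * logdet                     # log |(A+D)^{-1}|^{1/2}
            + 0.5 * v * np.log(u)
            - 0.5 * (n + v) * np.log(y + u)
            + gammaln(0.5 * (n + v)) - gammaln(0.5 * v))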

Then, CST computes the probability of a change-point at time \(j\) with model \(q\),

\[ P_t(j, q) = (1 - G(t - j - 1)) \, P(j,t,q) \, p(q) \, P_j^{MAP}, \]

and the MAP change-point probability

\[ P_j^{MAP} = \max_{i,q} \frac{P_j(i, q) \, g(j - i)}{1 - G(j - i - 1)}, \]

using an online Viterbi algorithm, where \(g\) and \(G\) are the probability mass function and cumulative distribution function of the skill-length prior defined below.

The descriptions of the parameters and variables are as follows:

\(\Phi(x_i)\): a vector of \(m\) basis functions evaluated at state \(x_i\)
\(\Gamma\): the Gamma function
\(m\): the number of basis functions of model \(q\)
\(D\): an \(m \times m\) matrix with \(\delta^{-1}\) on the diagonal and zeros elsewhere

The skill length \(l\) is assumed to follow a geometric distribution with parameter \(p = 1/k\), where \(k\) is the expected skill length; thus \(g(l) = (1-p)^{l-1} p\) and \(G(l) = 1 - (1-p)^{l}\) (see the sketch below).
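A small sketch of this length prior; g is the geometric probability mass function and G its cumulative distribution function, both of which appear as hazard terms in the change-point recursion above:

def g(l, k):
    # P(skill length = l) under Geometric(p) with p = 1/k.
    p = 1.0 / k
    return (1.0 - p) ** (l - 1) * p

def G(l, k):
    # P(skill length <= l) = 1 - (1 - p)^l.
    p = 1.0 / k
    return 1.0 - (1.0 - p) ** l

# Example: with an expected skill length of k = 100 steps,
# g(100, 100) is about 0.0037 and G(100, 100) about 0.63.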

Using the method above, CST can segment data into a skill chain. The time complexity of the change-point detection is \(O(NL)\) per time step and the storage required is \(O(Nc)\), where \(N\) is the number of particles, \(L\) is the time required to compute \(P(j,t,q)\), and \(c\) is the number of change points.
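The pseudocode below calls a particle_filter routine for this step. CST follows Fearnhead and Liu in resampling the particle set to stay within the bound; the Python sketch here substitutes a much cruder stand-in, keeping only the M highest-weight particles, purely to illustrate where the bound on computation comes from. The particle objects and their MAP attribute mirror the pseudocode, not any published implementation:

def particle_filter(particles, M):
    # Keep the M particles with the highest MAP weight and discard the
    # rest, bounding the per-time-step cost. (The actual algorithm
    # resamples rather than truncates, which better preserves the
    # posterior over change-point positions.)
    return sorted(particles, key=lambda p: p.MAP, reverse=True)[:M]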

The next step is alignment. CST needs to align the component skills because change-points do not occur in exactly the same places across demonstrations. Thus, when segmenting a second trajectory after the first has been segmented, there is a bias on the locations of the change-points in the second trajectory. This bias is modeled as a mixture of Gaussians.

The last step is merging. CST merges skill chains into a skill tree by assigning the same skill to pairs of trajectory segments. Since all trajectories share the same goal, merging two chains starts at their final segments. If two segments are statistically similar, they are merged, and the procedure is repeated until it fails to merge a pair of skill segments. The fit probabilities \(P(j,t,q)\) are used to determine whether a pair of trajectory segments is modeled better as one skill or as two different skills, as sketched below.
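A minimal sketch of the merging loop under the assumptions just stated: chains end at a shared goal, segment statistics are additive (so merge_stats can simply sum them), and log_fit scores a segment's fit as in the earlier sketch. Both helpers are illustrative, not part of the published algorithm's interface:

def merge_chains(chain_a, chain_b, merge_stats, log_fit):
    # Chains are lists of per-segment sufficient statistics, ordered from
    # the start of the trajectory to the shared goal.
    merged = []
    while chain_a and chain_b:
        seg_a, seg_b = chain_a[-1], chain_b[-1]
        both = merge_stats(seg_a, seg_b)
        # Merge while one skill models the pair better than two skills.
        if log_fit(both) >= log_fit(seg_a) + log_fit(seg_b):
            merged.append(both)
            chain_a.pop()
            chain_b.pop()
        else:
            break  # the first failed merge ends the procedure
    # Segments merged so far form the shared trunk; anything left in
    # chain_a or chain_b becomes a separate branch of the skill tree.
    return list(reversed(merged)), chain_a, chain_b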

Pseudocode

The following pseudocode describes the change-point detection algorithm:

particles := []
// Process each incoming data point
for t = 1:T do
    // Compute fit probabilities for all particles
    for p ∈ particles do
        p_tjq := (1 − G(t − p.pos − 1)) × p.fit_prob × model_prior(p.model) × p.prev_MAP
        p.MAP := p_tjq × g(t − p.pos) / (1 − G(t − p.pos − 1))
    end
    // Filter if necessary
    if the number of particles ≥ N then
        particles := particle_filter(particles, M)
    end
    // Determine the Viterbi path
    if t = 1 then
        max_path := []
        max_MAP := 1/|Q|
    else
        max_particle := argmax_p p.MAP
        max_path := max_particle.path ∪ max_particle
        max_MAP := max_particle.MAP
    end
    // Create new particles for a changepoint at time t
    for q ∈ Q do
        new_p := create_particle(model=q, pos=t, prev_MAP=max_MAP, path=max_path)
        particles := particles ∪ new_p
    end
    // Update all particles
    for p ∈ particles do
        update_particle(current_state, current_reward, p)
    end
end
// Return the most likely path to the final point
return max_path
function update_particle(current_state, current_reward, particle) is
    p := particle
    r_t := current_reward
    // Initialization
    if t = 0 then
        p.A := zero_matrix(p.m, p.m)
        p.b := zero_vector(p.m)
        p.z := zero_vector(p.m)
        p.sum_r := 0
        p.tr1 := 0
        p.tr2 := 0
    end if
    // Compute the basis function vector for the current state
    Φ_t := p.Φ(current_state)
    // Update sufficient statistics
    p.A := p.A + Φ_t Φ_tᵀ
    p.z := γ p.z + Φ_t
    p.b := p.b + r_t p.z
    p.tr1 := 1 + γ² p.tr1
    p.sum_r := p.sum_r + r_t² p.tr1 + 2 γ r_t p.tr2
    p.tr2 := γ p.tr2 + r_t p.tr1
    p.fit_prob := compute_fit_prob(p, v, u, δ, γ)

Assumptions

CST assumes that the demonstrated skills form a tree, that the domain reward function is known, and that the best model for a merged pair of skills is the model selected for representing each skill individually.

Advantages

CST is a much faster learning algorithm than skill chaining. CST can be applied to learning higher-dimensional policies. Even an unsuccessful episode can improve skills. Skills acquired using agent-centric features can be reused for other problems.

Uses

CST has been used to acquire skills from human demonstration in the PinBall domain. It has also been used to acquire skills from human demonstration on a mobile manipulator.

References

  1. Jeevanandam, Nivash (2021-09-13). "Underrated But Fascinating ML Concepts #5 – CST, PBWM, SARSA, & Sammon Mapping". Analytics India Magazine. Retrieved 2021-12-05.