Skill chaining

Last updated January 14, 2021

Skill chaining is a skill discovery method in continuous reinforcement learning. It has been extended to high-dimensional continuous domains by the related Deep skill chaining algorithm.

Related Research Articles

Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

Operant conditioning is a type of associative learning process through which the strength of a behavior is modified by reinforcement or punishment. It is also a procedure that is used to bring about such learning.

Reinforcement A consequence applied that will strengthen an organisms future behavior

In behavioral psychology, reinforcement is a consequence applied that will strengthen an organism's future behavior whenever that behavior is preceded by a specific antecedent stimulus. This strengthening effect may be measured as a higher frequency of behavior, longer duration, greater magnitude, or shorter latency. There are two types of reinforcement, known as positive reinforcement and negative reinforcement; positive is where by a reward is offered on expression of the wanted behaviour and negative is taking away an undesirable element in the persons environment whenever the desired behaviour is achieved. Rewarding stimuli, which are associated with "wanting" and "liking" and appetitive behavior, function as positive reinforcers; the converse statement is also true: positive reinforcers provide a desirable stimulus. Reinforcement does not require an individual to consciously perceive an effect elicited by the stimulus. Thus, reinforcement occurs only if there is an observable strengthening in behavior. However, there is also negative reinforcement, which is characterized by taking away an undesirable stimulus. Changing someone's job might serve as a negative reinforcer to someone who suffers from back problems, i.e. Changing from a labourers job to an office position for instance.

Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.

Intelligent control is a class of control techniques that use various artificial intelligence computing approaches like neural networks, Bayesian probability, fuzzy logic, machine learning, reinforcement learning, evolutionary computation and genetic algorithms.

Learning classifier systems, or LCS, are a paradigm of rule-based machine learning methods that combine a discovery component with a learning component. Learning classifier systems seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions. This approach allows complex solution spaces to be broken up into smaller, simpler parts.

In mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming. MDPs were known at least as early as the 1950s; a core body of research on Markov decision processes resulted from Ronald Howard's 1960 book, Dynamic Programming and Markov Processes. They are used in many disciplines, including robotics, automatic control, economics and manufacturing. The name of MDPs comes from the Russian mathematician Andrey Markov as they are an extension of Markov chains.

Q-learning is a model-free reinforcement learning algorithm to learn quality of actions telling an agent what action to take under what circumstances. It does not require a model of the environment, and it can handle problems with stochastic transitions and rewards, without requiring adaptations.

Transfer learning (TL) is a research problem in machine learning (ML) that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks. This area of research bears some relation to the long history of psychological literature on transfer of learning, although formal ties between the two fields are limited. From the practical standpoint, reusing or transferring information from previously learned tasks for the learning of new tasks has the potential to significantly improve the sample efficiency of a reinforcement learning agent.

Professional audio, abbreviated as pro audio, refers to both an activity and a category of high quality, studio-grade audio equipment. Typically it encompasses sound recording, sound reinforcement system setup and audio mixing, and studio music production by trained sound engineers, audio engineers, record producers, and audio technicians who work in live event support and recording using mixing consoles, recording equipment and sound reinforcement systems. Professional audio is differentiated from consumer- or home-oriented audio, which are typically geared toward listening in a non-commercial environment.

Practice is the act of rehearsing a behaviour over and over, or engaging in an activity again and again, for the purpose of improving or mastering it, as in the phrase 'practice makes perfect'. It is important to note that practise is a verb and should not be confused with the noun practice. Sports teams practise to prepare for actual games. Playing a musical instrument well takes much practice. It is a method of learning and of acquiring experience. The word derives from the Greek "πρακτική" (praktike), feminine of "πρακτικός" (praktikos), "fit for or concerned with action, practical", and that from the verb "πράσσω" (prasso), "to achieve, bring about, effect, accomplish". In English, practice is the noun and practise is the verb, but in American-English it is now common for practice to be used both as a noun and a verb.

In artificial intelligence, apprenticeship learning is the process of learning by observing an expert. It can be viewed as a form of supervised learning, where the training dataset consists of task executions by a demonstration teacher.

Training and development involves improving the effectiveness of organizations and the individuals and teams within them. Training may be viewed as related to immediate changes in organizational effectiveness via organized instruction, while development is related to the progress of longer-term organizational and employee goals. While training and development technically have differing definitions, the two are oftentimes used interchangeably and/or together. Training and development has historically been a topic within applied psychology but has within the last two decades become closely associated with human resources management, talent management, human resources development, instructional design, human factors, and knowledge management.

Constructing skill trees (CST) is a hierarchical reinforcement learning algorithm which can build skill trees from a set of sample solution trajectories obtained from demonstration. CST uses an incremental MAP change point detection algorithm to segment each demonstration trajectory into skills and integrate the results into a skill tree. CST was introduced by George Konidaris, Scott Kuindersma, Andrew Barto and Roderic Grupen in 2010.

Mountain Car, a standard testing domain in Reinforcement Learning, is a problem in which an under-powered car must drive up a steep hill. Since gravity is stronger than the car's engine, even at full throttle, the car cannot simply accelerate up the steep slope. The car is situated in a valley and must learn to leverage potential energy by driving up the opposite hill before the car is able to make it to the goal at the top of the rightmost hill. The domain has been used as a test bed in various Reinforcement Learning papers.

DeepMind Technologies is a British artificial intelligence subsidiary of Alphabet Inc. and research laboratory founded in September 2010. DeepMind was acquired by Google in 2014. The company is based in London, with research centres in Canada, France, and the United States. In 2015, it became a wholly owned subsidiary of Alphabet Inc, Google's parent company.

David Silver leads the reinforcement learning research group at DeepMind and was lead researcher on AlphaGo, AlphaZero and co-lead on AlphaStar.

Deep reinforcement learning is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of state spaces. Deep RL algorithms are able to take in very large inputs and decide what actions to perform to optimize an objective. Deep reinforcement learning has been used for a diverse set of applications including but not limited to robotics, video games, natural language processing, computer vision, education, transportation, finance and healthcare.

Amazon SageMaker is a cloud machine-learning platform that was launched in November 2017. SageMaker enables developers to create, train, and deploy machine-learning (ML) models in the cloud. SageMaker also enables developers to deploy ML models on embedded systems and edge-devices.

MuZero is a computer program developed by artificial intelligence research company DeepMind to master games without knowing their rules. Its release in 2019 included benchmarks of its performance in go, chess, shogi, and a standard suite of Atari games. The algorithm uses an approach similar to AlphaZero. It matched AlphaZero's performance in chess and shogi, improved on its performance in Go, and improved on the state of the art in mastering a suite of 57 Atari games, a visually-complex domain.

References

Konidaris, George; Andrew Barto (2009). "Skill discovery in continuous reinforcement learning domains using skill chaining". Advances in Neural Information Processing Systems 22.
Bagaria, Akhil; George Konidaris (2020). "Option discovery using deep skill chaining". International Conference on Learning Representations.

This computer science article is a stub. You can help Wikipedia by expanding it.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.