Empowerment (artificial intelligence)

Empowerment in the field of artificial intelligence formalises and quantifies (via information theory) the potential an agent perceives that it has to influence its environment. [1] [2] An agent which follows an empowerment-maximising policy acts to maximise its future options (typically up to some limited horizon). Empowerment can be used as a (pseudo) utility function that depends only on information gathered from the local environment to guide action, rather than on an externally imposed goal, and is thus a form of intrinsic motivation. [3]

The empowerment formalism relies on a probabilistic model commonly used in artificial intelligence. An autonomous agent operates in the world by taking in sensory information and acting to change its state, or that of the environment, in a cycle of perceiving and acting known as the perception-action loop. Agent state and actions are modelled by random variables indexed by time. The choice of action depends on the current state, and the future state depends on the choice of action; thus the perception-action loop, unrolled in time, forms a causal Bayesian network.
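As an illustration (the symbols below, with the state and action at time t written as s_t and a_t, are an assumption of this sketch rather than notation fixed by the references), one step of the unrolled loop factorises as:

```latex
% One step of the perception-action loop, viewed as a causal Bayesian network:
% the action is chosen given the current state, and the next state depends on
% both the current state and the chosen action.
p(s_{t+1}, a_t \mid s_t) \;=\;
  \underbrace{p(a_t \mid s_t)}_{\text{action selection}} \,
  \underbrace{p(s_{t+1} \mid s_t, a_t)}_{\text{environment dynamics}}
```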

Definition

Empowerment is defined as the channel capacity of the actuation channel of the agent, and is formalised as the maximal possible information flow between the actions of the agent and the effect of those actions some time later. Empowerment can be thought of as the future potential of the agent to affect its environment, as measured by its sensors. [3]

In a discrete-time model, empowerment can be computed for a given number of cycles into the future, which is referred to in the literature as 'n-step' empowerment. [4]
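A minimal sketch of the n-step definition, in the spirit of the cited formalism (the notation here, with a sequence of n actions taken from time t and the sensor state observed n steps later, is assumed for illustration):

```latex
% n-step empowerment: the channel capacity from a sequence of n actions to the
% sensor state observed n steps later, maximised over distributions of action
% sequences.  With base-2 logarithms the result is measured in bits.
\mathfrak{E} \;=\; C\!\left(A_t^{n} \rightarrow S_{t+n}\right)
             \;=\; \max_{p(a_t^{n})} I\!\left(A_t^{n};\, S_{t+n}\right)
```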

The unit of empowerment depends on the base of the logarithm. Base 2 is commonly used, in which case the unit is bits.

Contextual Empowerment

In general, the choice of action (action distribution) that maximises empowerment varies from state to state. Knowing the empowerment of an agent in a specific state is useful, for example to construct an empowerment-maximising policy. State-specific empowerment can be found using the more general formalism of 'contextual empowerment', [4] which introduces a random variable describing the context (e.g. the state).
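A minimal sketch of this idea, assuming the context value is denoted c (for example, a particular state): the channel-capacity maximisation is carried out conditional on the context, so the optimal action distribution may differ from context to context.

```latex
% Contextual (e.g. state-specific) empowerment: capacity of the actuation
% channel conditioned on a particular context value c, with the action
% distribution allowed to depend on c.
\mathfrak{E}(c) \;=\; \max_{p(a_t^{n} \mid c)}
    I\!\left(A_t^{n};\, S_{t+n} \,\middle|\, c\right)
```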

Application

Empowerment maximisation can be used as a pseudo-utility function to enable agents to exhibit intelligent behaviour without requiring the definition of external goals, for example balancing a pole in a cart-pole scenario where no indication of the task is provided to the agent. [4] Empowerment has been applied in studies of collective behaviour [5] and in continuous domains. [6] [7] As with Bayesian methods in general, computing empowerment becomes expensive as the number of actions and the length of the time horizon grow, but approaches to improve efficiency have led to its use in real-time control. [8] Empowerment has been used for intrinsically motivated reinforcement learning agents playing video games, [9] and in the control of underwater vehicles. [10]
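For a small discrete model, one-step empowerment reduces to the capacity of a finite channel, which can be computed with the standard Blahut-Arimoto algorithm. The sketch below is illustrative only; the function name and the toy transition matrix are assumptions of this example, not taken from the cited works.

```python
import numpy as np

def empowerment_1step(p_next, iters=500, tol=1e-12):
    """Sketch: 1-step empowerment (in bits) of a discrete actuation channel.

    p_next[a, s] holds p(s' = s | a), the probability of reaching successor
    state s after taking action a from the current state.  Channel capacity
    is found with the standard Blahut-Arimoto algorithm.
    """
    n_actions, _ = p_next.shape
    q = np.full(n_actions, 1.0 / n_actions)          # candidate action distribution p(a)

    def kl_per_action(q):
        # D( p(s'|a) || p(s') ) in nats, for every action a.
        p_s = q @ p_next                             # marginal p(s')
        ratio = np.divide(p_next, p_s, out=np.ones_like(p_next),
                          where=p_next > 0)
        return np.sum(p_next * np.log(ratio), axis=1)

    for _ in range(iters):
        d = np.exp(kl_per_action(q))                 # Blahut-Arimoto update
        q_new = q * d / np.sum(q * d)
        if np.max(np.abs(q_new - q)) < tol:
            q = q_new
            break
        q = q_new

    # Capacity = mutual information at the maximising action distribution,
    # converted from nats to bits.
    return float(q @ kl_per_action(q)) / np.log(2), q

# Toy example: two actions whose outcomes over three successor states are
# mostly distinguishable, so empowerment is close to 1 bit.
p = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.1, 0.9]])
capacity, action_dist = empowerment_1step(p)
print(f"empowerment ~ {capacity:.3f} bits, maximising p(a) = {action_dist}")
```

For n-step empowerment the rows of the transition matrix would index action sequences rather than single actions, which is one source of the exponential cost noted above.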

Related Research Articles

<span class="mw-page-title-main">Artificial neural network</span> Computational model used in machine learning, based on connected, hierarchical functions

Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains.

<span class="mw-page-title-main">Reinforcement learning</span> Field of machine learning

Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

<span class="mw-page-title-main">Q-learning</span> Model-free reinforcement learning algorithm

Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment, and it can handle problems with stochastic transitions and rewards without requiring adaptations.

<span class="mw-page-title-main">Recurrent neural network</span> Computational model used in machine learning

A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes can create a cycle, allowing output from some nodes to affect subsequent input to the same nodes. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. Recurrent neural networks are theoretically Turing complete and can run arbitrary programs to process arbitrary sequences of inputs.

In game theory, a Bayesian game is a game that models the outcome of player interactions using aspects of Bayesian probability. Bayesian games are notable because they allowed, for the first time in game theory, for the specification of the solutions to games with incomplete information.

<span class="mw-page-title-main">Multi-armed bandit</span> Machine Learning

In probability theory and machine learning, the multi-armed bandit problem is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from imagining a gambler at a row of slot machines, who has to decide which machines to play, how many times to play each machine and in which order to play them, and whether to continue with the current machine or try a different machine. The multi-armed bandit problem also falls into the broad category of stochastic scheduling.

A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process (MDP). A POMDP models an agent decision process in which it is assumed that the system dynamics are determined by an MDP, but the agent cannot directly observe the underlying state. Instead, it must maintain a sensor model and the underlying MDP. Unlike the policy function in MDP which maps the underlying states to the actions, POMDP's policy is a mapping from the history of observations to the actions.

<span class="mw-page-title-main">Quantum neural network</span> Quantum Mechanics in Neural Networks

Quantum neural networks are computational neural network models which are based on the principles of quantum mechanics. The first ideas on quantum neural computation were published independently in 1995 by Subhash Kak and Ron Chrisley, engaging with the theory of quantum mind, which posits that quantum effects play a role in cognitive function. However, typical research in quantum neural networks involves combining classical artificial neural network models with the advantages of quantum information in order to develop more efficient algorithms. One important motivation for these investigations is the difficulty to train classical neural networks, especially in big data applications. The hope is that features of quantum computing such as quantum parallelism or the effects of interference and entanglement can be used as resources. Since the technological implementation of a quantum computer is still in a premature stage, such quantum neural network models are mostly theoretical proposals that await their full implementation in physical experiments.

Distributed constraint optimization is the distributed analogue to constraint optimization. A DCOP is a problem in which a group of agents must distributedly choose values for a set of variables such that the cost of a set of constraints over the variables is minimized.

In conformal field theory and representation theory, a W-algebra is an associative algebra that generalizes the Virasoro algebra. W-algebras were introduced by Alexander Zamolodchikov, and the name "W-algebra" comes from the fact that Zamolodchikov used the letter W for one of the elements of one of his examples.

<span class="mw-page-title-main">Long short-term memory</span> Artificial recurrent neural network architecture used in deep learning

Long short-term memory (LSTM) is an artificial neural network used in the fields of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. Such a recurrent neural network (RNN) can process not only single data points, but also entire sequences of data. This characteristic makes LSTM networks ideal for processing and predicting data. For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition, machine translation, speech activity detection, robot control, video games, and healthcare.

AIXI['ai̯k͡siː] is a theoretical mathematical formalism for artificial general intelligence. It combines Solomonoff induction with sequential decision theory. AIXI was first proposed by Marcus Hutter in 2000 and several results regarding AIXI are proved in Hutter's 2005 book Universal Artificial Intelligence.

Circuit quantum electrodynamics provides a means of studying the fundamental interaction between light and matter. As in the field of cavity quantum electrodynamics, a single photon within a single mode cavity coherently couples to a quantum object (atom). In contrast to cavity QED, the photon is stored in a one-dimensional on-chip resonator and the quantum object is no natural atom but an artificial one. These artificial atoms usually are mesoscopic devices which exhibit an atom-like energy spectrum. The field of circuit QED is a prominent example for quantum information processing and a promising candidate for future quantum computation.

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.

The free energy principle is a mathematical principle in biophysics and cognitive science. It describes a formal account of the representational capacities of physical systems: that is, why things that exist look as if they track properties of the systems to which they are coupled. It establishes that the dynamics of physical systems minimise a quantity known as surprisal ; or equivalently, its variational upper bound, called free energy. The principle is formally related to variational Bayesian methods and was originally introduced by Karl Friston as an explanation for embodied perception-action loops in neuroscience, where it is also known as active inference.

<span class="mw-page-title-main">Causal fermion systems</span> Candidate unified theory of physics

The theory of causal fermion systems is an approach to describe fundamental physics. It provides a unification of the weak, the strong and the electromagnetic forces with gravity at the level of classical field theory. Moreover, it gives quantum mechanics as a limiting case and has revealed close connections to quantum field theory. Therefore, it is a candidate for a unified physical theory. Instead of introducing physical objects on a preexisting spacetime manifold, the general concept is to derive spacetime as well as all the objects therein as secondary objects from the structures of an underlying causal fermion system. This concept also makes it possible to generalize notions of differential geometry to the non-smooth setting. In particular, one can describe situations when spacetime no longer has a manifold structure on the microscopic scale. As a result, the theory of causal fermion systems is a proposal for quantum geometry and an approach to quantum gravity.

<span class="mw-page-title-main">Multi-agent pathfinding</span> Pathfinding problem

The problem of Multi-Agent Path Finding (MAPF) is an instance of multi-agent planning and consists in the computation of collision-free paths for a group of agents from their location to an assigned target. It is an optimization problem, since the aim is to find those paths that optimize a given objective function, usually defined as the number of time steps until all agents reach their goal cells. MAPF is the multi-agent generalization of the pathfinding problem, and it is closely related to the shortest path problem in the context of graph theory.

<span class="mw-page-title-main">Multi-agent reinforcement learning</span> Sub-field of reinforcement learning

Multi-agent reinforcement learning (MARL) is a sub-field of reinforcement learning. It focuses on studying the behavior of multiple learning agents that coexist in a shared environment. Each agent is motivated by its own rewards, and does actions to advance its own interests; in some environments these interests are opposed to the interests of other agents, resulting in complex group dynamics.

Daniel Polani is a professor of Artificial Intelligence and Director of the Centre for Computer Science and Informatics Research (CCSIR), and Head of the Adaptive Systems Research Group, and leader of the SEPIA Lab at the University of Hertfordshire.

Intrinsic motivation in the study of artificial intelligence and robotics is a mechanism for enabling artificial agents to exhibit inherently rewarding behaviours such as exploration and curiosity, grouped under the same term in the study of psychology. Psychologists consider intrinsic motivation in humans to be the drive to perform an activity for inherent satisfaction – just for the fun or challenge of it.

References

  1. Klyubin, A., Polani, D., and Nehaniv, C. (2005a). All else being equal be empowered. Advances in Artificial Life, pages 744–753.
  2. Klyubin, A., Polani, D., and Nehaniv, C. (2005b). Empowerment: A universal agent-centric measure of control. In Proceedings of the 2005 IEEE Congress on Evolutionary Computation, volume 1, pages 128–135. IEEE.
  3. Salge, C., Glackin, C., and Polani, D. (2014). Empowerment – An Introduction. In Prokopenko, M. (ed.), Guided Self-Organization: Inception. Emergence, Complexity and Computation, vol. 9, pages 67–114. Springer. arXiv:1310.1863. doi:10.1007/978-3-642-53734-9_4. ISBN 978-3-642-53733-2. S2CID 9662065.
  4. Klyubin, A., Polani, D., and Nehaniv, C. (2008). Keep your options open: an information-based driving principle for sensorimotor systems. PLOS ONE, 3(12):e4018. https://dx.doi.org/10.1371%2Fjournal.pone.0004018
  5. Capdepuy, P., Polani, D., & Nehaniv, C. L. (2007, April). Maximization of potential information flow as a universal utility for collective behaviour. In 2007 IEEE Symposium on Artificial Life (pp. 207-213). IEEE.
  6. Jung, T., Polani, D., & Stone, P. (2011). Empowerment for continuous agent—environment systems. Adaptive Behavior, 19(1), 16-39.
  7. Salge, C., Glackin, C., & Polani, D. (2013). Approximation of empowerment in the continuous domain. Advances in Complex Systems, 16(02n03), 1250079.
  8. Karl, M., Soelch, M., Becker-Ehmck, P., Benbouzid, D., van der Smagt, P., & Bayer, J. (2017). Unsupervised real-time control through variational empowerment. arXiv preprint arXiv:1710.05101.
  9. Mohamed, S., & Rezende, D. J. (2015). Variational information maximisation for intrinsically motivated reinforcement learning. arXiv preprint arXiv:1509.08731.
  10. Volpi, N. C., De Palma, D., Polani, D., & Indiveri, G. (2016). Computation of empowerment for an autonomous underwater vehicle. IFAC-PapersOnLine, 49(15), 81-87.