Empowerment in the field of artificial intelligence formalises and quantifies (via information theory) the potential an agent perceives that it has to influence its environment. [1] [2] An agent which follows an empowerment-maximising policy acts to maximise its future options (typically up to some limited horizon). Empowerment can be used as a (pseudo) utility function that depends only on information gathered from the local environment to guide action, rather than seeking an externally imposed goal, and is thus a form of intrinsic motivation. [3]
The empowerment formalism depends on a probabilistic model commonly used in artificial intelligence. An autonomous agent operates in the world by taking in sensory information and acting to change its state, or that of the environment, in a cycle of perceiving and acting known as the perception-action loop. Agent state and actions are modelled by random variables (S and A) indexed by time (t). The choice of action depends on the current state, and the future state depends on the choice of action; thus the perception-action loop, unrolled in time, forms a causal Bayesian network.
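Written out for a single cycle of the loop, this causal structure corresponds to the usual factorisation of the joint distribution (a standard formulation in the notation above, not a quotation from the cited sources):

```latex
% One cycle of the perception-action loop: the action A_t depends on the
% current state S_t, and the next state S_{t+1} depends on S_t and A_t.
\[
  p(s_{t+1}, a_t \mid s_t)
  \;=\;
  \underbrace{p(a_t \mid s_t)}_{\text{action choice}}
  \,
  \underbrace{p(s_{t+1} \mid s_t, a_t)}_{\text{environment dynamics}}
\]
```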
Empowerment (𝔈) is defined as the channel capacity (C) of the actuation channel of the agent, and is formalised as the maximal possible information flow between the actions of the agent and the effect of those actions some time later. Empowerment can be thought of as the future potential of the agent to affect its environment, as measured by its sensors. [3]
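In the notation above, the one-step case can be written as the maximum, over distributions of the action A_t, of the mutual information between A_t and the sensor state S_{t+1} one step later (this follows the standard information-theoretic definition used in the cited literature):

```latex
% One-step empowerment: the channel capacity of the actuation channel,
% i.e. the maximal mutual information between the action and the
% resulting sensor state, taken over all action distributions.
\[
  \mathfrak{E}
  \;=\; C\!\left(A_t \to S_{t+1}\right)
  \;=\; \max_{p(a_t)} I\!\left(A_t ; S_{t+1}\right)
\]
```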
In a discrete time model, empowerment can be computed for a given number of cycles into the future, which is referred to in the literature as 'n-step' empowerment. [4]
The unit of empowerment depends on the logarithm base. Base 2 is commonly used, in which case the unit is bits.
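As an illustration, the n-step empowerment of a single state in a small discrete environment can be estimated by treating each length-n action sequence as one input symbol of the actuation channel and computing the channel capacity (in bits) with the Blahut-Arimoto algorithm. The sketch below is a minimal example under assumed toy dynamics; the grid world, the function names and the transition matrices are illustrative and not taken from the cited sources:

```python
import itertools
import numpy as np

def blahut_arimoto(channel, iters=200):
    """Channel capacity (in bits) of p(y|x); rows are inputs, columns are outputs."""
    n_x, _ = channel.shape
    p_x = np.full(n_x, 1.0 / n_x)             # start from a uniform input distribution
    for _ in range(iters):
        q_y = p_x @ channel                    # current output marginal
        with np.errstate(divide="ignore", invalid="ignore"):
            # KL divergence of each input's row from the output marginal (0 log 0 := 0)
            d = np.where(channel > 0.0, channel * np.log(channel / q_y), 0.0).sum(axis=1)
        p_x *= np.exp(d)
        p_x /= p_x.sum()
    q_y = p_x @ channel
    with np.errstate(divide="ignore", invalid="ignore"):
        mi = np.where(channel > 0.0, channel * np.log2(channel / q_y), 0.0).sum(axis=1)
    return float(p_x @ mi)                     # mutual information at the optimum = capacity

def n_step_empowerment(transitions, start_state, n):
    """n-step empowerment (bits) from start_state; transitions[a] is p(s' | s, a)."""
    n_states = transitions[0].shape[0]
    rows = []
    for seq in itertools.product(range(len(transitions)), repeat=n):
        dist = np.zeros(n_states)
        dist[start_state] = 1.0
        for a in seq:                          # push the state distribution through the sequence
            dist = dist @ transitions[a]
        rows.append(dist)                      # one channel row per action sequence
    return blahut_arimoto(np.array(rows))

# Hypothetical 1-D world with 5 cells and deterministic actions left/stay/right.
def shift(delta, n_states=5):
    m = np.zeros((n_states, n_states))
    for s in range(n_states):
        m[s, min(max(s + delta, 0), n_states - 1)] = 1.0
    return m

transitions = [shift(-1), shift(0), shift(+1)]
print(n_step_empowerment(transitions, start_state=2, n=2))  # log2(5) ≈ 2.32 bits
```

From the centre cell all five cells are reachable within two steps, so the 2-step empowerment there is log2(5) ≈ 2.32 bits.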
In general the choice of action (action distribution) that maximises empowerment varies from state to state. Knowing the empowerment of an agent in a specific state is useful, for example to construct an empowerment-maximising policy. State-specific empowerment can be found using the more general formalism for 'contextual empowerment', [4] in which a random variable C describes the context (e.g. the state).
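One way to write this, using the symbols introduced above (the exact notation for contextual empowerment varies between sources, so this should be read as a sketch rather than the definition from [4]), is to condition the channel capacity on the context taking a particular value, where A_t^n denotes the sequence of n actions starting at time t:

```latex
% State-specific (contextual) empowerment: the capacity of the actuation
% channel conditioned on the context C taking the value c.
\[
  \mathfrak{E}(c)
  \;=\; \max_{p(a_t^n \mid c)} I\!\left(A_t^n ; S_{t+n} \mid C = c\right)
\]
```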
Empowerment maximisation can be used as a pseudo-utility function to enable agents to exhibit intelligent behaviour without requiring the definition of external goals, for example balancing a pole in a cart-pole scenario where no indication of the task is provided to the agent. [4] Empowerment has been applied in studies of collective behaviour [5] and in continuous domains. [6] [7] As is the case with Bayesian methods in general, computing empowerment becomes expensive as the number of actions and the length of the time horizon grow, but approaches to improving efficiency have led to its use in real-time control. [8] Empowerment has been used for intrinsically motivated reinforcement learning agents playing video games, [9] and in the control of underwater vehicles. [10]
In machine learning, a neural network is a model inspired by the structure and function of biological neural networks in animal brains.
Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment, and it can handle problems with stochastic transitions and rewards without requiring adaptations.
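The core of the method is the one-step temporal-difference update of a table of action values; the sketch below shows that update in isolation (the table shape, learning rate and discount factor are illustrative choices, not part of the algorithm's definition):

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q(s, a) towards r + gamma * max_a' Q(s', a')."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q

# Hypothetical usage with a 4-state, 2-action value table.
Q = np.zeros((4, 2))
Q = q_learning_update(Q, state=0, action=1, reward=1.0, next_state=2)
```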
In probability theory and machine learning, the multi-armed bandit problem is a problem in which a decision maker iteratively selects one of multiple fixed choices when the properties of each choice are only partially known at the time of allocation, and may become better understood as time passes. A fundamental aspect of bandit problems is that choosing an arm does not affect the properties of the arm or other arms.
A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process (MDP). A POMDP models an agent decision process in which it is assumed that the system dynamics are determined by an MDP, but the agent cannot directly observe the underlying state. Instead, it must maintain a sensor model (the probability distribution of different observations given the underlying state) together with a model of the underlying MDP. Unlike the policy function in an MDP, which maps the underlying states to the actions, a POMDP's policy is a mapping from the history of observations to the actions.
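A common way to make such a policy tractable is to maintain a belief state, a probability distribution over the hidden states that is updated after every action and observation. A minimal sketch of that Bayesian filter is shown below; the arguments T and O are hypothetical placeholders for the transition and observation models:

```python
import numpy as np

def update_belief(belief, action, observation, T, O):
    """Discrete POMDP belief update.

    belief: p(s), shape (|S|,)
    T[a]:   transition probabilities p(s' | s, a), shape (|S|, |S|)
    O[a]:   observation probabilities p(o | s', a), shape (|S|, |O|)
    """
    predicted = belief @ T[action]                    # predict: push belief through the dynamics
    updated = predicted * O[action][:, observation]   # correct: weight by observation likelihood
    return updated / updated.sum()                    # renormalise to a probability distribution
```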
Distributed constraint optimization is the distributed analogue to constraint optimization. A DCOP is a problem in which a group of agents must distributedly choose values for a set of variables such that the cost of a set of constraints over the variables is minimized.
In conformal field theory and representation theory, a W-algebra is an associative algebra that generalizes the Virasoro algebra. W-algebras were introduced by Alexander Zamolodchikov, and the name "W-algebra" comes from the fact that Zamolodchikov used the letter W for one of the elements of one of his examples.
A Boolean network consists of a discrete set of Boolean variables, each of which has a Boolean function assigned to it which takes inputs from a subset of those variables and whose output determines the state of the variable it is assigned to. This set of functions in effect determines a topology (connectivity) on the set of variables, which then become nodes in a network. Usually, the dynamics of the system is taken as a discrete time series where the state of the entire network at time t+1 is determined by evaluating each variable's function on the state of the network at time t. This may be done synchronously or asynchronously.
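The synchronous scheme described above can be sketched in a few lines: every node's function is evaluated on the state of the network at time t, and all nodes take their new values simultaneously at time t+1. The three-node network below is purely illustrative:

```python
def step(state, functions):
    """Synchronous update: each function sees the whole state at time t;
    all nodes switch to their new values together at time t+1."""
    return tuple(f(state) for f in functions)

# Hypothetical 3-node network: node 0 copies node 2, node 1 is the AND of
# nodes 0 and 2, and node 2 is the negation of node 1.
functions = [
    lambda s: s[2],
    lambda s: s[0] and s[2],
    lambda s: not s[1],
]

state = (False, True, False)
for t in range(4):
    print(t, state)
    state = step(state, functions)
```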
Long short-term memory (LSTM) is a type of recurrent neural network (RNN) aimed at mitigating the vanishing gradient problem commonly encountered by traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models, and other sequence learning methods. It aims to provide a short-term memory for RNN that can last thousands of timesteps. The name is made in analogy with long-term memory and short-term memory and their relationship, studied by cognitive psychologists since the early 20th century.
The one-way quantum computer, also known as measurement-based quantum computer (MBQC), is a method of quantum computing that first prepares an entangled resource state, usually a cluster state or graph state, then performs single qubit measurements on it. It is "one-way" because the resource state is destroyed by the measurements.
A memristor is a non-linear two-terminal electrical component relating electric charge and magnetic flux linkage. It was described and named in 1971 by Leon Chua, completing a theoretical quartet of fundamental electrical components which also comprises the resistor, capacitor and inductor.
Adiabatic quantum computation (AQC) is a form of quantum computing which relies on the adiabatic theorem to perform calculations and is closely related to quantum annealing.
Activity recognition aims to recognize the actions and goals of one or more agents from a series of observations on the agents' actions and the environmental conditions. Since the 1980s, this research field has captured the attention of several computer science communities due to its strength in providing personalized support for many different applications and its connection to many different fields of study such as medicine, human-computer interaction, or sociology.
In algorithmic game theory, a succinct game or a succinctly representable game is a game which may be represented in a size much smaller than its normal form representation. Without placing constraints on player utilities, describing a game of n players, each facing s strategies, requires listing n·s^n utility values (one value per player for each of the s^n strategy profiles). Even trivial algorithms are capable of finding a Nash equilibrium in a time polynomial in the length of such a large input. A succinct game is of polynomial type if in a game represented by a string of length n the number of players, as well as the number of strategies of each player, is bounded by a polynomial in n.
Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
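For bandits with Bernoulli rewards, this amounts to keeping a Beta posterior for each arm, drawing one sample from every posterior, and playing the arm whose sample is largest. The sketch below illustrates that scheme; the arm success probabilities and the uniform Beta(1, 1) priors are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
true_probs = [0.2, 0.5, 0.7]           # unknown to the agent; chosen for illustration
successes = np.ones(len(true_probs))   # Beta(1, 1) prior for each arm
failures = np.ones(len(true_probs))

for t in range(1000):
    # Draw one sample per arm from its posterior and act greedily on the samples.
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))
    reward = rng.random() < true_probs[arm]
    # Update the chosen arm's Beta posterior with the observed outcome.
    successes[arm] += reward
    failures[arm] += 1 - reward

print("posterior means:", successes / (successes + failures))
```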
Mean-field game theory is the study of strategic decision making by small interacting agents in very large populations. It lies at the intersection of game theory with stochastic analysis and control theory. The use of the term "mean field" is inspired by mean-field theory in physics, which considers the behavior of systems of large numbers of particles where individual particles have negligible impacts upon the system. In other words, each agent acts according to its own minimization or maximization problem, taking into account other agents' decisions; because the population is large, the number of agents can be assumed to go to infinity and a representative agent exists.
The free energy principle is a theoretical framework suggesting that the brain reduces surprise or uncertainty by making predictions based on internal models and updating them using sensory input. It highlights the brain's objective of aligning its internal model and the external world to enhance prediction accuracy. This principle integrates Bayesian inference with active inference, where actions are guided by predictions and sensory feedback refines them. It has wide-ranging implications for comprehending brain function, perception, and action.
The theory of causal fermion systems is an approach to describe fundamental physics. It provides a unification of the weak, the strong and the electromagnetic forces with gravity at the level of classical field theory. Moreover, it gives quantum mechanics as a limiting case and has revealed close connections to quantum field theory. Therefore, it is a candidate for a unified physical theory. Instead of introducing physical objects on a preexisting spacetime manifold, the general concept is to derive spacetime as well as all the objects therein as secondary objects from the structures of an underlying causal fermion system. This concept also makes it possible to generalize notions of differential geometry to the non-smooth setting. In particular, one can describe situations when spacetime no longer has a manifold structure on the microscopic scale. As a result, the theory of causal fermion systems is a proposal for quantum geometry and an approach to quantum gravity.
Daniel Polani is a professor of Artificial Intelligence, Director of the Centre for Computer Science and Informatics Research (CCSIR), Head of the Adaptive Systems Research Group, and leader of the SEPIA Lab at the University of Hertfordshire.
Intrinsic motivation in the study of artificial intelligence and robotics is a mechanism for enabling artificial agents to exhibit inherently rewarding behaviours such as exploration and curiosity, grouped under the same term in the study of psychology. Psychologists consider intrinsic motivation in humans to be the drive to perform an activity for inherent satisfaction – just for the fun or challenge of it.