Intrinsic motivation (artificial intelligence)

Intrinsic motivation in the study of artificial intelligence and robotics is a mechanism for enabling artificial agents (including robots) to exhibit inherently rewarding behaviours such as exploration and curiosity, which are grouped under the same term in the study of psychology. Psychologists consider intrinsic motivation in humans to be the drive to perform an activity for its inherent satisfaction – just for the fun or challenge of it. [1]

Definition

An intelligent agent is intrinsically motivated to act if the information content alone, or the experience resulting from the action, is the motivating factor.

Information content in this context is measured in the information-theoretic sense of quantifying uncertainty. A typical intrinsic motivation is to search for unusual, surprising situations (exploration), in contrast to a typical extrinsic motivation such as the search for food (homeostasis). [2] Extrinsic motivations are typically described in artificial intelligence as task-dependent or goal-directed.
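For illustration, the information content of an observation can be quantified as its surprisal, the negative log probability under the agent's current model of the world; rarer observations carry more information and are thus more "interesting" to an exploring agent. The following is a minimal sketch (the function name, smoothing scheme and toy data are illustrative, not drawn from the cited works):

```python
import math
from collections import Counter

def surprisal(counts: Counter, observation, alpha: float = 1.0) -> float:
    """Surprisal (-log2 p) of an observation under a smoothed
    empirical model; rarer observations carry more information."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # reserve one slot for unseen symbols
    p = (counts[observation] + alpha) / (total + alpha * vocab)
    return -math.log2(p)

history = Counter("aaaaab")  # 'a' observed 5 times, 'b' once
assert surprisal(history, "b") > surprisal(history, "a")
```

An agent rewarded by surprisal is driven towards the unusual and surprising; an agent rewarded extrinsically (e.g. for finding food) ignores information content altogether.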

Origins in psychology

The study of intrinsic motivation in psychology and neuroscience began in the 1950s, with some psychologists explaining exploration through drives to manipulate and explore; however, this homeostatic view was criticised by White. [3] An alternative explanation, offered by Berlyne in 1960, was the pursuit of an optimal balance between novelty and familiarity. [4] Festinger described the difference between internal and external views of the world as a dissonance that organisms are motivated to reduce. [5] A similar view was expressed in the 1970s by Kagan as the desire to reduce the incompatibility between cognitive structure and experience. [6] In contrast to the idea of optimal incongruity, Deci and Ryan identified in the mid-1980s an intrinsic motivation based on competence and self-determination. [7]

Computational models

An influential early computational approach to implementing artificial curiosity, developed by Schmidhuber in the early 1990s, has since been elaborated into a "formal theory of creativity, fun, and intrinsic motivation". [8]

Intrinsic motivation is often studied in the framework of computational reinforcement learning [9] [10] (introduced by Sutton and Barto), where the rewards that drive agent behaviour are intrinsically derived rather than externally imposed and must be learnt from the environment. [11] Reinforcement learning is agnostic to how the reward is generated: an agent will learn a policy (action strategy) from the distribution of rewards afforded by actions and the environment. Each approach to intrinsic motivation in this scheme is essentially a different way of generating the reward function for the agent.
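Because the learning algorithm itself is unchanged, an intrinsic reward can simply replace or augment the environment's reward signal. A hedged sketch of this composition (the function name and weighting scheme are illustrative):

```python
def shaped_reward(extrinsic: float, intrinsic: float, beta: float = 0.1) -> float:
    """Total reward fed to an otherwise standard RL learner: the
    environment's (possibly sparse) extrinsic reward plus a weighted
    intrinsic bonus. With extrinsic = 0 everywhere, behaviour is
    driven purely by the intrinsic term."""
    return extrinsic + beta * intrinsic
```

Every intrinsic-motivation model discussed below can be read as a different way of computing the `intrinsic` term passed to such a function.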

Curiosity vs. exploration

Intrinsically motivated artificial agents exhibit behaviour that resembles curiosity or exploration. Exploration in artificial intelligence and robotics has been extensively studied in reinforcement learning models, [12] usually by encouraging the agent to explore as much of the environment as possible, to reduce uncertainty about the dynamics of the environment (learning the transition function) and how best to achieve its goals (learning the reward function). Intrinsic motivation, in contrast, encourages the agent to first explore aspects of the environment that confer more information, to seek out novelty. Recent work unifying state visit count exploration and intrinsic motivation has shown faster learning in a video game setting. [13]
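Count-based exploration of the kind unified with intrinsic motivation in that work can be sketched as a bonus that decays with state visitation (the class name, the coefficient and the 1/√N schedule here are illustrative choices, not the exact formulation of the cited paper):

```python
from collections import defaultdict

class CountBonus:
    """Exploration bonus that decays with the visit count of a state,
    so novel states are intrinsically more rewarding."""
    def __init__(self, beta: float = 0.5):
        self.beta = beta
        self.visits = defaultdict(int)

    def bonus(self, state) -> float:
        self.visits[state] += 1
        return self.beta / self.visits[state] ** 0.5

b = CountBonus()
first, second = b.bonus("s0"), b.bonus("s0")
assert first > second  # novelty wears off with repeated visits
```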

Types of models

Oudeyer and Kaplan have made a substantial contribution to the study of intrinsic motivation. [14] [2] [15] They define intrinsic motivation based on Berlyne's theory, [4] and divide approaches to the implementation of intrinsic motivation into three categories that broadly follow the roots in psychology: "knowledge-based models", "competence-based models" and "morphological models". [2] Knowledge-based models are further subdivided into "information-theoretic" and "predictive". [15] Baldassarre and Mirolli present a similar typology, dividing knowledge-based models into prediction-based and novelty-based approaches. [16]

Information-theoretic intrinsic motivation

The quantification of prediction and novelty to drive behaviour is generally enabled through the application of information-theoretic models, where agent state and strategy (policy) over time are represented by probability distributions describing a Markov decision process, and the cycle of perception and action is treated as an information channel. [17] [18] These approaches claim biological plausibility as part of a family of Bayesian approaches to brain function. The main criticism and difficulty of these models is the intractability of computing probability distributions over large discrete or continuous state spaces. [2] Nonetheless, a considerable body of work has built up modelling the flow of information around the sensorimotor cycle, leading to de facto reward functions derived from the reduction of uncertainty, most notably active inference, [19] but also infotaxis, [20] predictive information, [21] [22] and empowerment. [23]
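A common de facto reward in this family is information gain: the Kullback–Leibler divergence between the agent's belief over world models before and after an observation. A toy sketch over a discrete hypothesis set (function names and the two-model setup are illustrative; real models face the tractability problem noted above):

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions (in bits)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def information_gain(prior, likelihoods, observation):
    """Bayesian update over a discrete set of candidate world models,
    rewarding the agent by how far the observation moved its beliefs."""
    post = [p * lik[observation] for p, lik in zip(prior, likelihoods)]
    z = sum(post)
    post = [p / z for p in post]
    return kl(post, prior), post

# Two candidate models of a binary observation (0 or 1):
prior = [0.5, 0.5]
likelihoods = [{0: 0.9, 1: 0.1}, {0: 0.1, 1: 0.9}]
gain, posterior = information_gain(prior, likelihoods, 1)
assert gain > 0  # a discriminating observation is intrinsically rewarding
```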

Competence-based models

Steels' autotelic principle [24] is an attempt to formalise the psychological concept of flow. [25]

Achievement, affiliation and power models

Other intrinsic motives that have been modelled computationally include achievement, affiliation and power motivation. [26] These motives can be implemented as functions of probability of success or incentive. Populations of agents can include individuals with different profiles of achievement, affiliation and power motivation, modelling population diversity and explaining why different individuals take different actions when faced with the same situation.
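A classic functional form for the achievement motive, in the spirit of Atkinson's risk-taking model, multiplies motive strength, probability of success, and incentive value (taken as one minus the probability of success), so that moderately difficult tasks are preferred. A sketch under that assumption (the function name and parameters are illustrative):

```python
def approach_tendency(motive_strength: float, p_success: float) -> float:
    """Atkinson-style tendency to approach an achievement task:
    motive strength x probability of success x incentive (1 - p).
    Peaks at moderate difficulty (p_success = 0.5)."""
    return motive_strength * p_success * (1.0 - p_success)

# An achievement-motivated agent prefers moderate challenge:
assert approach_tendency(1.0, 0.5) > approach_tendency(1.0, 0.9)
assert approach_tendency(1.0, 0.5) > approach_tendency(1.0, 0.1)
```

Varying `motive_strength` across a population of agents yields the individual differences in action choice described above.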

Beyond achievement, affiliation and power

A more recent computational theory of intrinsic motivation attempts to explain a large variety of psychological findings based on such motives. Notably this model of intrinsic motivation goes beyond just achievement, affiliation and power, by taking into consideration other important human motives. Empirical data from psychology were computationally simulated and accounted for using this model. [27]

Intrinsically motivated learning

Intrinsically motivated (or curiosity-driven) learning is an emerging research topic in artificial intelligence and developmental robotics [28] that aims to develop agents that can learn general skills or behaviours that can be deployed to improve performance in extrinsic tasks, such as acquiring resources. [29] Intrinsically motivated learning has been studied as an approach to autonomous lifelong learning in machines [30] [31] and open-ended learning in computer game characters. [32] In particular, when the agent learns a meaningful abstract representation, a notion of distance between two representations can be used to gauge novelty, allowing for efficient exploration of its environment. [33] Despite the impressive success of deep learning in specific domains (e.g. AlphaGo), many in the field (e.g. Gary Marcus) have pointed out that the ability to generalise remains a fundamental challenge in artificial intelligence. Intrinsically motivated learning, although promising in its ability to generate goals from the structure of the environment without externally imposed tasks, faces the same challenge of generalisation: how to reuse policies or action sequences, how to compress and represent continuous or complex state spaces, and how to retain and reuse the salient features that have been learnt. [29]
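Distance-based novelty in a learned representation space can be sketched as the distance from a new state's embedding to its nearest stored embeddings (the function name, the Euclidean metric and the nearest-neighbour memory are illustrative choices; the cited work uses learned deep representations):

```python
def novelty(embedding, memory, k: int = 1) -> float:
    """Novelty of a state embedding as the mean Euclidean distance to
    its k nearest previously stored embeddings; far-from-memory
    states score as more novel."""
    if not memory:
        return float("inf")
    dists = sorted(
        sum((a - b) ** 2 for a, b in zip(embedding, m)) ** 0.5
        for m in memory
    )
    return sum(dists[:k]) / min(k, len(dists))

memory = [(0.0, 0.0), (1.0, 0.0)]
assert novelty((5.0, 5.0), memory) > novelty((0.1, 0.0), memory)
```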

Related Research Articles

Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

Content theory is a subset of motivational theories that try to define what motivates people. Content theories of motivation often describe a system of needs that motivate people's actions. While process theories of motivation attempt to explain how and why our motivations affect our behaviors, content theories of motivation attempt to define what those motives or needs are. Content theory includes the work of David McClelland, Abraham Maslow and other psychologists.

Curiosity is a quality related to inquisitive thinking, such as exploration, investigation, and learning, evident in humans and other animals. Curiosity aids human development, driving the process of learning and the desire to acquire knowledge and skill.

Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment, and it can handle problems with stochastic transitions and rewards without requiring adaptations.
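The core of Q-learning is a single tabular update rule, and an intrinsically motivated variant simply supplies its own reward to that rule. A minimal sketch (names and hyperparameters are illustrative):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q(s, a) toward the bootstrapped
    target r + gamma * max over a' of Q(s', a')."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

Q = defaultdict(lambda: defaultdict(float))
q_update(Q, "s0", "right", 1.0, "s1")
assert Q["s0"]["right"] == 0.1  # alpha * reward on the first visit
```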

Developmental robotics (DevRob), sometimes called epigenetic robotics, is a scientific field which aims at studying the developmental mechanisms, architectures and constraints that allow lifelong and open-ended learning of new skills and new knowledge in embodied machines. As in human children, learning is expected to be cumulative and of progressively increasing complexity, and to result from self-exploration of the world in combination with social interaction. The typical methodological approach consists in starting from theories of human and animal development elaborated in fields such as developmental psychology, neuroscience, developmental and evolutionary biology, and linguistics, then to formalize and implement them in robots, sometimes exploring extensions or variants of them. The experimentation of those models in robots allows researchers to confront them with reality, and as a consequence, developmental robotics also provides feedback and novel hypotheses on theories of human and animal development.

The overjustification effect occurs when an expected external incentive such as money or prizes decreases a person's intrinsic motivation to perform a task. Overjustification is an explanation for the phenomenon known as motivational "crowding out". The overall effect of offering a reward for a previously unrewarded activity is a shift to extrinsic motivation and the undermining of pre-existing intrinsic motivation. Once rewards are no longer offered, interest in the activity is lost; prior intrinsic motivation does not return, and extrinsic rewards must be continuously offered as motivation to sustain the activity.

Self-determination theory (SDT) is a macro theory of human motivation and personality that concerns people's innate growth tendencies and innate psychological needs. It pertains to the motivation behind people's choices in the absence of external influences and distractions. SDT focuses on the degree to which human behavior is self-motivated and self-determined.

Cognitive robotics, or cognitive technology, is a subfield of robotics concerned with endowing a robot with intelligent behavior by providing it with a processing architecture that will allow it to learn and reason about how to behave in response to complex goals in a complex world. Cognitive robotics may be considered the engineering branch of embodied cognitive science and embodied embedded cognition, drawing on artificial intelligence, machine learning, deep learning, image processing, analytics, software development and system integration.

Motivation crowding theory is the theory from psychology and microeconomics suggesting that providing extrinsic incentives for certain kinds of behavior—such as promising monetary rewards for accomplishing some task—can sometimes undermine intrinsic motivation for performing that behavior. The result of lowered motivation, in contrast with the predictions of neoclassical economics, can be an overall decrease in the total performance.

Neurorobotics is the combined study of neuroscience, robotics, and artificial intelligence. It is the science and technology of embodied autonomous neural systems. Neural systems include brain-inspired algorithms, computational models of biological neural networks and actual biological systems. Such neural systems can be embodied in machines with mechanic or any other forms of physical actuation. This includes robots, prosthetic or wearable systems but also, at smaller scale, micro-machines and, at the larger scales, furniture and infrastructures.

Psi-theory, developed by Dietrich Dörner at the University of Bamberg, is a systemic psychological theory covering human action regulation, intention selection and emotion. It models the human mind as an information processing agent, controlled by a set of basic physiological, social and cognitive drives. Perceptual and cognitive processing are directed and modulated by these drives, which allow the autonomous establishment and pursuit of goals in an open environment.

Incentivisation or incentivization is the practice of building incentives into an arrangement or system in order to motivate the actors within it. It is based on the idea that individuals within such systems can perform better not only when they are coerced but also when they are given rewards.

The LIDA cognitive architecture attempts to model a broad spectrum of cognition in biological systems, from low-level perception/action to high-level reasoning. Developed primarily by Stan Franklin and colleagues at the University of Memphis, the LIDA architecture is empirically grounded in cognitive science and cognitive neuroscience. It is an extension of IDA, which adds mechanisms for learning. In addition to providing hypotheses to guide further research, the architecture can support control structures for software agents and robots. Providing plausible explanations for many cognitive processes, the LIDA conceptual model is also intended as a tool with which to think about how minds work.

Dr. Pierre-Yves Oudeyer is Research Director at the French Institute for Research in Computer Science and Automation (Inria) and head of the Inria and Ensta-ParisTech FLOWERS team. Previously, he was a permanent researcher at Sony Computer Science Laboratory for eight years (1999–2007). He studied theoretical computer science at Ecole Normale Supérieure in Lyon, and received his Ph.D. degree in artificial intelligence from the University Paris VI, France. After working on computational models of language evolution, he is now working on developmental and social robotics, focusing on sensorimotor development, language acquisition and lifelong learning in robots. Strongly inspired by infant development, the mechanisms he studies include artificial curiosity, intrinsic motivation, the role of morphology in learning motor control, human-robot interfaces, joint attention and joint intentional understanding, and imitation learning. He has published a book, more than 80 papers in international journals and conferences, holds 8 patents, gave several invited keynote lectures in international conferences, and received several prizes for his work in developmental robotics and on the origins of language. In particular, he is laureate of the ERC Starting Grant EXPLORERS. He is editor of the IEEE CIS Newsletter on Autonomous Mental Development, and associate editor of IEEE Transactions on Autonomous Mental Development, Frontiers in Neurorobotics, and the International Journal of Social Robotics. He also works actively on the diffusion of science to the general public, through the writing of popular science articles and participation in radio and TV programs as well as science exhibitions.

Praise as a form of social interaction expresses recognition, reassurance or admiration.

This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence (AI), its subdisciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.

Deep reinforcement learning is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms are able to take in very large inputs and decide what actions to perform to optimize an objective. Deep reinforcement learning has been used for a diverse set of applications including but not limited to robotics, video games, natural language processing, computer vision, education, transportation, finance and healthcare.

Multi-agent reinforcement learning (MARL) is a sub-field of reinforcement learning. It focuses on studying the behavior of multiple learning agents that coexist in a shared environment. Each agent is motivated by its own rewards, and does actions to advance its own interests; in some environments these interests are opposed to the interests of other agents, resulting in complex group dynamics.

Empowerment in the field of artificial intelligence formalises and quantifies the potential an agent perceives that it has to influence its environment. An agent which follows an empowerment maximising policy, acts to maximise future options. Empowerment can be used as a (pseudo) utility function that depends only on information gathered from the local environment to guide action, rather than seeking an externally imposed goal, thus is a form of intrinsic motivation.
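In the simplified case of deterministic dynamics, n-step empowerment reduces to the logarithm of the number of distinct states reachable by length-n action sequences (the full definition is the channel capacity from action sequences to resulting states). A toy sketch under that assumption (names and the corridor world are illustrative):

```python
import math
from itertools import product

def empowerment(state, actions, step, n):
    """Deterministic n-step empowerment: log2 of the number of
    distinct states reachable via length-n action sequences.
    `step(state, action) -> state` is the transition function."""
    reached = set()
    for seq in product(actions, repeat=n):
        s = state
        for a in seq:
            s = step(s, a)
        reached.add(s)
    return math.log2(len(reached))

# 1-D corridor with a wall at 0: positions near the wall afford fewer
# distinct outcomes, hence lower empowerment.
step = lambda s, a: max(0, s + a)
assert empowerment(5, (-1, 0, 1), step, 2) > empowerment(0, (-1, 0, 1), step, 2)
```

Maximising this quantity pushes the agent away from the wall, i.e. towards states with the most future options, with no externally specified goal.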

The exploration-exploitation dilemma, also known as the explore-exploit tradeoff, is a fundamental concept in decision-making that arises in many domains. It is depicted as the balancing act between two opposing strategies. Exploitation involves choosing the best option based on current knowledge of the system, while exploration involves trying out new options that may lead to better outcomes in the future at the expense of an exploitation opportunity. Finding the optimal balance between these two strategies is a crucial challenge in many decision-making problems whose goal is to maximize long-term benefits.
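The simplest resolution of this tradeoff in reinforcement learning practice is epsilon-greedy action selection: exploit the best-known action most of the time, explore uniformly at random otherwise. A sketch (the function name and toy values are illustrative):

```python
import random

def epsilon_greedy(q_values: dict, epsilon: float, rng=random):
    """With probability epsilon pick a uniformly random action
    (explore); otherwise pick the highest-valued action (exploit)."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)

q = {"left": 0.2, "right": 0.8}
assert epsilon_greedy(q, 0.0) == "right"  # pure exploitation
```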

References

  1. Ryan, Richard M; Deci, Edward L (2000). "Intrinsic and extrinsic motivations: Classic definitions and new directions". Contemporary Educational Psychology. 25 (1): 54–67. doi:10.1006/ceps.1999.1020. hdl: 20.500.12799/2958 . PMID   10620381. S2CID   1098145.
  2. Oudeyer, Pierre-Yves; Kaplan, Frederic (2008). "How can we define intrinsic motivation?". Proc. of the 8th Conf. on Epigenetic Robotics. Vol. 5. pp. 29–31.
  3. White, R. (1959). "Motivation reconsidered: The concept of competence". Psychological Review. 66 (5): 297–333. doi:10.1037/h0040934. PMID   13844397. S2CID   37385966.
  4. Berlyne, D.: Conflict, Arousal and Curiosity. McGraw-Hill, New York (1960)
  5. Festinger, L.: A theory of cognitive dissonance. Evanston, Row, Peterson (1957)
  6. Kagan, J.: Motives and development. Journal of Personality and Social Psychology 22, 51–66
  7. Deci, E.L., Ryan, R.M.: Intrinsic motivation and self-determination in human behavior. Plenum, New York (1985)
  8. Schmidhuber, J (2010). "Formal theory of creativity, fun, and intrinsic motivation (1990-2010)". IEEE Trans. Auton. Mental Dev. 2 (3): 230–247. doi:10.1109/TAMD.2010.2056368. S2CID   234198.
  9. Barto, A., Singh, S., Chentanez, N.: Intrinsically motivated learn- ing of hierarchical collections of skills. In: ICDL 2004. Proceedings of the 3rd International Conference on Development and Learning, Salk Institute, San Diego (2004)
  10. Singh, S., Barto, A. G., and Chentanez, N. (2005). Intrinsically motivated reinforcement learning. In Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS), Vancouver, B.C., Canada.
  11. Barto, A.G.: Intrinsic motivation and reinforcement learning. In: Baldassarre, G., Mirolli, M. (eds.) Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, Berlin (2012)
  12. Thrun, S. B. (1992). Efficient Exploration in Reinforcement Learning. https://doi.org/10.1007/978-1-4899-7687-1_244
  13. Bellemare, M. G., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., & Munos, R. (2016). Unifying count-based exploration and intrinsic motivation. Advances in Neural Information Processing Systems, 1479–1487.
  14. Kaplan, F. and Oudeyer, P. (2004). Maximizing learning progress: an internal reward system for development. Embodied artificial intelligence, pages 629–629.
  15. Oudeyer, P. Y., & Kaplan, F. (2009). What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics, 3(NOV). https://doi.org/10.3389/neuro.12.006.2007
  16. Baldassarre, Gianluca; Mirolli, Marco (2013). "Intrinsically Motivated Learning Systems: An Overview". Intrinsically Motivated Learning in Natural and Artificial Systems. Rome, Italy: Springer. pp. 1–14.
  17. Klyubin, A., Polani, D., and Nehaniv, C. (2008). Keep your options open: an information-based driving principle for sensorimotor systems. PLOS ONE, 3(12):e4018. https://dx.doi.org/10.1371%2Fjournal.pone.0004018
  18. Biehl, Martin; Guckelsberger, Christian; Salge, Christoph; Smith, Simón C.; Polani, Daniel (2018). "Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop". Frontiers in Neurorobotics. 12: 45. arXiv: 1806.08083 . doi: 10.3389/fnbot.2018.00045 . ISSN   1662-5218. PMC   6125413 . PMID   30214404.
  19. Friston, Karl; Kilner, James; Harrison, Lee (2006). "A free energy principle for the brain" (PDF). Journal of Physiology-Paris. 100 (1–3). Elsevier BV: 70–87. doi:10.1016/j.jphysparis.2006.10.001. ISSN   0928-4257. PMID   17097864. S2CID   637885.
  20. Vergassola, M., Villermaux, E., & Shraiman, B. I. (2007). ‘Infotaxis’ as a strategy for searching without gradients. Nature, 445(7126), 406–409. https://doi.org/10.1038/nature05464
  21. Ay, N., Bertschinger, N., Der, R., Güttler, F. and Olbrich, E. (2008), ‘Predictive information and explorative behavior of autonomous robots’, The European Physical Journal B 63(3), 329–339.
  22. Martius, G., Der, R., and Ay, N. (2013). Information driven self-organization of complex robotic behaviors. PLOS ONE 8:e63400. doi: 10.1371/journal.pone.0063400
  23. Salge, C; Glackin, C; Polani, D (2014). "Empowerment–An Introduction". In Prokopenko, M (ed.). Guided Self-Organization: Inception. Emergence, Complexity and Computation. Vol. 9. Springer. pp. 67–114. arXiv: 1310.1863 . doi:10.1007/978-3-642-53734-9_4. ISBN   978-3-642-53733-2. S2CID   9662065.
  24. Steels, Luc: The autotelic principle. In: Iida, F., Pfeifer, R., Steels, L., Kuniyoshi, Y. (eds.) Embodied Artificial Intelligence. LNCS (LNAI), vol. 3139, pp. 231–242. Springer, Heidelberg (2004)
  25. Csikszentmihalyi, M. (2000). Beyond boredom and anxiety. Jossey-Bass.
  26. Merrick, K. E. (2016). Computational Models of Motivation for Game-Playing Agents. Springer International Publishing, https://doi.org/10.1007/978-3-319-33459-2.
  27. Sun, R., Bugrov, S, and Dai, D. (2022). A unified framework for interpreting a range of motivation-performance phenomena. Cognitive Systems Research, 71, 24–40.
  28. Lungarella, M., Metta, G., Pfeifer, R., and Sandini, G. (2003). Developmental robotics: a survey. Connect. Sci. 15, 151–190. doi: 10.1080/09540090310001655110
  29. Santucci, V. G., Oudeyer, P. Y., Barto, A., & Baldassarre, G. (2020). Editorial: Intrinsically motivated open-ended learning in autonomous robots. Frontiers in Neurorobotics, 13(January), 2019–2021. https://doi.org/10.3389/fnbot.2019.00115
  30. Barto, A. G. (2013). “Intrinsic motivation and reinforcement learning,” in Intrinsically Motivated Learning in Natural and Artificial Systems (Berlin; Heidelberg: Springer), 17–47
  31. Mirolli, M., and Baldassarre, G. (2013). “Functions and mechanisms of intrinsic motivations,” in Intrinsically Motivated Learning in Natural and Artificial Systems, eds G. Baldassarre and M. Mirolli (Berlin; Heidelberg: Springer), 49–72
  32. Merrick, K. E., Maher, M-L (2009). Motivated Reinforcement Learning: Curious Characters for Multiuser Games. Springer-Verlag Berlin Heidelberg, https://doi.org/10.1007/978-3-540-89187-1.
  33. Tao, Ruo Yu and Francois-Lavet, Vincent and Pineau, Joelle (2020). Novelty search in representational space for sample efficient exploration. Neural Information Processing Systems, 2020. https://arxiv.org/abs/2009.13579