Action model learning

Action model learning (sometimes abbreviated action learning) is an area of machine learning concerned with the creation and modification of a software agent's knowledge about the effects and preconditions of the actions that can be executed within its environment. This knowledge is usually represented in a logic-based action description language and used as the input for automated planners.
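
To make this representation concrete, here is a minimal Python sketch of a STRIPS-style action schema; the blocks-world "pickup" action and its predicate names are illustrative assumptions, not drawn from any particular planner or library.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ActionModel:
        """STRIPS-style action description: applicable when all
        preconditions hold; execution adds and deletes facts."""
        name: str
        preconditions: frozenset
        add_effects: frozenset
        delete_effects: frozenset

        def applicable(self, state: frozenset) -> bool:
            return self.preconditions <= state

        def apply(self, state: frozenset) -> frozenset:
            return (state - self.delete_effects) | self.add_effects

    # Hypothetical blocks-world action: pick up block b from the table.
    pickup_b = ActionModel(
        name="pickup(b)",
        preconditions=frozenset({"clear(b)", "ontable(b)", "handempty"}),
        add_effects=frozenset({"holding(b)"}),
        delete_effects=frozenset({"clear(b)", "ontable(b)", "handempty"}),
    )

    state = frozenset({"clear(b)", "ontable(b)", "handempty"})
    assert pickup_b.applicable(state)
    print(sorted(pickup_b.apply(state)))  # ['holding(b)']

A planner consumes a set of such schemas; action model learning aims to induce their preconditions and effects from observed behavior instead of having them written by hand.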

Learning action models is important when goals change. When an agent has acted for a while, it can use its accumulated knowledge about actions in the domain to make better decisions. Thus, learning action models differs from reinforcement learning: it enables reasoning about actions instead of expensive trials in the world. [1] Action model learning is a form of inductive reasoning, where new knowledge is generated from the agent's observations. It differs from standard supervised learning in that correct input/output pairs are never presented, nor are imprecise action models explicitly corrected.

The usual motivation for action model learning is that manual specification of action models for planners is a difficult, time-consuming, and error-prone task (especially in complex environments).

Action models

Given a training set E consisting of examples e = (s, a, s′), where s and s′ are observations of the world state from two consecutive time steps t and t+1, and a is an action instance observed in time step t, the goal of action model learning in general is to construct an action model ⟨D, P⟩, where D is a description of the domain dynamics in an action description formalism like STRIPS, ADL or PDDL, and P is a probability function defined over the elements of D. [2] However, many state-of-the-art action learning methods assume determinism and do not induce P. In addition to determinism, individual methods differ in how they deal with other attributes of the domain (e.g. partial observability or sensor noise).
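
As a minimal sketch of the deterministic, fully observable special case (where no probability function P is induced), the following Python function recovers a description D for a single ground action from observed (s, a, s′) pairs: preconditions are approximated by intersecting the pre-states in which the action was executed, and effects are read off the state differences. This is a didactic baseline under strong assumptions, not one of the published methods cited in this article.

    def learn_deterministic_model(examples):
        """examples: list of (s, s_next) state pairs observed for one
        ground action; states are frozensets of facts. Assumes
        determinism and full observability."""
        # Preconditions: facts that held in every observed pre-state.
        pre = frozenset.intersection(*(s for s, _ in examples))
        # Add effects: facts that became true in some transition.
        add = frozenset.union(*(s_next - s for s, s_next in examples))
        # Delete effects: facts that became false in some transition.
        delete = frozenset.union(*(s - s_next for s, s_next in examples))
        return pre, add, delete

    # Two hypothetical observations of the same pickup(b) action:
    examples = [
        (frozenset({"clear(b)", "ontable(b)", "handempty", "ontable(c)"}),
         frozenset({"holding(b)", "ontable(c)"})),
        (frozenset({"clear(b)", "ontable(b)", "handempty", "clear(c)"}),
         frozenset({"holding(b)", "clear(c)"})),
    ]
    print(learn_deterministic_model(examples))

With few examples the precondition estimate is overly specific; it shrinks monotonically toward the true preconditions as more pre-states are observed.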

Action learning methods

State of the art

Recent action learning methods take various approaches and employ a wide variety of tools from different areas of artificial intelligence and computational logic. An example of a method based on propositional logic is the SLAF (Simultaneous Learning and Filtering) algorithm, [1] which uses an agent's observations to construct a long propositional formula over time and subsequently interprets it using a satisfiability (SAT) solver. Another technique, in which learning is converted into a satisfiability problem (weighted MAX-SAT in this case) and solved with SAT solvers, is implemented in ARMS (Action-Relation Modeling System). [3] Two mutually similar, fully declarative approaches to action learning were based on the logic programming paradigm of Answer Set Programming (ASP) [4] and its extension, Reactive ASP. [5] In another example, a bottom-up inductive logic programming approach was employed. [6] Several solutions are not directly logic-based, for example, action model learning using a perceptron algorithm [7] or multi-level greedy search over the space of possible action models. [8] In an older paper from 1992, [9] action model learning was studied as an extension of reinforcement learning.
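
To give a flavor of the satisfiability reductions mentioned above, the toy sketch below encodes, for one ground action, three Boolean variables per fluent f (whether f is a precondition, an add effect, or a delete effect) and derives clauses from observed transitions; a naive enumeration stands in for a real SAT solver. This only illustrates the encoding idea under determinism and full observability. It is not the ARMS system, which works from plan examples without intermediate states and solves a weighted MAX-SAT problem.

    from itertools import product

    def clauses_from_transition(s, s_next, fluents):
        """Unit clauses over variables ('pre', f), ('add', f), ('del', f);
        a clause is a list of (variable, polarity) literals."""
        cs = []
        for f in fluents:
            if f not in s:
                cs.append([(("pre", f), False)])  # action ran without f: not a precondition
            if f in s_next and f not in s:
                cs.append([(("add", f), True)])   # f became true: must be an add effect
            if f not in s_next and f in s:
                cs.append([(("del", f), True)])   # f became false: must be a delete effect
            if f in s and f in s_next:
                cs.append([(("del", f), False)])  # f survived: cannot be deleted
            if f not in s and f not in s_next:
                cs.append([(("add", f), False)])  # f stayed false: cannot be added
        return cs

    def solve(clauses, variables):
        """Brute-force stand-in for a SAT solver: try every assignment."""
        for bits in product([False, True], repeat=len(variables)):
            assignment = dict(zip(variables, bits))
            if all(any(assignment[v] == pol for v, pol in c) for c in clauses):
                return assignment
        return None

    fluents = ["handempty", "holding(b)"]
    observations = [(frozenset({"handempty"}), frozenset({"holding(b)"}))]
    clauses = [c for s, s2 in observations
               for c in clauses_from_transition(s, s2, fluents)]
    variables = [(k, f) for k in ("pre", "add", "del") for f in fluents]
    model = solve(clauses, variables)
    print({v for v, b in model.items() if b})

Note that a single observation does not force handempty to be a precondition here; real systems such as ARMS disambiguate with soft (weighted) constraints, which is why the weighted MAX-SAT formulation is used.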

Literature

Most action learning research papers are published in journals and conferences focused on artificial intelligence in general (e.g. Journal of Artificial Intelligence Research (JAIR), Artificial Intelligence, Applied Artificial Intelligence (AAI), or AAAI conferences). Despite the mutual relevance of the topics, action model learning is usually not addressed in planning conferences like the International Conference on Automated Planning and Scheduling (ICAPS).

See also

Logic programming
Inductive logic programming
Reinforcement learning
Symbolic artificial intelligence
Q-learning
Automated planning and scheduling
Planning Domain Definition Language (PDDL)
Intelligent agent
Davis–Putnam–Logemann–Loveland (DPLL) algorithm
GSAT and WalkSAT
Satplan
Blocks world
Partially observable Markov decision process (POMDP)
Reasoning system
Action language
Inductive programming
Glossary of artificial intelligence
Multi-agent pathfinding

References

  1. Amir, Eyal; Chang, Allen (2008). "Learning Partially Observable Deterministic Action Models". Journal of Artificial Intelligence Research. 33: 349–402. arXiv:1401.3437. doi:10.1613/jair.2575. S2CID 9432224.
  2. Čertický, Michal (2014). "Real-Time Action Model Learning with Online Algorithm 3SG". Applied Artificial Intelligence. 28 (7): 690–711. doi:10.1080/08839514.2014.927692. S2CID 8210810.
  3. Yang, Qiang; Wu, Kangheng; Jiang, Yunfei (2007). "Learning action models from plan examples using weighted MAX-SAT". Artificial Intelligence. 171 (2–3): 107–143. CiteSeerX 10.1.1.135.9266. doi:10.1016/j.artint.2006.11.005.
  4. Balduccini, Marcello (2007). "Learning Action Descriptions with A-Prolog: Action Language C". AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning: 13–18.
  5. Čertický, Michal (2012). "Action Learning with Reactive Answer Set Programming: Preliminary Report". ICAS 2012: The Eighth International Conference on Autonomic and Autonomous Systems. pp. 107–111. ISBN 9781612081878.
  6. Benson, Scott (1995). "Inductive learning of reactive action models". Machine Learning: Proceedings of the Twelfth International Conference (ICML).
  7. Mourão, Kira; Petrick, Ronald; Steedman, Mark (2010). "Learning action effects in partially observable domains". Frontiers in Artificial Intelligence and Applications. 215 (ECAI 2010): 973–974. doi:10.3233/978-1-60750-606-5-973.
  8. Zettlemoyer, Luke; Pasula, Hanna; Kaelbling, Leslie Pack (2005). "Learning planning rules in noisy stochastic worlds". AAAI: 911–918.
  9. Lin, Long-Ji (1992). "Self-improving reactive agents based on reinforcement learning, planning and teaching". Machine Learning. 8 (3–4): 293–321. doi:10.1023/A:1022628806385.