Automated planning and scheduling

Last updated

Automated planning and scheduling, sometimes denoted as simply AI planning, [1] is a branch of artificial intelligence that concerns the realization of strategies or action sequences, typically for execution by intelligent agents, autonomous robots and unmanned vehicles. Unlike classical control and classification problems, the solutions are complex and must be discovered and optimized in multidimensional space. Planning is also related to decision theory.

In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals. Computer science defines AI research as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. More specifically, Kaplan and Haenlein define AI as “a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation”. Colloquially, the term "artificial intelligence" is used to describe machines that mimic "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving".

Strategy is a high level plan to achieve one or more goals under conditions of uncertainty. In the sense of the "art of the general", which included several subsets of skills including "tactics", siegecraft, logistics etc., the term came into use in the 6th century C.E. in East Roman terminology, and was translated into Western vernacular languages only in the 18th century. From then until the 20th century, the word "strategy" came to denote "a comprehensive way to try to pursue political ends, including the threat or actual use of force, in a dialectic of wills" in a military conflict, in which both adversaries interact.

Intelligent agent

In artificial intelligence, an intelligent agent (IA) is an autonomous entity which acts, directing its activity towards achieving goals, upon an environment using observation through sensors and consequent actuators. Intelligent agents may also learn or use knowledge to achieve their goals. They may be very simple or very complex. A reflex machine, such as a thermostat, is considered an example of an intelligent agent.


In known environments with available models, planning can be done offline. Solutions can be found and evaluated prior to execution. In dynamically unknown environments, the strategy often needs to be revised online. Models and policies must be adapted. Solutions usually resort to iterative trial and error processes commonly seen in artificial intelligence. These include dynamic programming, reinforcement learning and combinatorial optimization. Languages used to describe planning and scheduling are often called action languages.

Trial and error is a fundamental method of problem solving. It is characterised by repeated, varied attempts which are continued until success, or until the agent stops trying.

Dynamic programming method for solving a complex problem by breaking it down into a collection of simpler subproblems

Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. In both contexts it refers to simplifying a complicated problem by breaking it down into simpler sub-problems in a recursive manner. While some decision problems cannot be taken apart this way, decisions that span several points in time do often break apart recursively. Likewise, in computer science, if a problem can be solved optimally by breaking it into sub-problems and then recursively finding the optimal solutions to the sub-problems, then it is said to have optimal substructure.

Reinforcement learning field of machine learning

Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In the operations research and control literature, reinforcement learning is called approximate dynamic programming, or neuro-dynamic programming. The problems of interest in reinforcement learning have also been studied in the theory of optimal control, which is concerned mostly with the existence and characterization of optimal solutions, and algorithms for their exact computation, and less with learning or approximation, particularly in the absence of a mathematical model of the environment. In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality. In machine learning, the environment is typically formulated as a Markov Decision Process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques. The main difference between the classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume knowledge of an exact mathematical model of the MDP and they target large MDPs where exact methods become infeasible.


Given a description of the possible initial states of the world, a description of the desired goals, and a description of a set of possible actions, the planning problem is to synthesise a plan that is guaranteed (when applied to any of the initial states) to generate a state which contains the desired goals (such a state is called a goal state).

The difficulty of planning is dependent on the simplifying assumptions employed. Several classes of planning problems can be identified depending on the properties the problems have in several dimensions.

The simplest possible planning problem, known as the Classical Planning Problem, is determined by:

Since the initial state is known unambiguously, and all actions are deterministic, the state of the world after any sequence of actions can be accurately predicted, and the question of observability is irrelevant for classical planning.

Further, plans can be defined as sequences of actions, because it is always known in advance which actions will be needed.

With nondeterministic actions or other events outside the control of the agent, the possible executions form a tree, and plans have to determine the appropriate actions for every node of the tree.

Discrete-time Markov decision processes (MDP) are planning problems with:

A Markov decision process (MDP) is a discrete time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming and reinforcement learning. MDPs were known at least as early as the 1950s; a core body of research on Markov decision processes resulted from Howard's 1960 book, Dynamic Programming and Markov Processes. They are used in many disciplines, including robotics, automatic control, economics and manufacturing. The name of MDPs comes from the Russian mathematician Andrey Markov.

When full observability is replaced by partial observability, planning corresponds to partially observable Markov decision process (POMDP).

A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process (MDP). A POMDP models an agent decision process in which it is assumed that the system dynamics are determined by an MDP, but the agent cannot directly observe the underlying state. Instead, it must maintain a probability distribution over the set of possible states, based on a set of observations and observation probabilities, and the underlying MDP.

If there are more than one agent, we have multi-agent planning, which is closely related to game theory.

In computer science multi-agent planning involves coordinating the resources and activities of multiple "agents".

Game theory is the study of mathematical models of strategic interaction between rational decision-makers. It has applications in all fields of social science, as well as in logic and computer science. Originally, it addressed zero-sum games, in which one person's gains result in losses for the other participants. Today, game theory applies to a wide range of behavioral relations, and is now an umbrella term for the science of logical decision making in humans, animals, and computers.

Domain independent planning

In AI planning, planners typically input a domain model (a description of a set of possible actions which model the domain) as well as the specific problem to be solved specified by the initial state and goal, in contrast to those in which there is no input domain specified. Such planners are called "domain independent" to emphasis the fact that they can solve planning problems from a wide range of domains. Typical examples of domains are block stacking, logistics, workflow management, and robot task planning. Hence a single domain independent planner can be used to solve planning problems in all these various domains. On the other hand, a route planner is typical of a domain specific planner.

Planning domain modelling languages

The most commonly used languages for representing planning domains and specific planning problems, such as STRIPS and PDDL for Classical Planning, are based on state variables. Each possible state of the world is an assignment of values to the state variables, and actions determine how the values of the state variables change when that action is taken. Since a set of state variables induce a state space that has a size that is exponential in the set, planning, similarly to many other computational problems, suffers from the curse of dimensionality and the combinatorial explosion.

The Planning Domain Definition Language (PDDL) is an attempt to standardize Artificial Intelligence (AI) planning languages. It was first developed by Drew McDermott and his colleagues in 1998 mainly to make the 1998/2000 International Planning Competition (IPC) possible, and then evolved with each competition. "The adoption of a common formalism for describing planning domains fosters far greater reuse of research and allows more direct comparison of systems and approaches, and therefore supports faster progress in the field. A common formalism is a compromise between expressive power and the progress of basic research. The role of a common formalism as a communication medium for exchange demands that it is provided with a clear semantics."

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic optimization.

An alternative language for describing planning problems is that of hierarchical task networks, in which a set of tasks is given, and each task can be either realized by a primitive action or decomposed into a set of other tasks. This does not necessarily involve state variables, although in more realistic applications state variables simplify the description of task networks.

Algorithms for planning

Classical planning

Reduction to other problems

Temporal planning

Temporal planning can be solved with methods similar to classical planning. The main difference is, because of the possibility of several, temporally overlapping actions with a duration being taken concurrently, that the definition of a state has to include information about the current absolute time and how far the execution of each active action has proceeded. Further, in planning with rational or real time, the state space may be infinite, unlike in classical planning or planning with integer time. Temporal planning is closely related to scheduling problems. Temporal planning can also be understood in terms of timed automata.

Probabilistic planning

Probabilistic planning can be solved with iterative methods such as value iteration and policy iteration, when the state space is sufficiently small. With partial observability, probabilistic planning is similarly solved with iterative methods, but using a representation of the value functions defined for the space of beliefs instead of states.

Preference-based planning

In preference-based planning, the objective is not only to produce a plan but also to satisfy user-specified preferences. A difference to the more common reward-based planning, for example corresponding to MDPs, preferences don't necessarily have a precise numerical value.

Conditional planning

Deterministic planning was introduced with the STRIPS planning system, which is a hierarchical planner. Action names are ordered in a sequence and this is a plan for the robot. Hierarchical planning can be compared with an automatic generated behavior tree. [2] The disadvantage is, that a normal behavior tree is not so expressive like a computer program. That means, the notation of a behavior graph contains action commands, but no loops or if-then-statements. Conditional planning overcomes the bottleneck and introduces an elaborated notation which is similar to a control flow, known from other programming languages like Pascal. It is very similar to program synthesis, that means a planner generates sourcecode which can be executed by an interpreter. [3]

An early example of a conditional planner is “Warplan-C” which was introduced in the mid 1970s. [4] Until now, the question was not answered what the difference is between a normal sequence and a complicated plan, which contains if-then-statements. It has to do with uncertainty at runtime of a plan. The idea is, that a plan can react to sensor signals which are unknown for the planner. The planner generates two choices in advance. For example, if an object was detected, then action A is executed, if an object is missing, then action B is executed. [5] A major advantage of conditional planning is the ability to handle partial plans. [6] An agent is not forced to plan everything from start to finish but can divide the problem into chunks. This helps to reduce the state space and solves much more complex problems.

Deployment of planning systems

See also


Related Research Articles

NP (complexity) computational complexity class of decision problems solvable by a non-deterministic Turing machine in polynomial time.

In computational complexity theory, NP is a complexity class used to classify decision problems. NP is the set of decision problems for which the problem instances, where the answer is "yes", have proofs verifiable in polynomial time.

Nondeterministic algorithm

In computer science, a nondeterministic algorithm is an algorithm that, even for the same input, can exhibit different behaviors on different runs, as opposed to a deterministic algorithm. There are several ways an algorithm may behave differently from run to run. A concurrent algorithm can perform differently on different runs due to a race condition. A probabilistic algorithm's behaviors depends on a random number generator. An algorithm that solves a problem in nondeterministic polynomial time can run in polynomial time or exponential time depending on the choices it makes during execution. The nondeterministic algorithms are often used to find an approximation to a solution, when the exact solution would be too costly to obtain using a deterministic one.

Soar is a cognitive architecture, originally created by John Laird, Allen Newell, and Paul Rosenbloom at Carnegie Mellon University. It is now maintained and developed by John Laird's research group at the University of Michigan.

The StanfordResearchInstituteProblemSolver, known by its acronym STRIPS, is an automated planner developed by Richard Fikes and Nils Nilsson in 1971 at SRI International. The same name was later used to refer to the formal language of the inputs to this planner. This language is the base for most of the languages for expressing automated planning problem instances in use today; such languages are commonly known as action languages. This article only describes the language, not the planner.

In artificial intelligence, hierarchical task network (HTN) planning is an approach to automated planning in which the dependency among actions can be given in the form of hierarchically structured networks.

Motion planning is a term used in robotics for the process of breaking down a desired movement task into discrete motions that satisfy movement constraints and possibly optimize some aspect of the movement.

Action selection is a way of characterizing the most basic problem of intelligent systems: what to do next. In artificial intelligence and computational cognitive science, "the action selection problem" is typically associated with intelligent agents and animats—artificial systems that exhibit complex behaviour in an agent environment. The term is also sometimes used in ethology or animal behavior.

In artificial intelligence, Action description language (ADL) is an automated planning and scheduling system in particular for robots. It is considered an advancement of STRIPS. Edwin Pednault proposed this language in 1987. It is an example of an action language.

In computational complexity theory, the complexity class 2-EXPTIME is the set of all decision problems solvable by a deterministic Turing machine in O(22p) time, where p(n) is a polynomial function of n.

In artificial intelligence, apprenticeship learning is the process of learning by observing an expert. It can be viewed as a form of supervised learning, where the training dataset consists of task executions by a demonstration teacher.

In probability theory, a Markov model is a stochastic model used to model randomly changing systems. It is assumed that future states depend only on the current state, not on the events that occurred before it. Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. For this reason, in the fields of predictive modelling and probabilistic forecasting, it is desirable for a given model to exhibit the Markov property.

Atom is a domain-specific language (DSL) in Haskell, for designing real-time embedded software.

Partial-order planning is an approach to automated planning that maintains a partial ordering between actions and only commits ordering between actions when forced to. By contrast, total-order planning maintains a total ordering between all actions at every stage of planning. Given a problem in which some sequence of actions is required in order to achieve a goal, a partial-order plan specifies all actions that need to be taken, but specifies an ordering between actions only where necessary.

In artificial intelligence, preference-based planning is a form of automated planning and scheduling which focuses on producing plans that additionally satisfy as many user-specified preferences as possible. In many problem domains, a task can be accomplished by various sequences of actions. These plans can vary in quality: there can be many ways to solve a problem but one generally prefers a way that is, e.g., cost-effective, quick and safe.

Action model learning

Action model learning is an area of machine learning concerned with creation and modification of software agent's knowledge about effects and preconditions of the actions that can be executed within its environment. This knowledge is usually represented in logic-based action description language and used as the input for automated planners.

Most of the terms listed in Wikipedia glossaries are already defined and explained within Wikipedia itself. However, glossaries like this one are useful for looking up, comparing and reviewing large numbers of terms together. You can help enhance this page by adding new terms or writing definitions for existing ones.


  1. Ghallab, Malik; Nau, Dana S.; Traverso, Paolo (2004), Automated Planning: Theory and Practice, Morgan Kaufmann, ISBN   1-55860-856-7
  2. Neufeld, Xenija and Mostaghim, Sanaz and Sancho-Pradel, Dario and Brand, Sandy (2017). "Building a Planner: A Survey of Planning Systems Used in Commercial Video Games". IEEE Transactions on Games. IEEE.CS1 maint: Multiple names: authors list (link)
  3. Sanelli, Valerio and Cashmore, Michael and Magazzeni, Daniele and Iocchi, Luca (2017). Short-term human robot interaction through conditional planning and execution. Proc. of International Conference on Automated Planning and Scheduling (ICAPS).CS1 maint: Multiple names: authors list (link)
  4. Peot, Mark A and Smith, David E (1992). Conditional nonlinear planning. Artificial Intelligence Planning Systems. Elsevier. pp. 189--197.CS1 maint: Multiple names: authors list (link)
  5. Karlsson, Lars (2001). Conditional progressive planning under uncertainty. IJCAI. pp. 431--438.
  6. Liu, Daphne Hao (2008). A survey of planning in intelligent agents: from externally motivated to internally motivated systems (Technical report). Technical Report TR-2008-936, Department of Computer Science, University of Rochester.

Further reading