Beam search

Last updated October 02, 2024

In computer science, beam search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set. Beam search is a modification of best-first search that reduces its memory requirements. Best-first search is a graph search which orders all partial solutions (states) according to some heuristic. But in beam search, only a predetermined number of best partial solutions are kept as candidates.^[1] It is thus a greedy algorithm.

Details

Beam search uses breadth-first search to build its search tree. At each level of the tree, it generates all successors of the states at the current level, sorting them in increasing order of heuristic cost.^[2] However, it only stores a predetermined number, $\beta$ , of best states at each level (called the beam width). Only those states are expanded next. The greater the beam width, the fewer states are pruned. With an infinite beam width, no states are pruned and beam search is identical to best-first search.^[3] Conversely, a beam width of 1 corresponds to a hill-climbing algorithm.^[3] The beam width bounds the memory required to perform the search. Since a goal state could potentially be pruned, beam search sacrifices completeness (the guarantee that an algorithm will terminate with a solution, if one exists). Beam search is not optimal (that is, there is no guarantee that it will find the best solution).

Uses

A beam search is most often used to maintain tractability in large systems with insufficient amount of memory to store the entire search tree.^[4] For example, it has been used in many machine translation systems.^[5] (The state of the art now primarily uses neural machine translation based methods, especially large language models) To select the best translation, each part is processed, and many different ways of translating the words appear. The top best translations according to their sentence structures are kept, and the rest are discarded. The translator then evaluates the translations according to a given criterion, choosing the translation which best keeps the goals.

History

The Harpy Speech Recognition System (introduced in a 1976 dissertation^[6]) was the first use of what would become known as beam search.^[7] While the procedure was originally referred to as the "locus model of search", the term "beam search" was already in use by 1977.^[8]

Variants

Beam search has been made complete by combining it with depth-first search, resulting in beam stack search ^[9] and depth-first beam search,^[4] and with limited discrepancy search,^[4] resulting in beam search using limited discrepancy backtracking^[4] (BULB). The resulting search algorithms are anytime algorithms that find good but likely sub-optimal solutions quickly, like beam search, then backtrack and continue to find improved solutions until convergence to an optimal solution.

In the context of a local search, we call local beam search a specific algorithm that begins selecting $\beta$ randomly generated states and then, for each level of the search tree, it always considers $\beta$ new states among all the possible successors of the current ones, until it reaches a goal.^[10]^[11]

Since local beam search often ends up on local maxima, a common solution is to choose the next $\beta$ states in a random way, with a probability dependent from the heuristic evaluation of the states. This kind of search is called stochastic beam search.^[12]

Other variants are flexible beam search and recovery beam search.^[11]

Related Research Articles

In computer science, a search algorithm is an algorithm designed to solve a search problem. Search algorithms work to retrieve information stored within particular data structure, or calculated in the search space of a problem domain, with either discrete or continuous values.

The travelling salesman problem, also known as the travelling salesperson problem (TSP), asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" It is an NP-hard problem in combinatorial optimization, important in theoretical computer science and operations research.

A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally optimal solutions that approximate a globally optimal solution in a reasonable amount of time.

A* is a graph traversal and pathfinding algorithm, which is used in many fields of computer science due to its completeness, optimality, and optimal efficiency. Given a weighted graph, a source node and a goal node, the algorithm finds the shortest path from source to goal.

Alpha–beta pruning is a search algorithm that seeks to decrease the number of nodes that are evaluated by the minimax algorithm in its search tree. It is an adversarial search algorithm used commonly for machine playing of two-player combinatorial games. It stops evaluating a move when at least one possibility has been found that proves the move to be worse than a previously examined move. Such moves need not be evaluated further. When applied to a standard minimax tree, it returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.

Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimization in a large search space for an optimization problem. For large numbers of local optima, SA can find the global optimum. It is often used when the search space is discrete. For problems where finding an approximate global optimum is more important than finding a precise local optimum in a fixed amount of time, simulated annealing may be preferable to exact algorithms such as gradient descent or branch and bound.

Constraint programming (CP) is a paradigm for solving combinatorial problems that draws on a wide range of techniques from artificial intelligence, computer science, and operations research. In constraint programming, users declaratively state the constraints on the feasible solutions for a set of decision variables. Constraints differ from the common primitives of imperative programming languages in that they do not specify a step or sequence of steps to execute, but rather the properties of a solution to be found. In addition to constraints, users also need to specify a method to solve these constraints. This typically draws upon standard methods like chronological backtracking and constraint propagation, but may use customized code like a problem-specific branching heuristic.

Branch and bound is a method for solving optimization problems by breaking them down into smaller sub-problems and using a bounding function to eliminate sub-problems that cannot contain the optimal solution. It is an algorithm design paradigm for discrete and combinatorial optimization problems, as well as mathematical optimization. A branch-and-bound algorithm consists of a systematic enumeration of candidate solutions by means of state space search: the set of candidate solutions is thought of as forming a rooted tree with the full set at the root. The algorithm explores branches of this tree, which represent subsets of the solution set. Before enumerating the candidate solutions of a branch, the branch is checked against upper and lower estimated bounds on the optimal solution, and is discarded if it cannot produce a better solution than the best one found so far by the algorithm.

In computer science and operations research, the ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs. Artificial ants represent multi-agent methods inspired by the behavior of real ants. The pheromone-based communication of biological ants is often the predominant paradigm used. Combinations of artificial ants and local search algorithms have become a preferred method for numerous optimization tasks involving some sort of graph, e.g., vehicle routing and internet routing.

The expectiminimax algorithm is a variation of the minimax algorithm, for use in artificial intelligence systems that play two-player zero-sum games, such as backgammon, in which the outcome depends on a combination of the player's skill and chance elements such as dice rolls. In addition to "min" and "max" nodes of the traditional minimax tree, this variant has "chance" nodes, which take the expected value of a random event occurring. In game theory terms, an expectiminimax tree is the game tree of an extensive-form game of perfect, but incomplete information.

The Gauss–Newton algorithm is used to solve non-linear least squares problems, which is equivalent to minimizing a sum of squared function values. It is an extension of Newton's method for finding a minimum of a non-linear function. Since a sum of squares must be nonnegative, the algorithm can be viewed as using Newton's method to iteratively approximate zeroes of the components of the sum, and thus minimizing the sum. In this sense, the algorithm is also an effective method for solving overdetermined systems of equations. It has the advantage that second derivatives, which can be challenging to compute, are not required.

Negamax search is a variant form of minimax search that relies on the zero-sum property of a two-player game.

Principal variation search is a negamax algorithm that can be faster than alpha–beta pruning. Like alpha–beta pruning, NegaScout is a directional search algorithm for computing the minimax value of a node in a tree. It dominates alpha–beta pruning in the sense that it will never examine a node that can be pruned by alpha–beta; however, it relies on accurate node ordering to capitalize on this advantage.

Beam stack search is a search algorithm that combines chronological backtracking with beam search and is similar to depth-first beam search. Both search algorithms are anytime algorithms that find good but likely sub-optimal solutions quickly, like beam search, then backtrack and continue to find improved solutions until convergence to an optimal solution.

Pruning is a data compression technique in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree that are non-critical and redundant to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting.

In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function

SSS* is a search algorithm, introduced by George Stockman in 1979, that conducts a state space search traversing a game tree in a best-first fashion similar to that of the A* search algorithm.

In mathematical optimization and computer science, heuristic is a technique designed for problem solving more quickly when classic methods are too slow for finding an exact or approximate solution, or when classic methods fail to find any exact solution in a search space. This is achieved by trading optimality, completeness, accuracy, or precision for speed. In a way, it can be considered a shortcut.

Any-angle path planning algorithms are pathfinding algorithms that search for a Euclidean shortest path between two points on a grid map while allowing the turns in the path to have any angle. The result is a path that cuts directly through open areas and has relatively few turns. More traditional pathfinding algorithms such as A* either lack in performance or produce jagged, indirect paths.

References

↑ "beam search". Free On-line Dictionary of Computing . Retrieved 2024-03-27.
↑ "BRITISH MUSEUM SEARCH". bradley.bradley.edu. Retrieved 2016-04-11.
1 2 Norvig, Peter (1992). Paradigms of Artificial Intelligence Programming: Case Studies in Common LISP. Morgan Kaufmann. p. 196. ISBN 9781558601918.
1 2 3 4 Furcy, D.; Koenig, S. (2005). "Limited discrepancy beam search". Proceedings of the 19th international joint conference on Artificial intelligence. Morgan Kaufmann. pp. 125–131.
↑ Tillmann, C.; Ney, H. (2003). "Word reordering and a dynamic programming beam search algorithm for statistical machine translation". Computational Linguistics. 29 (1): 97–133. doi: 10.1162/089120103321337458 . S2CID 7829066.
↑ Lowerre, Bruce T. (1976). The Harpy Speech Recognition System (PDF) (PhD). Carnegie Mellon University.
↑ Ow, Peng Si; Morton, Thomas E. (1988). "Filtered beam search in scheduling†". International Journal of Production Research. 26 (1): 35–62. doi:10.1080/00207548808947840. ISSN 0020-7543.
↑ Defense Technical Information Center (1977-08-01). DTIC ADA049288: Speech Understanding Systems. Summary of Results of the Five-Year Research Effort at Carnegie-Mellon University. p. 6.
↑ Zhou, Rong; Hansen, Eric (2005). "Beam-Stack Search: Integrating Backtracking with Beam Search". ICAPS. pp. 90–98. Archived from the original on 2021-04-20. Retrieved 2011-04-09.
↑ Svetlana Lazebnik. "Local search algorithms" (PDF). University of North Carolina at Chapel Hill, Department of Computer Science. p. 15. Archived from the original (PDF) on 2011-07-05.
1 2 Pushpak Bhattacharyya. "Beam Search". Indian Institute of Technology Bombay, Department of Computer Science and Engineering (CSE). pp. 39–40. Archived from the original on 2018-11-21.
↑ James Parker (2017-09-28). "Local Search" (PDF). University of Minnesota. p. 17. Archived (PDF) from the original on 2017-10-13. Retrieved 2018-11-21.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] "beam search". Free On-line Dictionary of Computing . Retrieved 2024-03-27.

[2] "BRITISH MUSEUM SEARCH". bradley.bradley.edu. Retrieved 2016-04-11.

[:0-3] 1 2 Norvig, Peter (1992). Paradigms of Artificial Intelligence Programming: Case Studies in Common LISP. Morgan Kaufmann. p. 196. ISBN 9781558601918.

[furcy-4] 1 2 3 4 Furcy, D.; Koenig, S. (2005). "Limited discrepancy beam search". Proceedings of the 19th international joint conference on Artificial intelligence. Morgan Kaufmann. pp. 125–131.

[5] Tillmann, C.; Ney, H. (2003). "Word reordering and a dynamic programming beam search algorithm for statistical machine translation". Computational Linguistics. 29 (1): 97–133. doi: 10.1162/089120103321337458 . S2CID 7829066.

[6] Lowerre, Bruce T. (1976). The Harpy Speech Recognition System (PDF) (PhD). Carnegie Mellon University.

[7] Ow, Peng Si; Morton, Thomas E. (1988). "Filtered beam search in scheduling†". International Journal of Production Research. 26 (1): 35–62. doi:10.1080/00207548808947840. ISSN 0020-7543.

[8] Defense Technical Information Center (1977-08-01). DTIC ADA049288: Speech Understanding Systems. Summary of Results of the Five-Year Research Effort at Carnegie-Mellon University. p. 6.

[9] Zhou, Rong; Hansen, Eric (2005). "Beam-Stack Search: Integrating Backtracking with Beam Search". ICAPS. pp. 90–98. Archived from the original on 2021-04-20. Retrieved 2011-04-09.

[10] Svetlana Lazebnik. "Local search algorithms" (PDF). University of North Carolina at Chapel Hill, Department of Computer Science. p. 15. Archived from the original (PDF) on 2011-07-05.

[iitb-11] 1 2 Pushpak Bhattacharyya. "Beam Search". Indian Institute of Technology Bombay, Department of Computer Science and Engineering (CSE). pp. 39–40. Archived from the original on 2018-11-21.

[12] James Parker (2017-09-28). "Local Search" (PDF). University of Minnesota. p. 17. Archived (PDF) from the original on 2017-10-13. Retrieved 2018-11-21.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

v t e Graph and tree traversal algorithms
Search	α–β pruning A* IDA* LPA* SMA* Best-first search Beam search Bidirectional search Breadth-first search Lexicographic Parallel B* Depth-first search Iterative deepening D* Fringe search Jump point search Monte Carlo tree search SSS*
Shortest path	Bellman–Ford Dijkstra's Floyd–Warshall Johnson's Shortest path faster Yen's
Minimum spanning tree	Borůvka's Kruskal's Prim's Reverse-delete
List of graph search algorithms