Brute-force search

Last updated

In computer science, brute-force search or exhaustive search, also known as generate and test, is a very general problem-solving technique and algorithmic paradigm that consists of systematically checking all possible candidates for whether or not each candidate satisfies the problem's statement.

Contents

A brute-force algorithm that finds the divisors of a natural number n would enumerate all integers from 1 to n, and check whether each of them divides n without remainder. A brute-force approach for the eight queens puzzle would examine all possible arrangements of 8 pieces on the 64-square chessboard and for each arrangement, check whether each (queen) piece can attack any other. [1]

While a brute-force search is simple to implement and will always find a solution if it exists, implementation costs are proportional to the number of candidate solutions which in many practical problems tends to grow very quickly as the size of the problem increases (§Combinatorial explosion). [2] Therefore, brute-force search is typically used when the problem size is limited, or when there are problem-specific heuristics that can be used to reduce the set of candidate solutions to a manageable size. The method is also used when the simplicity of implementation is more important than processing speed.

This is the case, for example, in critical applications where any errors in the algorithm would have very serious consequences or when using a computer to prove a mathematical theorem. Brute-force search is also useful as a baseline method when benchmarking other algorithms or metaheuristics. Indeed, brute-force search can be viewed as the simplest metaheuristic. Brute force search should not be confused with backtracking, where large sets of solutions can be discarded without being explicitly enumerated (as in the textbook computer solution to the eight queens problem above). The brute-force method for finding an item in a table namely, check all entries of the latter, sequentially is called linear search.

Basic algorithm

In order candidate for P after the current one c.

  1. valid (P, c): check whether candidate c is a solution for P.
  2. output (P, c): use the solution c of P as appropriate to the application.

The next procedure must also tell when there are no more candidates for the instance P, after the current one c. A convenient way to do that is to return a "null candidate", some conventional data value Λ that is distinct from any real candidate. Likewise the first procedure should return Λ if there are no candidates at all for the instance P. The brute-force method is then expressed by the algorithm

cfirst(P) whilec ≠ Λ doifvalid(P,c) thenoutput(P, c)     cnext(P, c) end while

For example, when looking for the divisors of an integer n, the instance data P is the number n. The call first(n) should return the integer 1 if n ≥ 1, or Λ otherwise; the call next(n,c) should return c + 1 if c < n, and Λ otherwise; and valid(n,c) should return true if and only if c is a divisor of n. (In fact, if we choose Λ to be n + 1, the tests n ≥ 1 and c < n are unnecessary.)The brute-force search algorithm above will call output for every candidate that is a solution to the given instance P. The algorithm is easily modified to stop after finding the first solution, or a specified number of solutions; or after testing a specified number of candidates, or after spending a given amount of CPU time.

Combinatorial explosion

The main disadvantage of the brute-force method is that, for many real-world problems, the number of natural candidates is prohibitively large. For instance, if we look for the divisors of a number as described above, the number of candidates tested will be the given number n. So if n has sixteen decimal digits, say, the search will require executing at least 1015 computer instructions, which will take several days on a typical PC. If n is a random 64-bit natural number, which has about 19 decimal digits on the average, the search will take about 10 years. This steep growth in the number of candidates, as the size of the data increases, occurs in all sorts of problems. For instance, if we are seeking a particular rearrangement of 10 letters, then we have 10! = 3,628,800 candidates to consider, which a typical PC can generate and test in less than one second. However, adding one more letter which is only a 10% increase in the data size will multiply the number of candidates by 11, a 1000% increase. For 20 letters, the number of candidates is 20!, which is about 2.4×1018 or 2.4 quintillion; and the search will take about 10 years. This unwelcome phenomenon is commonly called the combinatorial explosion, or the curse of dimensionality.

One example of a case where combinatorial complexity leads to solvability limit is in solving chess. Chess is not a solved game. In 2005, all chess game endings with six pieces or less were solved, showing the result of each position if played perfectly. It took ten more years to complete the tablebase with one more chess piece added, thus completing a 7-piece tablebase. Adding one more piece to a chess ending (thus making an 8-piece tablebase) is considered intractable due to the added combinatorial complexity. [3] [4]

Speeding up brute-force searches

One way to speed up a brute-force algorithm is to reduce the search space, that is, the set of candidate solutions, by using heuristics specific to the problem class. For example, in the eight queens problem the challenge is to place eight queens on a standard chessboard so that no queen attacks any other. Since each queen can be placed in any of the 64 squares, in principle there are 648 = 281,474,976,710,656 possibilities to consider. However, because the queens are all alike, and that no two queens can be placed on the same square, the candidates are all possible ways of choosing of a set of 8 squares from the set all 64 squares; which means 64 choose 8 = 64!/(56!*8!) = 4,426,165,368 candidate solutions about 1/60,000 of the previous estimate. Further, no arrangement with two queens on the same row or the same column can be a solution. Therefore, we can further restrict the set of candidates to those arrangements.

As this example shows, a little bit of analysis will often lead to dramatic reductions in the number of candidate solutions, and may turn an intractable problem into a trivial one.

In some cases, the analysis may reduce the candidates to the set of all valid solutions; that is, it may yield an algorithm that directly enumerates all the desired solutions (or finds one solution, as appropriate), without wasting time with tests and the generation of invalid candidates. For example, for the problem "find all integers between 1 and 1,000,000 that are evenly divisible by 417" a naive brute-force solution would generate all integers in the range, testing each of them for divisibility. However, that problem can be solved much more efficiently by starting with 417 and repeatedly adding 417 until the number exceeds 1,000,000 which takes only 2398 (= 1,000,000 ÷ 417) steps, and no tests.

Reordering the search space

In applications that require only one solution, rather than all solutions, the expected running time of a brute force search will often depend on the order in which the candidates are tested. As a general rule, one should test the most promising candidates first. For example, when searching for a proper divisor of a random number n, it is better to enumerate the candidate divisors in increasing order, from 2 to n − 1, than the other way around because the probability that n is divisible by c is 1/c. Moreover, the probability of a candidate being valid is often affected by the previous failed trials. For example, consider the problem of finding a 1 bit in a given 1000-bit string P. In this case, the candidate solutions are the indices 1 to 1000, and a candidate c is valid if P[c] = 1. Now, suppose that the first bit of P is equally likely to be 0 or 1, but each bit thereafter is equal to the previous one with 90% probability. If the candidates are enumerated in increasing order, 1 to 1000, the number t of candidates examined before success will be about 6, on the average. On the other hand, if the candidates are enumerated in the order 1,11,21,31...991,2,12,22,32 etc., the expected value of t will be only a little more than 2.More generally, the search space should be enumerated in such a way that the next candidate is most likely to be valid, given that the previous trials were not. So if the valid solutions are likely to be "clustered" in some sense, then each new candidate should be as far as possible from the previous ones, in that same sense. The converse holds, of course, if the solutions are likely to be spread out more uniformly than expected by chance.

There are many other search methods, or metaheuristics, which are designed to take advantage of various kinds of partial knowledge one may have about the solution. Heuristics can also be used to make an early cutoff of parts of the search. One example of this is the minimax principle for searching game trees, that eliminates many subtrees at an early stage in the search. In certain fields, such as language parsing, techniques such as chart parsing can exploit constraints in the problem to reduce an exponential complexity problem into a polynomial complexity problem. In many cases, such as in Constraint Satisfaction Problems, one can dramatically reduce the search space by means of Constraint propagation, that is efficiently implemented in Constraint programming languages. The search space for problems can also be reduced by replacing the full problem with a simplified version. For example, in computer chess, rather than computing the full minimax tree of all possible moves for the remainder of the game, a more limited tree of minimax possibilities is computed, with the tree being pruned at a certain number of moves, and the remainder of the tree being approximated by a static evaluation function.

In cryptography

In cryptography, a brute-force attack involves systematically checking all possible keys until the correct key is found. [5] This strategy can in theory be used against any encrypted data [6] (except a one-time pad) by an attacker who is unable to take advantage of any weakness in an encryption system that would otherwise make his or her task easier.

The key length used in the encryption determines the practical feasibility of performing a brute force attack, with longer keys exponentially more difficult to crack than shorter ones. Brute force attacks can be made less effective by obfuscating the data to be encoded, something that makes it more difficult for an attacker to recognise when he has cracked the code. One of the measures of the strength of an encryption system is how long it would theoretically take an attacker to mount a successful brute force attack against it.

Related Research Articles

<span class="mw-page-title-main">Algorithm</span> Sequence of operations for a task

In mathematics and computer science, an algorithm is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing calculations and data processing. More advanced algorithms can use conditionals to divert the code execution through various routes and deduce valid inferences, achieving automation eventually. Using human characteristics as descriptors of machines in metaphorical ways was already practiced by Alan Turing with terms such as "memory", "search" and "stimulus".

The eight queens puzzle is the problem of placing eight chess queens on an 8×8 chessboard so that no two queens threaten each other; thus, a solution requires that no two queens share the same row, column, or diagonal. There are 92 solutions. The problem was first posed in the mid-19th century. In the modern era, it is often used as an example problem for various computer programming techniques.

<span class="mw-page-title-main">Brute-force attack</span> Cryptanalytic method for unauthorized users to access data

In cryptography, a brute-force attack consists of an attacker submitting many passwords or passphrases with the hope of eventually guessing correctly. The attacker systematically checks all possible passwords and passphrases until the correct one is found. Alternatively, the attacker can attempt to guess the key which is typically created from the password using a key derivation function. This is known as an exhaustive key search.

A solved game is a game whose outcome can be correctly predicted from any position, assuming that both players play perfectly. This concept is usually applied to abstract strategy games, and especially to games with full information and no element of chance; solving such a game may use combinatorial game theory and/or computer assistance.

<span class="mw-page-title-main">Computer chess</span> Computer hardware and software capable of playing chess

Computer chess includes both hardware and software capable of playing chess. Computer chess provides opportunities for players to practice even in the absence of human opponents, and also provides opportunities for analysis, entertainment and training. Computer chess applications that play at the level of a chess grandmaster or higher are available on hardware from supercomputers to smart phones. Standalone chess-playing machines are also available. Stockfish, Leela Chess Zero, GNU Chess, Fruit, and other free open source applications are available for various platforms.

Hilbert's tenth problem is the tenth on the list of mathematical problems that the German mathematician David Hilbert posed in 1900. It is the challenge to provide a general algorithm that, for any given Diophantine equation, can decide whether the equation has a solution with all unknowns taking integer values.

In computer science and artificial intelligence, combinatorial search studies search algorithms for solving instances of problems that are believed to be hard in general, by efficiently exploring the usually large solution space of these instances. Combinatorial search algorithms achieve this efficiency by reducing the effective size of the search space or employing heuristics. Some algorithms are guaranteed to find the optimal solution, while others may only return the best solution found in the part of the state space that was explored.

Backtracking is a class of algorithms for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it determines that the candidate cannot possibly be completed to a valid solution.

Trial and error is a fundamental method of problem-solving characterized by repeated, varied attempts which are continued until success, or until the practicer stops trying.

Branch and bound is a method for solving optimization problems by breaking them down into smaller sub-problems and using a bounding function to eliminate sub-problems that cannot contain the optimal solution. It is an algorithm design paradigm for discrete and combinatorial optimization problems, as well as mathematical optimization. A branch-and-bound algorithm consists of a systematic enumeration of candidate solutions by means of state space search: the set of candidate solutions is thought of as forming a rooted tree with the full set at the root. The algorithm explores branches of this tree, which represent subsets of the solution set. Before enumerating the candidate solutions of a branch, the branch is checked against upper and lower estimated bounds on the optimal solution, and is discarded if it cannot produce a better solution than the best one found so far by the algorithm.

<span class="mw-page-title-main">Equation solving</span> Finding values for variables that make an equation true

In mathematics, to solve an equation is to find its solutions, which are the values that fulfill the condition stated by the equation, consisting generally of two expressions related by an equals sign. When seeking a solution, one or more variables are designated as unknowns. A solution is an assignment of values to the unknown variables that makes the equality in the equation true. In other words, a solution is a value or a collection of values such that, when substituted for the unknowns, the equation becomes an equality. A solution of an equation is often called a root of the equation, particularly but not only for polynomial equations. The set of all solutions of an equation is its solution set.

In computing, a Las Vegas algorithm is a randomized algorithm that always gives correct results; that is, it always produces the correct result or it informs about the failure. However, the runtime of a Las Vegas algorithm differs depending on the input. The usual definition of a Las Vegas algorithm includes the restriction that the expected runtime be finite, where the expectation is carried out over the space of random information, or entropy, used in the algorithm. An alternative definition requires that a Las Vegas algorithm always terminates, but may output a symbol not part of the solution space to indicate failure in finding a solution. The nature of Las Vegas algorithms makes them suitable in situations where the number of possible solutions is limited, and where verifying the correctness of a candidate solution is relatively easy while finding a solution is complex.

In computer science, the min-conflicts algorithm is a search algorithm or heuristic method to solve constraint satisfaction problems.

<span class="mw-page-title-main">Endgame tablebase</span> Database of precalculated chess analysis

In chess, an endgame tablebase, or simply tablebase, is a computerised database containing precalculated evaluations of endgame positions. Tablebases are used to analyse finished games, as well as by chess engines to evaluate positions during play. Tablebases are typically exhaustive, covering every legal arrangement of a specific selection of pieces on the board, with both White and Black to move. For each position, the tablebase records the ultimate result of the game and the number of moves required to achieve that result, both assuming perfect play. Because every legal move in a covered position results in another covered position, the tablebase acts as an oracle that always provides the optimal move.

God's algorithm is a notion originating in discussions of ways to solve the Rubik's Cube puzzle, but which can also be applied to other combinatorial puzzles and mathematical games. It refers to any algorithm which produces a solution having the fewest possible moves. The allusion to the deity is based on the notion that an omniscient being would know an optimal step from any given configuration.

In mathematics, a combinatorial explosion is the rapid growth of the complexity of a problem due to how the combinatorics of the problem is affected by the input, constraints, and bounds of the problem. Combinatorial explosion is sometimes used to justify the intractability of certain problems. Examples of such problems include certain mathematical functions, the analysis of some puzzles and games, and some pathological examples which can be modelled as the Ackermann function.

<span class="mw-page-title-main">Sudoku solving algorithms</span> Algorithms to complete a sudoku

A standard Sudoku contains 81 cells, in a 9×9 grid, and has 9 boxes, each box being the intersection of the first, middle, or last 3 rows, and the first, middle, or last 3 columns. Each cell may contain a number from one to nine, and each number can only occur once in each row, column, and box. A Sudoku starts with some cells containing numbers (clues), and the goal is to solve the remaining cells. Proper Sudokus have one solution. Players and investigators use a wide range of computer algorithms to solve Sudokus, study their properties, and make new puzzles, including Sudokus with interesting symmetries and other properties.

<span class="mw-page-title-main">NP-completeness</span> Complexity class

In computational complexity theory, a problem is NP-complete when:

  1. It is a decision problem, meaning that for any input to the problem, the output is either "yes" or "no".
  2. When the answer is "yes", this can be demonstrated through the existence of a short solution.
  3. The correctness of each solution can be verified quickly and a brute-force search algorithm can find a solution by trying all possible solutions.
  4. The problem can be used to simulate every other problem for which we can verify quickly that a solution is correct. In this sense, NP-complete problems are the hardest of the problems to which solutions can be verified quickly. If we could find solutions of some NP-complete problem quickly, we could quickly find the solutions of every other problem to which a given solution can be easily verified.

Solving chess consists of finding an optimal strategy for the game of chess; that is, one by which one of the players can always force a victory, or either can force a draw. It is also related to more generally solving chess-like games such as Capablanca chess and infinite chess. In a weaker sense, solving chess may refer to proving which one of the three possible outcomes is the result of two perfect players, without necessarily revealing the optimal strategy itself.

References

  1. "Brute Force Algorithms Explained". freeCodeCamp.org. 2020-01-06. Retrieved 2021-04-11.
  2. "Complexity of brute force search". coursera. Retrieved 14 June 2018.
  3. "Is there a freely available online 7 piece Endgame tablebase?". Stack Exchange.
  4. "Lomonosov Endgame Tablebases". ChessOK. Archived from the original on April 6, 2019.
  5. Mark Burnett, "Blocking Brute Force Attacks" Archived 2016-12-03 at the Wayback Machine , UVA Computer Science, 2007
  6. Christof Paar; Jan Pelzl; Bart Preneel (2010). Understanding Cryptography: A Textbook for Students and Practitioners. Springer. p. 7. ISBN   3-642-04100-0.

See also