Evaluation function

Last updated

An evaluation function, also known as a heuristic evaluation function or static evaluation function, is a function used by game-playing computer programs to estimate the value or goodness of a position (usually at a leaf or terminal node) in a game tree. [1] Most of the time, the value is either a real number or a quantized integer, often in nths of the value of a playing piece such as a stone in go or a pawn in chess, where n may be tenths, hundredths or other convenient fraction, but sometimes, the value is an array of three values in the unit interval, representing the win, draw, and loss percentages of the position.

Contents

There do not exist analytical or theoretical models for evaluation functions for unsolved games, nor are such functions entirely ad-hoc. The composition of evaluation functions is determined empirically by inserting a candidate function into an automaton and evaluating its subsequent performance. A significant body of evidence now exists for several games like chess, shogi and go as to the general composition of evaluation functions for them.

Games in which game playing computer programs employ evaluation functions include chess, [2] go, [2] shogi (Japanese chess), [2] othello, hex, backgammon, [3] and checkers. [4] [5] In addition, with the advent of programs such as MuZero, computer programs also use evaluation functions to play video games, such as those from the Atari 2600. [6] Some games like tic-tac-toe are strongly solved, and do not require search or evaluation because a discrete solution tree is available.

A tree of such evaluations is usually part of a search algorithm, such as Monte Carlo tree search or a minimax algorithm like alpha–beta search. The value is presumed to represent the relative probability of winning if the game tree were expanded from that node to the end of the game. The function looks only at the current position (i.e. what spaces the pieces are on and their relationship to each other) and does not take into account the history of the position or explore possible moves forward of the node (therefore static). This implies that for dynamic positions where tactical threats exist, the evaluation function will not be an accurate assessment of the position. These positions are termed non-quiescent; they require at least a limited kind of search extension called quiescence search to resolve threats before evaluation. Some values returned by evaluation functions are absolute rather than heuristic, if a win, loss or draw occurs at the node.

There is an intricate relationship between search and knowledge in the evaluation function. Deeper search favors less near-term tactical factors and more subtle long-horizon positional motifs in the evaluation. There is also a trade-off between efficacy of encoded knowledge and computational complexity: computing detailed knowledge may take so much time that performance decreases, so approximations to exact knowledge are often better. Because the evaluation function depends on the nominal depth of search as well as the extensions and reductions employed in the search, there is no generic or stand-alone formulation for an evaluation function. An evaluation function which works well in one application will usually need to be substantially re-tuned or re-trained to work effectively in another application.

In chess

In computer chess, the output of an evaluation function is typically an integer, and the units of the evaluation function are typically referred to as pawns. The term 'pawn' refers to the value when the player has one more pawn than the opponent in a position, as explained in Chess piece relative value. The integer 1 usually represents some fraction of a pawn, and commonly used in computer chess are centipawns, which are a hundredth of a pawn. Larger evaluations indicate a material imbalance or positional advantage or that a win of material is usually imminent. Very large evaluations may indicate that checkmate is imminent. An evaluation function also implicitly encodes the value of the right to move, which can vary from a small fraction of a pawn to win or loss.

Handcrafted evaluation functions

Historically in computer chess, the terms of an evaluation function are constructed (i.e. handcrafted) by the engine developer, as opposed to discovered through training neural networks. The general approach for constructing handcrafted evaluation functions is as a linear combination of various weighted terms determined to influence the value of a position. However, not all terms in a handcrafted evaluation function are linear, some, such as king safety and pawn structure, are nonlinear. Each term may be considered to be composed of first order factors (those that depend only on the space and any piece on it), second order factors (the space in relation to other spaces), and nth-order factors (dependencies on history of the position).

A handcrafted evaluation function typically has of a material balance term that usually dominates the evaluation. The conventional values used for material are Queen=9, Rook=5; Knight or Bishop=3; Pawn=1; the king is assigned an arbitrarily large value, usually larger than the total value of all the other pieces. [1] In addition, it typically has a set of positional terms usually totaling no more than the value of a pawn, though in some positions the positional terms can get much larger, such as when checkmate is imminent. Handcrafted evaluation functions typically contain dozens to hundreds of individual terms.

In practice, effective handcrafted evaluation functions are created not by ever expanding the list of evaluated parameters, but by careful tuning or training of the weights relative to each other, of a modest set of parameters such as those described above. Toward this end, positions from various databases are employed, such as from master games, engine games, Lichess games, or even from self-play, as in reinforcement learning.

Example

An example handcrafted evaluation function for chess might look like the following:

  • c1 * material + c2 * mobility + c3 * king safety + c4 * center control + c5 * pawn structure + c6 * king tropism + ...

Each of the terms is a weight multiplied by a difference factor: the value of white's material or positional terms minus black's.

  • The material term is obtained by assigning a value in pawn-units to each of the pieces.
  • Mobility is the number of legal moves available to a player, or alternately the sum of the number of spaces attacked or defended by each piece, including spaces occupied by friendly or opposing pieces. Effective mobility, or the number of "safe" spaces a piece may move to, may also be taken into account.
  • King safety is a set of bonuses and penalties assessed for the location of the king and the configuration of pawns and pieces adjacent to or in front of the king, and opposing pieces bearing on spaces around the king.
  • Center control is derived from how many pawns and pieces occupy or bear on the four center spaces and sometimes the 12 spaces of the extended center.
  • Pawn structure is a set of penalties and bonuses for various strengths and weaknesses in pawn structure, such as penalties for doubled and isolated pawns.
  • King tropism is a bonus for closeness (or penalty for distance) of certain pieces, especially queens and knights, to the opposing king.

Neural networks

While neural networks have been used in the evaluation functions of chess engines since the late 1980s, [7] [8] they did not become popular in computer chess until the late 2010s, as the hardware needed to train neural networks was not strong enough at the time, and fast training algorithms and network topology and architectures have not been developed yet. Initially, neural network based evaluation functions generally consisted of one neural network for the entire evaluation function, with input features selected from the board and whose output is an integer, normalized to the centipawn scale so that a value of 100 is roughly equivalent to a material advantage of a pawn. The parameters in neural networks are typically trained using reinforcement learning or supervised learning. More recently, evaluation functions in computer chess have started to use multiple neural networks, with each neural network trained for a specific part of the evaluation, such as pawn structure or endgames. This allows for hybrid approaches where an evaluation function consists of both neural networks and handcrafted terms.

Deep neural networks have been used, albeit infrequently, in computer chess after Matthew Lai's Giraffe [9] in 2015 and Deepmind's AlphaZero in 2017 demonstrated the feasibility of deep neural networks in evaluation functions. The distributed computing project Leela Chess Zero was started shortly after to attempt to replicate the results of Deepmind's AlphaZero paper. Apart from the size of the networks, the neural networks used in AlphaZero and Leela Chess Zero also differ from those used in traditional chess engines as they have two outputs, one for evaluation (the value head) and one for move ordering (the policy head), rather than only one output for evaluation. [10] In addition, while it is possible to set the output of the value head of Leela's neural network to a real number to approximate the centipawn scale used in traditional chess engines, by default the output is the win-draw-loss percentages, a vector of three values each from the unit interval. [10] Since deep neural networks are very large, engines using deep neural networks in their evaluation function usually require a graphics processing unit in order to efficiently calculate the evaluation function.

Piece-square tables

An important technique in evaluation since at least the early 1990s is the use of piece-square tables (also called piece-value tables) for evaluation. [11] [12] Each table is a set of 64 values corresponding to the squares of the chessboard. The most basic implementation of piece-square table consists of separate tables for each type of piece per player, which in chess results in 12 piece-square tables in total. More complex variants of piece-square tables are used in computer chess, one of the most prominent being the king-piece-square table, used in Stockfish, Komodo Dragon, Ethereal, and many other engines, where each table considers the position of every type of piece in relation to the player's king, rather than the position of the every type of piece alone. The values in the tables are bonuses/penalties for the location of each piece on each space, and encode a composite of many subtle factors difficult to quantify analytically. In handcrafted evaluation functions, there are sometimes two sets of tables: one for the opening/middlegame, and one for the endgame; positions of the middle game are interpolated between the two. [13]

Originally developed in computer shogi in 2018 by Yu Nasu, [14] [15] the most common evaluation function used in computer chess today[ citation needed ] is the efficiently updatable neural network , or NNUE for short, a sparse and shallow neural network that has only piece-square tables as the inputs into the neural network. [16] In fact, the most basic NNUE architecture is simply the 12 piece-square tables described above, a neural network with only one layer and no activation functions. An efficiently updatable neural network architecture, using king-piece-square tables as its inputs, was first ported to chess in a Stockfish derivative called Stockfish NNUE, publicly released on May 30, 2020, [17] and was adopted by many other engines before eventually being incorporated into the official Stockfish engine on August 6, 2020. [18] [19]

Endgame tablebases

Chess engines frequently use endgame tablebases in their evaluation function, as it allows the engine to play perfectly in the endgame.

In Go

Historically, evaluation functions in Computer Go took into account both territory controlled, influence of stones, number of prisoners and life and death of groups on the board. However, modern go playing computer programs largely use deep neural networks in their evaluation functions, such as AlphaGo, Leela Zero, Fine Art, and KataGo, and output a win/draw/loss percentage rather than a value in number of stones.

Related Research Articles

<span class="mw-page-title-main">Shogi</span> Japanese strategy board game

Shogi, also known as Japanese chess, is a strategy board game for two players. It is one of the most popular board games in Japan and is in the same family of games as Western chess, chaturanga, xiangqi, Indian chess, and janggi. Shōgi means general's board game.

<span class="mw-page-title-main">Computer chess</span> Computer hardware and software capable of playing chess

Computer chess includes both hardware and software capable of playing chess. Computer chess provides opportunities for players to practice even in the absence of human opponents, and also provides opportunities for analysis, entertainment and training. Computer chess applications that play at the level of a chess grandmaster or higher are available on hardware from supercomputers to smart phones. Standalone chess-playing machines are also available. Stockfish, Leela Chess Zero, GNU Chess, Fruit, and other free open source applications are available for various platforms.

This glossary of chess explains commonly used terms in chess, in alphabetical order. Some of these terms have their own pages, like fork and pin. For a list of unorthodox chess pieces, see Fairy chess piece; for a list of terms specific to chess problems, see Glossary of chess problems; for a list of named opening lines, see List of chess openings; for a list of chess-related games, see List of chess variants; for a list of terms general to board games, see Glossary of board games.

A fairy chess piece, variant chess piece, unorthodox chess piece, or heterodox chess piece is a chess piece not used in conventional chess but incorporated into certain chess variants and some chess problems. Compared to conventional pieces, fairy pieces vary mostly in the way they move, but they may also follow special rules for capturing, promotions, etc. Because of the distributed and uncoordinated nature of unorthodox chess development, the same piece can have different names, and different pieces can have the same name in various contexts as it can be noted in the list of fairy chess pieces.

A bitboard is a specialized bit array data structure commonly used in computer systems that play board games, where each bit corresponds to a game board space or piece. This allows parallel bitwise operations to set or query the game state, or determine moves or plays in the game.

In chess, a tactic is a sequence of moves that each makes one or more immediate threats – a check, a material threat, a checkmating sequence threat, or the threat of another tactic – that culminates in the opponent's being unable to respond to all of the threats without making some kind of concession. Most often, the immediate benefit takes the form of a material advantage or mating attack; however, some tactics are used for defensive purposes and can salvage material that would otherwise be lost, or to induce stalemate in an otherwise lost position.

In chess, a relative value is a standard value conventionally assigned to each piece. Piece valuations have no role in the rules of chess but are useful as an aid to assessing a position.

<i>Fritz</i> (chess) Chess software

Fritz is a German chess program originally developed for Chessbase by Frans Morsch based on his Quest program, ported to DOS, and then Windows by Mathias Feist. With version 13, Morsch retired, and his engine was first replaced by Gyula Horvath's Pandix, and then with Fritz 15, Vasik Rajlich's Rybka.

Zobrist hashing is a hash function construction used in computer programs that play abstract board games, such as chess and Go, to implement transposition tables, a special kind of hash table that is indexed by a board position and used to avoid analyzing the same position more than once. Zobrist hashing is named for its inventor, Albert Lindsey Zobrist. It has also been applied as a method for recognizing substitutional alloy configurations in simulations of crystalline materials. Zobrist hashing is the first known instance of the generally useful underlying technique called tabulation hashing.

<span class="mw-page-title-main">Endgame tablebase</span> Database of precalculated chess analysis

In chess, an endgame tablebase, or simply tablebase, is a computerised database containing precalculated evaluations of endgame positions. Tablebases are used to analyse finished games, as well as by chess engines to evaluate positions during play. Tablebases are typically exhaustive, covering every legal arrangement of a specific selection of pieces on the board, with both White and Black to move. For each position, the tablebase records the ultimate result of the game and the number of moves required to achieve that result, both assuming perfect play. Because every legal move in a covered position results in another covered position, the tablebase acts as an oracle that always provides the optimal move.

<span class="mw-page-title-main">Board representation (computer chess)</span>

Board representation in computer chess is a data structure in a chess program representing the position on the chessboard and associated game state. Board representation is fundamental to all aspects of a chess program including move generation, the evaluation function, and making and unmaking moves as well as maintaining the state of the game during play. Several different board representations exist. Chess programs often utilize more than one board representation at different times, for efficiency. Execution efficiency and memory footprint are the primary factors in choosing a board representation; secondary considerations are effort required to code, test and debug the application.

In chess, an exchange or trade of chess pieces is a series of closely related moves, typically sequential, in which the two players capture each other's pieces. Any type of pieces except the kings may possibly be exchanged, i.e. captured in an exchange, although a king can capture an opponent's piece. Either the player of the white or the black pieces may make the first capture of the other player's piece in an exchange, followed by the other player capturing a piece of the first player, often referred to as a recapture. Commonly, the word "exchange" is used when the pieces exchanged are of the same type or of about equal value, which is an even exchange. According to chess tactics, a bishop and a knight are usually of about equal value. If the values of the pieces exchanged are not equal, then the player who captures the higher-valued piece can be said to be up the exchange or wins the exchange, while the opponent who captures the lower-valued piece is down the exchange or loses the exchange. Exchanges occur very frequently in chess, in almost every game and usually multiple times per game. Exchanges are often related to the tactics or strategy in a chess game, but often simply occur over the course of a game.

Shogi, like western chess, can be divided into the opening, middle game and endgame, each requiring a different strategy. The opening consists of arranging one's defenses and positioning for attack, the middle game consists of attempting to break through the opposing defenses while maintaining one's own, and the endgame starts when one side's defenses have been compromised.

<span class="mw-page-title-main">Stockfish (chess)</span> Free and open-source chess engine

Stockfish is a free and open-source chess engine, available for various desktop and mobile platforms. It can be used in chess software through the Universal Chess Interface.

<span class="mw-page-title-main">Outline of chess</span> Overview of and topical guide to chess

The following outline is provided as an overview of and topical guide to chess:

<span class="mw-page-title-main">AlphaZero</span> Game-playing artificial intelligence

AlphaZero is a computer program developed by artificial intelligence research company DeepMind to master the games of chess, shogi and go. This algorithm uses an approach similar to AlphaGo Zero.

<span class="mw-page-title-main">Leela Chess Zero</span> Deep neural network-based chess engine

Leela Chess Zero is a free, open-source, and deep neural network–based chess engine and volunteer computing project. Development has been spearheaded by programmer Gary Linscott, who is also a developer for the Stockfish chess engine. Leela Chess Zero was adapted from the Leela Zero Go engine, which in turn was based on Google's AlphaGo Zero project. One of the purposes of Leela Chess Zero was to verify the methods in the AlphaZero paper as applied to the game of chess.

The 18th season of the Top Chess Engine Championship began on 4 May 2020 and ended on 3 July 2020. The defending champion was Leela Chess Zero, which defeated Stockfish in the previous season's superfinal. The two season 17 superfinalists qualified again for the superfinal. This time Stockfish won, winning by 7 games (+23−16=61).

<span class="mw-page-title-main">Efficiently updatable neural network</span> Neural network based evaluation function

An efficiently updatable neural network is a neural network-based evaluation function whose inputs are piece-square tables, or variants thereof like the king-piece-square table. NNUE is used primarily for the leaf nodes of the alpha–beta tree. While being slower than handcrafted evaluation functions, NNUE does not suffer from the 'blindness beyond the current move' problem.

References

  1. 1 2 Shannon, Claude (1950), Programming a Computer for Playing Chess (PDF), Ser. 7, vol. 41, Philosophical Magazine, retrieved 12 December 2021
  2. 1 2 3 Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (7 December 2018). "A general reinforcement learning algorithm that masters chess, shogi, and go through self-play". Science . 362 (6419): 1140–1144. Bibcode:2018Sci...362.1140S. doi: 10.1126/science.aar6404 . PMID   30523106.
  3. Tesauro, Gerald (March 1995). "Temporal Difference Learning and TD-Gammon". Communications of the ACM. 38 (3): 58–68. doi: 10.1145/203330.203343 . S2CID   8763243 . Retrieved Nov 1, 2013.
  4. Schaeffer, J.; Burch, N.; Y. Björnsson; Kishimoto, A.; Müller, M.; Lake, R.; Lu, P.; Sutphen, S. (2007). "Checkers is Solved" (PDF). Science. 317 (5844): 1518–22. doi:10.1126/science.1144079. PMID   17641166. S2CID   10274228.
  5. Schaeffer, J.; Björnsson, Y.; Burch, N.; Kishimoto, A.; Müller, M.; Lake, R.; Lu, P.; Sutphen, S. "Solving Checkers" (PDF). Proceedings of the 2005 International Joint Conferences on Artificial Intelligence Organization.
  6. Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent; Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap, Timothy (2020). "Mastering Atari, Go, chess and shogi by planning with a learned model". Nature. 588 (7839): 604–609. arXiv: 1911.08265 . Bibcode:2020Natur.588..604S. doi:10.1038/s41586-020-03051-4. PMID   33361790. S2CID   208158225.
  7. Thurn, Sebastian (1995), Learning to Play the Game of Chess (PDF), MIT Press, retrieved 12 December 2021
  8. Levinson, Robert (1989), A Self-Learning, Pattern-Oriented Chess Program, vol. 12, ICCA Journal
  9. Lai, Matthew (4 September 2015), Giraffe: Using Deep Reinforcement Learning to Play Chess, arXiv: 1509.01549v1
  10. 1 2 "Neural network topology". lczero.org. Retrieved 2021-12-12.
  11. Beal, Don; Smith, Martin C., Learning Piece-Square Values using Temporal Differences, vol. 22, ICCA Journal
  12. Jun Nagashima; Masahumi Taketoshi; Yoichiro Kajihara; Tsuyoshi Hashimoto; Hiroyuki Iida (2002), An Efficient Use of Piece-Square Tables in Computer Shogi, Information Processing Society of Japan
  13. Stockfish Evaluation Guide , retrieved 12 December 2021
  14. Yu Nasu (April 28, 2018). "Efficiently Updatable Neural-Network-based Evaluation Function for computer Shogi" (PDF) (in Japanese).
  15. Yu Nasu (April 28, 2018). "Efficiently Updatable Neural-Network-based Evaluation Function for computer Shogi (Unofficial English Translation)" (PDF). GitHub .
  16. Gary Linscott (April 30, 2021). "NNUE". GitHub . Retrieved December 12, 2020.
  17. Noda, Hisayori (30 May 2020). "Release stockfish-nnue-2020-05-30". Github. Retrieved 12 December 2021.
  18. "Introducing NNUE Evaluation". 6 August 2020.
  19. Joost VandeVondele (July 25, 2020). "official-stockfish / Stockfish, NNUE merge". GitHub .