AlphaDev

Developer(s)	DeepMind
Type	Reinforcement learning

Last updated June 21, 2024

AlphaDev is an artificial intelligence system developed by Google DeepMind to discover enhanced computer science algorithms using reinforcement learning. AlphaDev is based on AlphaZero, a system that mastered the games of chess, shogi and go by self-play. AlphaDev applies the same approach to finding faster algorithms for fundamental tasks such as sorting and hashing.^[1]^[2]^[3]

Development

On June 7, 2023, Google DeepMind published a paper in Nature introducing AlphaDev, which discovered new algorithms that outperformed state-of-the-art implementations of small sorting routines.^[1] For example, AlphaDev found a faster assembly-language sequence for sorting 5-element sequences.^[4] An in-depth analysis of these algorithms revealed two novel sequences of assembly instructions, dubbed the AlphaDev swap and copy moves, each of which saves a single assembly instruction when applied.^[1]^[3] For variable sort algorithms, which sort inputs of varying length up to a fixed maximum, AlphaDev discovered fundamentally different algorithm structures. For example, for VarSort4 (sorting up to 4 elements), AlphaDev discovered an algorithm 29 assembly instructions shorter than the human benchmark.^[1] AlphaDev also improved the speed of hashing algorithms by up to 30% in certain cases.^[2]
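The difference between a fixed sort (exactly N elements) and a variable sort (up to N elements) can be illustrated with a minimal Python sketch. It shows a conventional structure for a variable sort of up to four elements: branch on the input length, then run a fixed sorting network made of compare-exchange steps. This is an assumed, human-style baseline for illustration, not the structure AlphaDev discovered, and the names `compare_exchange`, `sort2`, `sort3`, `sort4`, `var_sort4` and `exhaustive_check` are hypothetical.

```python
import itertools
from typing import List


def compare_exchange(v: List[int], i: int, j: int) -> None:
    """Swap v[i] and v[j] if needed so that v[i] <= v[j]."""
    if v[i] > v[j]:
        v[i], v[j] = v[j], v[i]


def sort2(v: List[int]) -> None:
    compare_exchange(v, 0, 1)


def sort3(v: List[int]) -> None:
    # 3-input sorting network with 3 comparators.
    compare_exchange(v, 0, 1)
    compare_exchange(v, 1, 2)
    compare_exchange(v, 0, 1)


def sort4(v: List[int]) -> None:
    # 4-input sorting network with 5 comparators.
    compare_exchange(v, 0, 1)
    compare_exchange(v, 2, 3)
    compare_exchange(v, 0, 2)
    compare_exchange(v, 1, 3)
    compare_exchange(v, 1, 2)


def var_sort4(v: List[int]) -> None:
    """Sort a list of at most 4 elements in place by branching on its length."""
    n = len(v)
    if n > 4:
        raise ValueError("var_sort4 handles at most 4 elements")
    if n == 2:
        sort2(v)
    elif n == 3:
        sort3(v)
    elif n == 4:
        sort4(v)
    # Lists of length 0 or 1 are already sorted.


def exhaustive_check() -> bool:
    """Check var_sort4 on every permutation of every length from 0 to 4."""
    for n in range(5):
        for perm in itertools.permutations(range(n)):
            v = list(perm)
            var_sort4(v)
            if v != sorted(perm):
                return False
    return True
```

Because the networks are small, `exhaustive_check()` can test every permutation of each length directly.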

In January 2022, DeepMind submitted its new sorting algorithms for inclusion in LLVM's libc++, a widely used implementation of the standard library of C++, one of the most popular programming languages in the world. After independent vetting, AlphaDev's algorithms were added to the library.^[5] This was the first change to the C++ Standard Library sorting algorithms in more than a decade and the first update to involve an algorithm discovered using AI.^[5] In January 2023, DeepMind also added its hashing algorithm for inputs from 9 to 16 bytes to Abseil, an open-source collection of prewritten C++ algorithms that can be used by anyone coding with C++.^[6]^[5] Google estimates that these two algorithms are used trillions of times every day.^[7]

Design

AlphaDev is built on top of AlphaZero, the reinforcement-learning model that DeepMind trained to master games such as Go and chess.^[5] The company's breakthrough was to treat the problem of finding a faster algorithm as a game and then train its AI to win it.^[2] AlphaDev plays a single-player game where the objective is to iteratively build an algorithm in assembly language that is both fast and correct.^[1] AlphaDev uses a neural network to guide its search for optimal moves, and learns from its own experience and from synthetic demonstrations.^[1]

AlphaDev showcases the potential of AI to advance the foundations of computing and optimize code for different criteria. Google DeepMind hopes that AlphaDev will inspire further research on using AI to discover new algorithms and improve existing ones.^[2]

Algorithm

The primary learning algorithm in AlphaDev is an extension of AlphaZero.

Encoding assembly programming into a game

In order to use AlphaZero on assembly programming, the authors created a Transformer-based vector representation of assembly programs designed to capture their underlying structure.^[1] This finite representation allows a neural network to play assembly programming like a game with finitely many possible moves, as in Go.

The representation uses the following components:

A Transformer network, which encodes the assembly program: its opcodes are converted to one-hot encodings and concatenated to form the raw input sequence.
A multilayer perceptron network, which encodes the "CPU state", that is, the state of each register and memory location for a given set of inputs.
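A minimal Python sketch of the raw inputs these two components could consume is shown below. It assumes a drastically reduced, hypothetical vocabulary of four opcodes and four registers, encodes each instruction by concatenating the one-hot codes of its opcode and operands, and flattens register values across test inputs for the CPU state. The Transformer and multilayer perceptron themselves are omitted, and the exact encoding AlphaDev uses may differ.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical, drastically reduced vocabulary for illustration only.
OPCODES = ["mov", "cmp", "cmovl", "cmovg"]
REGISTERS = ["r0", "r1", "r2", "r3"]

Instruction = Tuple[str, str, str]  # (opcode, first operand, second operand)


def one_hot(token: str, vocabulary: Sequence[str]) -> List[int]:
    """Vector of zeros with a single 1 at the token's index in the vocabulary."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(token)] = 1
    return vec


def encode_instruction(instr: Instruction) -> List[int]:
    """Concatenate the one-hot codes of the opcode and both operands."""
    op, a, b = instr
    return one_hot(op, OPCODES) + one_hot(a, REGISTERS) + one_hot(b, REGISTERS)


def encode_program(program: List[Instruction]) -> List[List[int]]:
    """Raw input sequence for the Transformer: one vector per instruction."""
    return [encode_instruction(instr) for instr in program]


def encode_cpu_state(states: List[Dict[str, int]]) -> List[int]:
    """Raw input for the MLP: register values for every test input, flattened."""
    return [state[r] for state in states for r in REGISTERS]
```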

Playing the game

The game state is the assembly program generated up to a given point.

The game move is an extra instruction appended to the current assembly program.
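The state-and-move formulation can be sketched in Python. The toy machine below is an assumption made for illustration: three registers, a `cmp` that records the sign of the comparison, and x86-style conditional moves `cmovg` and `cmovl`. `GameState`, `play`, `run`, `build` and `SORT2_MOVES` are hypothetical names; the four moves in `SORT2_MOVES` assemble a branchless program that sorts `r0` and `r1` using `r2` as scratch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Instruction = Tuple[str, str, str]  # (opcode, operand a, operand b)


@dataclass(frozen=True)
class GameState:
    """The game state: the assembly program generated so far."""
    program: Tuple[Instruction, ...] = ()

    def play(self, move: Instruction) -> "GameState":
        """A move appends one instruction, producing the next state."""
        return GameState(self.program + (move,))


def run(program: Tuple[Instruction, ...], registers: Dict[str, int]) -> Dict[str, int]:
    """Execute a program on a toy machine with x86-like conditional moves."""
    regs = dict(registers)
    flag = 0  # sign of (a - b) from the most recent cmp: -1, 0 or +1
    for op, a, b in program:
        if op == "mov":
            regs[a] = regs[b]
        elif op == "cmp":
            flag = (regs[a] > regs[b]) - (regs[a] < regs[b])
        elif op == "cmovg":
            if flag > 0:
                regs[a] = regs[b]
        elif op == "cmovl":
            if flag < 0:
                regs[a] = regs[b]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs


def build(moves: List[Instruction]) -> GameState:
    """Play a sequence of moves from the empty program."""
    state = GameState()
    for move in moves:
        state = state.play(move)
    return state


# Branchless sort of r0 and r1 (r2 is scratch): one instruction per game turn.
SORT2_MOVES = [("mov", "r2", "r0"), ("cmp", "r0", "r1"),
               ("cmovg", "r0", "r1"), ("cmovg", "r1", "r2")]
```

Making `GameState` immutable means each move yields a new state, which suits a tree search that explores several continuations from the same position.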

The game's reward is a function of the assembly program's correctness and latency. To reduce cost, AlphaDev does not measure latency during the search process itself; actual latency is measured for less than 0.002% of generated programs. Instead, it relies on two functions, trained by supervised learning on real measured correctness and latency values, that estimate a program's correctness and latency.
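The reward idea can be sketched under simplifying assumptions. Correctness is measured here as the fraction of test inputs a candidate sorts correctly, and a one-feature linear regression from program length to measured latency stands in for AlphaDev's learned estimators. `correctness`, `fit_latency_model`, `reward` and the weighting inside `reward` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def correctness(sort_fn: Callable[[List[int]], List[int]],
                test_inputs: Sequence[Sequence[int]]) -> float:
    """Fraction of test inputs the candidate sorts correctly."""
    passed = sum(1 for xs in test_inputs if sort_fn(list(xs)) == sorted(xs))
    return passed / len(test_inputs)


def fit_latency_model(samples: Sequence[Tuple[float, float]]) -> Callable[[float], float]:
    """Least-squares fit of latency ~ w * length + b from (length, measured latency) pairs."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        raise ValueError("need samples with at least two distinct lengths")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return lambda length: w * length + b


def reward(correct_fraction: float, estimated_latency: float,
           latency_weight: float = 0.5) -> float:
    """Hypothetical reward: favour correct programs, penalise estimated latency."""
    return correct_fraction - latency_weight * estimated_latency
```

The point of the estimator is that only a handful of programs need real timing; every other candidate is scored with the fitted model.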

Results

Hashing

AlphaDev discovered a hashing algorithm for inputs of 9 to 16 bytes, which was added to Abseil, an open-source collection of prewritten C++ algorithms.^[8]
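Abseil's actual function is not reproduced here. As a hedged illustration of one common technique for short inputs, the Python sketch below reads two possibly overlapping 64-bit little-endian words, one from the start and one from the end of a 9 to 16 byte input, and mixes them with the length. The multiplier `K`, the `mix` step and the name `hash_9_to_16` are hypothetical choices, not Abseil's.

```python
MASK64 = (1 << 64) - 1
# Hypothetical odd multiplier (the 64-bit golden-ratio constant), chosen for illustration.
K = 0x9E3779B97F4A7C15


def mix(x: int) -> int:
    """Multiply-xorshift mixing step; a bijection on 64-bit values."""
    x = (x * K) & MASK64
    return x ^ (x >> 32)


def hash_9_to_16(data: bytes, seed: int = 0) -> int:
    """Illustrative 64-bit hash for inputs of 9 to 16 bytes."""
    n = len(data)
    if not 9 <= n <= 16:
        raise ValueError("input must be 9 to 16 bytes long")
    # Two little-endian 64-bit loads; they overlap when n < 16, so every byte is read.
    lo = int.from_bytes(data[:8], "little")
    hi = int.from_bytes(data[n - 8:], "little")
    h = mix((seed ^ lo) & MASK64)
    return mix(h ^ hi ^ n)
```

Two fixed-size loads avoid a byte-by-byte loop, which is why this shape is attractive for short keys.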

LLVM standard sorting library

AlphaDev discovered new sorting algorithms, which led to up to 70% improvements in the LLVM libc++ sorting library for shorter sequences and about 1.7% improvements for sequences exceeding 250,000 elements. These improvements apply to the uint32, uint64 and float data types for ARMv8, Intel Skylake and AMD Zen 2 CPU architectures. AlphaDev's branchless conditional assembly and new swap move contributed to these performance improvements. The discovered algorithms were reverse-engineered from low-level assembly to C++, and have officially been included in the libc++ standard sorting library.^[6]
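A branchless compare-exchange replaces a conditional branch with a pair of min and max operations, which compilers can typically lower to a comparison followed by conditional moves such as x86 `cmov`. The Python sketch below shows only the data flow of a generic three-element sorting network built this way. Python's `min` and `max` are not branchless at the machine level, and `sort3_minmax` is a hypothetical name, not libc++ code or AlphaDev's exact sequence.

```python
from typing import Tuple


def sort3_minmax(a: int, b: int, c: int) -> Tuple[int, int, int]:
    """Sort three values with a 3-comparator network built only from min and max."""
    a, b = min(a, b), max(a, b)  # compare-exchange (a, b)
    b, c = min(b, c), max(b, c)  # compare-exchange (b, c): c is now the maximum
    a, b = min(a, b), max(a, b)  # compare-exchange (a, b): a is now the minimum
    return a, b, c
```

Because every input runs the same fixed sequence of operations, there is no data-dependent branch for the CPU to mispredict.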

Improved deserialization in protobuf

AlphaDev learned an optimized VarInt deserialization function for protobuf,^[9] roughly three times faster than the human benchmark for single-valued inputs. AlphaDev also discovered a new VarInt assignment move, which combines two operations into a single instruction to save latency.
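For reference, a straightforward byte-at-a-time VarInt codec is sketched below in Python, following the protobuf wire format: each byte carries 7 payload bits, least-significant group first, and its most significant bit is set when more bytes follow. This is an illustrative baseline, not AlphaDev's optimized function, and `decode_varint` and `encode_varint` are hypothetical names.

```python
from typing import Tuple


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one unsigned varint starting at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:               # continuation bit clear: last byte
            return result, pos
        shift += 7
        if shift >= 64:                   # a 64-bit value needs at most 10 bytes
            raise ValueError("varint too long")


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)
```

For example, 150 encodes as the two bytes `0x96 0x01`.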

Comparison with logical AI approach

AlphaDev's performance was compared with that of AlphaDev-S, a variant based on stochastic superoptimization,^[10] a logical AI approach. AlphaDev-S was run with at least the same amount of resources and wall-clock time as AlphaDev. The results showed that AlphaDev-S requires a prohibitive amount of time to optimize directly for latency, because latency must be computed after every mutation. AlphaDev-S therefore optimizes for a latency proxy, namely algorithm length, and at the end of training searches through all the correct programs it generated.
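The idea behind AlphaDev-S can be illustrated with a toy stochastic search in Python: random mutations to a program are accepted by a Metropolis-style rule, and the cost counts failed tests plus a length penalty, mirroring the use of algorithm length as a latency proxy. The instruction set, test cases, weights, step count and all function names are hypothetical, and the sketch omits the final search over correct programs for the fastest one.

```python
import math
import random
from typing import List, Tuple

Instruction = Tuple[str, str, str]
OPS = ["mov", "cmp", "cmovg", "cmovl"]
REGS = ["r0", "r1", "r2"]
# Every pair of small values: the program must leave min in r0 and max in r1.
TESTS = [(x, y) for x in range(3) for y in range(3)]


def run(program: List[Instruction], x: int, y: int) -> Tuple[int, int]:
    """Execute on a toy machine (r2 is scratch) and return (r0, r1)."""
    regs = {"r0": x, "r1": y, "r2": 0}
    flag = 0  # sign of (a - b) from the most recent cmp
    for op, a, b in program:
        if op == "mov":
            regs[a] = regs[b]
        elif op == "cmp":
            flag = (regs[a] > regs[b]) - (regs[a] < regs[b])
        elif op == "cmovg" and flag > 0:
            regs[a] = regs[b]
        elif op == "cmovl" and flag < 0:
            regs[a] = regs[b]
    return regs["r0"], regs["r1"]


def errors(program: List[Instruction]) -> int:
    """Number of test pairs the program fails to sort."""
    return sum(run(program, x, y) != (min(x, y), max(x, y)) for x, y in TESTS)


def cost(program: List[Instruction], length_weight: float = 0.1) -> float:
    """Failed tests plus a penalty on program length, used as the latency proxy."""
    return errors(program) + length_weight * len(program)


def mutate(program: List[Instruction], rng: random.Random) -> List[Instruction]:
    """Propose a neighbour by inserting, replacing or deleting one instruction."""
    p = list(program)
    new = (rng.choice(OPS), rng.choice(REGS), rng.choice(REGS))
    kind = rng.choice(["insert", "replace", "delete"])
    if kind == "insert" or not p:
        p.insert(rng.randrange(len(p) + 1), new)
    elif kind == "replace":
        p[rng.randrange(len(p))] = new
    else:
        del p[rng.randrange(len(p))]
    return p


def superoptimize(start: List[Instruction], steps: int = 5000,
                  beta: float = 2.0, seed: int = 0) -> Tuple[List[Instruction], float]:
    """Metropolis random walk over programs; returns the lowest-cost program visited."""
    rng = random.Random(seed)
    current, current_cost = list(start), cost(start)
    best, best_cost = current, current_cost
    for _ in range(steps):
        candidate = mutate(current, rng)
        c = cost(candidate)
        # Accept improvements; accept regressions with probability exp(-beta * increase).
        if c <= current_cost or rng.random() < math.exp(-beta * (c - current_cost)):
            current, current_cost = candidate, c
            if c < best_cost:
                best, best_cost = candidate, c
    return best, best_cost
```

Note that `cost` needs no timing at all; measuring real latency after every mutation is exactly what the comparison found prohibitive.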

References

1. Mankowitz, Daniel J.; Michi, Andrea; Zhernov, Anton; Gelmi, Marco; Selvi, Marco; Paduraru, Cosmin; Leurent, Edouard; Iqbal, Shariq; Lespiau, Jean-Baptiste; Ahern, Alex; Koppe, Thomas; Millikin, Kevin; Gaffney, Stephen; Elster, Sophie; Broshear, Jackson; Gamble, Chris; Milan, Kieran; Tung, Robert; Hwang, Minjae; Cemgil, Taylan; Barekatain, Mohammadamin; Li, Yujia; Mandhane, Amol; Hubert, Thomas; Schrittwieser, Julian; Hassabis, Demis; Kohli, Pushmeet; Riedmiller, Martin; Vinyals, Oriol; Silver, David (2023). "Faster sorting algorithms discovered using deep reinforcement learning". Nature. 618: 257–263. doi:10.1038/s41586-023-06004-9. PMC 10247365.
2. "AlphaDev discovers faster sorting algorithms". Blog. Google DeepMind. June 7, 2023. Archived from the original on 2023-06-20. Retrieved 2023-06-20.
3. Tunney, Justine (2023-06-20). "Understanding DeepMind's Sorting Algorithm". justine.lol. Archived from the original on 2023-06-18. Retrieved 2023-06-20.
4. "AlphaDev". GitHub. DeepMind. 2023-06-21. Retrieved 2023-06-21.
5. Heaven, Will Douglas (June 7, 2023). "Google DeepMind's game-playing AI just found another way to make code faster". MIT Technology Review. Archived from the original on 2023-06-14. Retrieved 2023-06-20.
6. "D118029 Introduce branchless sorting functions for sort3, sort4 and sort5". reviews.llvm.org. Retrieved 2023-06-21.
7. Sparkes, Matthew (7 June 2023). "DeepMind AI's new way to sort objects could speed up global computing". New Scientist. Retrieved 2024-06-20.
8. "Replace absl::Hash for inputs from 9 to 16 bytes according to AlphaZero findings". Abseil Team. abseil/abseil-cpp@74eee2a. GitHub. Retrieved 2023-06-24.
9. "VarInt protocol buffer serialization and deserialization". protobuf.dev. Retrieved 2023-06-24.
10. Schkufza, Eric; Sharma, Rahul; Aiken, Alex (2013-03-16). "Stochastic superoptimization". ACM SIGARCH Computer Architecture News. 41 (1): 305–316. arXiv:1211.0557. doi:10.1145/2490301.2451150. ISSN 0163-5964.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
