AlphaDev

Last updated
AlphaDev
Developer(s) DeepMind
Type Reinforcement learning


AlphaDev is an artificial intelligence system developed by Google DeepMind to discover enhanced computer science algorithms using reinforcement learning. AlphaDev is based on AlphaZero, a system that mastered the games of chess, shogi and go by self-play. AlphaDev applies the same approach to finding faster algorithms for fundamental tasks such as sorting and hashing. [1] [2] [3]

Contents

Development

On June 7, 2023, Google DeepMind published a paper in Nature introducing AlphaDev, which discovered new algorithms from scratch that outperformed the state-of-the-art methods for small sort algorithms. [1] For example, AlphaDev found a faster assembly language sequence for sorting 5-element sequences. [4] Upon analysing the algorithms in-depth, AlphaDev discovered two unique sequences of assembly instructions called the AlphaDev swap and copy moves that save a single assembly instruction each time they are applied. [1] [3] For variable sort algorithms, AlphaDev discovered fundamentally different algorithm structures. For example, for VarSort4 (sort up to 4 elements) AlphaDev discovered an algorithm 29 assembly instructions shorter than the human benchmark. [1] AlphaDev also improved on the speed of hashing algorithms by up to 30% in certain cases. [2]

In January 2022, Google DeepMind submitted its new sorting algorithms to the organization that manages C++, one of the most popular programming languages in the world, and after independent vetting, AlphaDev's algorithms were added to the library. [5] This was the first change to the C++ Standard Library sorting algorithms in more than a decade and the first update to involve an algorithm discovered using AI. [5] In January 2023, DeepMind also added its hashing algorithm for inputs from 9 to 16 bytes to Abseil, an open-source collection of prewritten C++ algorithms that can be used by anybody coding with C++. [6] [5] It is estimated that these algorithms are used trillions of times every day.

Design

AlphaDev is built on top of AlphaZero, the reinforcement-learning model that DeepMind trained to master games such as Go and chess. [5] The company's breakthrough was to treat the problem of finding a faster algorithm as a game and then train its AI to win it. [2] AlphaDev plays a single-player game where the objective is to iteratively build an algorithm in the assembly language that is both fast and correct. [1] AlphaDev uses a neural network to guide its search for optimal moves, and learns from its own experience and synthetic demonstrations. [1]

AlphaDev showcases the potential of AI to advance the foundations of computing and optimize code for different criteria. Google DeepMind hopes that AlphaDev will inspire further research on using AI to discover new algorithms and improve existing ones. [2]

Algorithm

The primary learning algorithm in AlphaDev is an extension of AlphaZero.

Encoding assembly programming into a game

In order to use AlphaZero on assembly programming, the authors created a Transformer-based vector representation of assembly programs designed to capture their underlying structure. [1] This finite representation allows a neural network to play assembly programming like a game with finitely many possible moves (like Go),

The representation uses the following components:

Playing the game

The game state is the assembly program generated up to a given point.

The game move is an extra instruction appended to the current assembly program.

The game's reward is a function of the assembly program's correctness and latency. To reduce cost, AlphaDev only computes actual measured latency on less than 0.002% of generated programs, as it does not evaluate latency during the search process. Instead, it uses two functions that estimate the correctness and latency by being trained via supervised learning using the real measured correctness and latency values.

Result

Hashing

AlphaDev developed hashing algorithms for inputs from 9 to 16 bytes to Abseil, an open-source collection of prewritten C++ algorithms. [7]

LLVM standard sorting library

AlphaDev discovered new sorting algorithms, which led to up to 70% improvements in the LLVM libc++ sorting library for shorter sequences and about 1.7% improvements for sequences exceeding 250,000 elements. These improvements apply to the uint32, uint64 and float data types for ARMv8, Intel Skylake and AMD Zen 2 CPU architectures. AlphaDev's branchless conditional assembly and new swap move contributed to these performance improvements. The discovered algorithms were reverse-engineered from low-level assembly to C++, and have officially been included in the libc++ standard sorting library. [6]

Improved deserialization in protobuf

AlphaDev learned an optimized VarInt deserialization function in protobuf, [8] outperforming the human benchmark for single valued inputs by approximately three times in terms of speed. AlphaDev also discovered a new VarInt assignment move, combining two operations into a single instruction for latency savings.

Comparison with logical AI approach

The AlphaDev's performance was compared to stochastic superoptimization, [9] a logical AI approach. The latter was run with at least the same amount of resources and wall-clock time as AlphaDev. The results showed that AlphaDev-S requires a prohibitive amount of time to optimize directly for latency, as latency needs to be computed after every mutation. As such, AlphaDev-S optimizes for a latency proxy, specifically algorithm length, and, then, at the end of training, all correct programs generated by AlphaDev-S are searched through.

Related Research Articles

<span class="mw-page-title-main">Hash function</span> Mapping arbitrary data to fixed-size values

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output. The values returned by a hash function are called hash values, hash codes, hash digests, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing.

In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm. An algorithm must be analyzed to determine its resource usage, and the efficiency of an algorithm can be measured based on the usage of different resources. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process.

In computer science, self-modifying code is code that alters its own instructions while it is executing – usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, thus simplifying maintenance. The term is usually only applied to code where the self-modification is intentional, not in situations where code accidentally modifies itself due to an error such as a buffer overflow.

In computer science, program optimization, code optimization, or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be optimized so that it executes more rapidly, or to make it capable of operating with less memory storage or other resources, or draw less power.

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data, and thus perform tasks without explicit instructions. Recently, artificial neural networks have been able to surpass many previous approaches in performance.

In numerical analysis, the Kahan summation algorithm, also known as compensated summation, significantly reduces the numerical error in the total obtained by adding a sequence of finite-precision floating-point numbers, compared to the obvious approach. This is done by keeping a separate running compensation, in effect extending the precision of the sum by the precision of the compensation variable.

In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to do the following without changing the meaning of the code:

Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions; the small set is known as the peephole or window.

Meta learning is a subfield of machine learning where automatic learning algorithms are applied to metadata about machine learning experiments. As of 2017, the term had not found a standard interpretation, however the main goal is to use such metadata to understand how automatic learning can become flexible in solving learning problems, hence to improve the performance of existing learning algorithms or to learn (induce) the learning algorithm itself, hence the alternative term learning to learn.

SHA-3 is the latest member of the Secure Hash Algorithm family of standards, released by NIST on August 5, 2015. Although part of the same series of standards, SHA-3 is internally different from the MD5-like structure of SHA-1 and SHA-2.

ZPAQ is an open source command line archiver for Windows and Linux. It uses a journaling or append-only format which can be rolled back to an earlier state to retrieve older versions of files and directories. It supports fast incremental update by adding only files whose last-modified date has changed since the previous update. It compresses using deduplication and several algorithms depending on the data type and the selected compression level. To preserve forward and backward compatibility between versions as the compression algorithm is improved, it stores the decompression algorithm in the archive. The ZPAQ source code includes a public domain API, libzpaq, which provides compression and decompression services to C++ applications. The format is believed to be unencumbered by patents.

In computer software and hardware, find first set (ffs) or find first one is a bit operation that, given an unsigned machine word, designates the index or position of the least significant bit set to one in the word counting from the least significant bit position. A nearly equivalent operation is count trailing zeros (ctz) or number of trailing zeros (ntz), which counts the number of zero bits following the least significant one bit. The complementary operation that finds the index or position of the most significant set bit is log base 2, so called because it computes the binary logarithm ⌊log2(x)⌋. This is closely related to count leading zeros (clz) or number of leading zeros (nlz), which counts the number of zero bits preceding the most significant one bit. There are two common variants of find first set, the POSIX definition which starts indexing of bits at 1, herein labelled ffs, and the variant which starts indexing of bits at zero, which is equivalent to ctz and so will be called by that name.

<span class="mw-page-title-main">Google DeepMind</span> Artificial intelligence division

DeepMind Technologies Limited, doing business as Google DeepMind, is a British-American artificial intelligence research laboratory which serves as a subsidiary of Google. Founded in the UK in 2010, it was acquired by Google in 2014. The company is based in London, with research centres in Canada, France, Germany, and the United States.

AlphaGo is a computer program that plays the board game Go. It was developed by the London-based DeepMind Technologies, an acquired subsidiary of Google. Subsequent versions of AlphaGo became increasingly powerful, including a version that competed under the name Master. After retiring from competitive play, AlphaGo Master was succeeded by an even more powerful version known as AlphaGo Zero, which was completely self-taught without learning from human games. AlphaGo Zero was then generalized into a program known as AlphaZero, which played additional games, including chess and shogi. AlphaZero has in turn been succeeded by a program known as MuZero which learns without being taught the rules.

<span class="mw-page-title-main">Tensor Processing Unit</span> AI accelerator ASIC by Google

Tensor Processing Unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google for neural network machine learning, using Google's own TensorFlow software. Google began using TPUs internally in 2015, and in 2018 made them available for third-party use, both as part of its cloud infrastructure and by offering a smaller version of the chip for sale.

The following outline is provided as an overview of and topical guide to machine learning:

<span class="mw-page-title-main">AlphaZero</span> Game-playing artificial intelligence

AlphaZero is a computer program developed by artificial intelligence research company DeepMind to master the games of chess, shogi and go. This algorithm uses an approach similar to AlphaGo Zero.

AlphaFold is an artificial intelligence (AI) program developed by DeepMind, a subsidiary of Alphabet, which performs predictions of protein structure. The program is designed as a deep learning system.

Deep reinforcement learning is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms are able to take in very large inputs and decide what actions to perform to optimize an objective. Deep reinforcement learning has been used for a diverse set of applications including but not limited to robotics, video games, natural language processing, computer vision, education, transportation, finance and healthcare.

References

  1. 1 2 3 4 5 6 7 Mankowitz, Daniel J.; Michi, Andrea; Zhernov, Anton; Gelmi, Marco; Selvi, Marco; Paduraru, Cosmin; Leurent, Edouard; Iqbal, Shariq; Lespiau, Jean-Baptiste; Ahern, Alex; Koppe, Thomas; Millikin, Kevin; Gaffney, Stephen; Elster, Sophie; Broshear, Jackson; Gamble, Chris; Milan, Kieran; Tung, Robert; Hwang, Minjae; Cemgil, Taylan; Barekatain, Mohammadamin; Li, Yujia; Mandhane, Amol; Hubert, Thomas; Schrittwieser, Julian; Hassabis, Demis; Kohli, Pushmeet; Riedmiller, Martin; Vinyals, Oriol; Silver, David (2023). "Faster sorting algorithms discovered using deep reinforcement learning". Nature. 618: 257–263. doi:10.1038/s41586-023-06004-9. PMC   10247365 .
  2. 1 2 3 4 "AlphaDev discovers faster sorting algorithms". Blog. Google DeepMind. June 7, 2023. Archived from the original on 2023-06-20. Retrieved 2023-06-20.
  3. 1 2 Tunney, Justine (2023-06-20). "Understanding DeepMind's Sorting Algorithm". justine.lol. Archived from the original on 2023-06-18. Retrieved 2023-06-20.
  4. Github - AlphaDev, DeepMind, 2023-06-21, retrieved 2023-06-21
  5. 1 2 3 4 Heaven, Will Douglas (June 7, 2023). "Google DeepMind's game-playing AI just found another way to make code faster". MIT Technology Review. Archived from the original on 2023-06-14. Retrieved 2023-06-20.
  6. 1 2 "⚙ D118029 Introduce branchless sorting functions for sort3, sort4 and sort5". reviews.llvm.org. Retrieved 2023-06-21.
  7. "Replace absl::Hash for inputs from 9 to 16 bytes according to AlphaZero findings by Abseil Team · abseil/abseil-cpp@74eee2a". GitHub. Retrieved 2023-06-24.
  8. "VarInt protocol buffer serialization and deserialization". protobuf.dev. Retrieved 2023-06-24.
  9. Schkufza, Eric; Sharma, Rahul; Aiken, Alex (2013-03-16). "Stochastic superoptimization". ACM SIGARCH Computer Architecture News. 41 (1): 305–316. arXiv: 1211.0557 . doi: 10.1145/2490301.2451150 . ISSN   0163-5964.