OpenAI Five

OpenAI Five is a computer program by OpenAI that plays the five-on-five video game Dota 2. Its first public appearance occurred in 2017, when it was demonstrated in a live one-on-one game against the professional player Dendi, who lost to it. The following year, the system had advanced to the point of performing as a full team of five, and began playing against professional teams and showing that it could defeat them.

By choosing a game as complex as Dota 2 to study machine learning, OpenAI thought they could more accurately capture the unpredictability and continuity seen in the real world, thus constructing more general problem-solving systems. The algorithms and code used by OpenAI Five were eventually borrowed by another neural network in development by the company, one which controlled a physical robotic hand. OpenAI Five has been compared to other similar cases of artificial intelligence (AI) playing against and defeating humans, such as AlphaStar in the video game StarCraft II, AlphaGo in the board game Go, Deep Blue in chess, and Watson on the television game show Jeopardy!.

History

Development of the algorithms used for the bots began in November 2016. OpenAI chose Dota 2, a competitive five-on-five video game, as a base because it was popular on the live streaming platform Twitch, had native support for Linux, and had an application programming interface (API) available. [1] Before the system became a team of five, its first public demonstration occurred in August at The International 2017, the game's annual premier championship tournament, where Dendi, a Ukrainian professional player, lost against an OpenAI bot in a live one-on-one matchup. [2] [3] After the match, CTO Greg Brockman explained that the bot had learned by playing against itself for two weeks of real time, and that the learning software was a step in the direction of creating software that can handle complex tasks "like being a surgeon". [4] [5] OpenAI used a methodology called reinforcement learning: the bots learn over time by playing against themselves hundreds of times a day for months, and are rewarded for actions such as killing an enemy and destroying towers. [6] [7] [8]

By June 2018, the bots had expanded their ability to play together as a full team of five and were able to defeat teams of amateur and semi-professional players. [9] [6] [10] [11] At The International 2018, OpenAI Five played in two games against professional teams, one against the Brazil-based paiN Gaming and the other against an all-star team of former Chinese players. [12] [13] Although the bots lost both matches, OpenAI still considered it a successful venture, stating that playing against some of the best players in Dota 2 allowed them to analyze and adjust their algorithms for future games. [14] The bots' final public demonstration occurred in April 2019, when they won a best-of-three series against The International 2018 champions OG at a live event in San Francisco. [15] A four-day online event to play against the bots, open to the public, occurred the same month. [16] There, the bots played in 42,729 public games, winning 99.4% of them. [17]

Architecture

Each OpenAI Five bot is a neural network containing a single-layer, 4096-unit [18] LSTM that observes the current game state, extracted from the Dota developer's API. The neural network acts through several action heads, each with a semantic meaning, and no human data is involved. For instance, separate heads select which action to take, the number of ticks to delay that action, and the X or Y coordinate of the action in a grid around the unit. The action heads are computed independently. The AI system observes the world as a list of 20,000 numbers and takes an action by emitting a list of eight enumeration values. [19]
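The idea of independently computed action heads can be illustrated with a minimal sketch. It is not OpenAI's implementation: the head names, head sizes, the 8-number hidden state, and the random weights are all assumptions chosen for illustration, and a plain list stands in for the LSTM output.

```python
import math
import random

# Hypothetical head names and sizes, chosen only for illustration.
HEAD_SIZES = {"action_type": 4, "delay_ticks": 3, "offset_x": 5, "offset_y": 5}


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_heads(hidden_size, rng):
    """Create one small random weight matrix per action head."""
    return {
        name: [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(size)]
        for name, size in HEAD_SIZES.items()
    }


def sample_action(hidden, heads, rng):
    """Sample every head independently from the same shared hidden state."""
    action = {}
    for name, weights in heads.items():
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
        probs = softmax(logits)
        action[name] = rng.choices(range(len(probs)), weights=probs)[0]
    return action


rng = random.Random(0)
hidden = [rng.uniform(-1, 1) for _ in range(8)]  # stand-in for the LSTM output
heads = make_heads(len(hidden), rng)
action = sample_action(hidden, heads, rng)
print(action)
```

Each head reads the same hidden state but is sampled separately, so the final action is a small list of enumeration values, one per head.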

OpenAI Five was trained on "Rapid", a general-purpose reinforcement learning training system. Rapid consists of two layers: the first spins up thousands of machines and helps them ‘talk’ to each other, and the second runs the software. By 2018, OpenAI Five was playing around 180 years' worth of games against itself every day, running on 256 GPUs and 128,000 CPU cores, [20] and was trained using Proximal Policy Optimization, a policy gradient method. [19] [21]
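The core of Proximal Policy Optimization is its clipped surrogate objective, which limits how far a single update can move the policy. The sketch below computes that objective for a small batch. The function name and the clipping value of 0.2 are assumptions for illustration. The value OpenAI Five used is not stated here.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of samples.

    ratios: new-policy probability divided by old-policy probability, per sample.
    advantages: estimated advantage of the action taken, per sample.
    clip_eps: assumed clipping range; ratios are clipped to [1 - eps, 1 + eps].
    """
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes any incentive to push the ratio
        # beyond the clipping range in the direction that helps.
        total += min(ratio * adv, clipped * adv)
    return total / len(ratios)


objective = ppo_clipped_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0])
print(round(objective, 6))  # 0.8
```

In the example, the first sample's ratio of 1.5 is clipped to 1.2 and the second sample's ratio of 0.5 is clipped to 0.8, so neither contributes more than the clipping range allows.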

Comparison chart

| | OpenAI 1v1 bot (2017) | OpenAI Five (2018) |
|---|---|---|
| CPUs | 60,000 CPU cores on Microsoft Azure | 128,000 pre-emptible CPU cores on the Google Cloud Platform (GCP) |
| GPUs | 256 K80 GPUs on Azure | 256 P100 GPUs on the GCP |
| Experience collected | ~300 years per day | ~180 years per day |
| Size of observation | ~3.3 kB | ~36.8 kB |
| Observations per second of gameplay | 10 | 7.5 |
| Batch size | 8,388,608 observations | 1,048,576 observations |
| Batches per minute | ~20 | ~60 |

Comparisons with other game AI systems

Prior to OpenAI Five, other AI systems had already competed successfully against humans, such as Watson on Jeopardy!, Deep Blue in chess, and AlphaGo in Go. [22] [23] [24] Compared with other games in which AI systems have played against human players, Dota 2 differs in the following ways: [19]

Long run view: The game runs at 30 frames per second for an average match time of 45 minutes, which results in about 80,000 ticks per game. OpenAI Five observes every fourth frame, generating about 20,000 moves. By comparison, chess usually ends before 40 moves, while Go ends before 150 moves.
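The tick and move counts follow directly from the frame rate and match length. A short worked calculation, using only the figures quoted above (the constant names are illustrative):

```python
FRAMES_PER_SECOND = 30
MATCH_MINUTES = 45
FRAME_SKIP = 4  # the bot acts on every fourth frame

ticks_per_game = FRAMES_PER_SECOND * 60 * MATCH_MINUTES
moves_per_game = ticks_per_game // FRAME_SKIP

print(ticks_per_game)  # 81000, quoted as roughly 80,000
print(moves_per_game)  # 20250, quoted as roughly 20,000
```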

Partially observed state of the game: Players and their allies can only see the map directly around them. The rest of it is covered in a fog of war which hides enemy units and their movements. Thus, playing Dota 2 requires making inferences based on this incomplete data, as well as predicting what the opponent could be doing at the same time. By comparison, chess and Go are "full-information games", as they do not hide elements from the opposing player. [25]

Continuous action space: Each playable character in a Dota 2 game, known as a hero, can take dozens of actions that target either another unit or a position. The OpenAI Five developers discretize the space into 170,000 possible actions per hero. Without counting the continuous aspects of the game, there are an average of ~1,000 valid actions each tick. By comparison, the average number of actions is 35 in chess and 250 in Go.
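Combining these branching factors with the game lengths quoted earlier in this section gives a rough sense of scale. The sketch below estimates the base-10 logarithm of the number of possible action sequences as moves × log10(actions). This is a crude upper-bound style estimate under the assumption that every move offers the average number of actions, not a measured quantity, and the names used are illustrative.

```python
import math

# (average actions per move, approximate moves per game), as quoted in this section
GAMES = {"chess": (35, 40), "Go": (250, 150), "Dota 2": (1000, 20000)}


def log10_sequences(branching, depth):
    """log10 of branching ** depth, computed without building the huge number."""
    return depth * math.log10(branching)


estimates = {name: log10_sequences(b, d) for name, (b, d) in GAMES.items()}
for name, value in estimates.items():
    print(f"{name}: about 10^{round(value)} action sequences")
```

Under these assumptions the exponent is about 62 for chess, about 360 for Go, and about 60,000 for Dota 2.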

Continuous observation space: Dota 2 is played on a large map with ten heroes, five on each team, along with dozens of buildings and non-player character (NPC) units. The OpenAI system observes the state of a game through the developer's bot API, as 20,000 numbers that constitute all the information a human is allowed to access. By comparison, a chess board is represented as about 70 enumeration values, whereas a Go board has about 400.

Reception

OpenAI Five has received acknowledgement from the AI, tech, and video game communities at large. Microsoft founder Bill Gates called it a "big deal", as the bots' victories "required teamwork and collaboration". [8] [26] Chess player Garry Kasparov, who lost against the Deep Blue AI in 1997, stated that despite their losing performance at The International 2018, the bots would eventually "get there, and sooner than expected". [27]

In a conversation with MIT Technology Review, AI experts also considered the OpenAI Five system a significant achievement, noting that Dota 2 was an "extremely complicated game", so even beating non-professional players was impressive. [25] PC Gamer wrote that the bots' wins against professional players were a significant event in machine learning. [28] In contrast, Motherboard wrote that the victory was "basically cheating" due to the simplified hero pools on both sides, as well as the fact that the bots were given direct access to the API, as opposed to using computer vision to interpret pixels on the screen. [29] The Verge wrote that the bots were evidence that the company's approach to reinforcement learning and its general philosophy about AI was "yielding milestones". [16]

In 2019, DeepMind unveiled a similar bot for StarCraft II, AlphaStar. Like OpenAI Five, AlphaStar used reinforcement learning and self-play. The Verge reported that "the goal with this type of AI research is not just to crush humans in various games just to prove it can be done", but rather to prove that, with enough time, effort, and resources, sophisticated AI software can best humans at virtually any competitive cognitive challenge, be it a board game or a modern video game. They added that the DeepMind and OpenAI victories were also a testament to the power of certain uses of reinforcement learning. [30]

It was OpenAI's hope that the technology could have applications outside of the digital realm. In 2018, they were able to reuse the same reinforcement learning algorithms and training code from OpenAI Five for Dactyl, a human-like robot hand with a neural network built to manipulate physical objects. [31] In 2019, Dactyl solved the Rubik's Cube. [32]

References

  1. OpenAI. "OpenAI Five". openai.com/five. Archived from the original on 1 September 2018. Retrieved 10 October 2018.
  2. Savov, Vlad (14 August 2017). "My favorite game has been invaded by killer AI bots and Elon Musk hype". The Verge. Archived from the original on 26 June 2018. Retrieved 25 June 2018.
  3. Frank, Blair Hanley. "OpenAI's bot beats top Dota 2 player so badly that he quits". VentureBeat. Archived from the original on 12 August 2017. Retrieved 12 August 2017.
  4. OpenAI (11 August 2017). "Dota 2". blog.openai.com. Archived from the original on 11 August 2017. Retrieved 12 August 2017.
  5. OpenAI (16 August 2017). "More on Dota 2". blog.openai.com. Archived from the original on 16 August 2017. Retrieved 16 August 2017.
  6. Simonite, Tom (25 June 2018). "Can Bots Outwit Humans in One of the Biggest Esports Games?". Wired. Archived from the original on 25 June 2018. Retrieved 25 June 2018.
  7. Kahn, Jeremy (25 June 2018). "A Bot Backed by Elon Musk Has Made an AI Breakthrough in Video Game World". Bloomberg.com. Archived from the original on 27 June 2018. Retrieved 27 June 2018.
  8. "Bill Gates says gamer bots from Elon Musk-backed nonprofit are 'huge milestone' in A.I." CNBC. 28 June 2018. Archived from the original on 28 June 2018. Retrieved 28 June 2018.
  9. OpenAI (18 July 2018). "OpenAI Five Benchmark". blog.openai.com. Archived from the original on 26 August 2018. Retrieved 25 August 2018.
  10. Vincent, James (25 June 2018). "AI bots trained for 180 years a day to beat humans at Dota 2". The Verge. Archived from the original on 25 June 2018. Retrieved 25 June 2018.
  11. Savov, Vlad (6 August 2018). "The OpenAI Dota 2 bots just defeated a team of former pros". The Verge. Archived from the original on 7 August 2018. Retrieved 7 August 2018.
  12. Simonite, Tom. "Pro Gamers Fend off Elon Musk-Backed AI Bots—for Now". Wired. Archived from the original on 24 August 2018. Retrieved 25 August 2018.
  13. Quach, Katyanna. "Game over, machines: Humans defeat OpenAI bots once again at video games Olympics". The Register. Archived from the original on 25 August 2018. Retrieved 25 August 2018.
  14. OpenAI (24 August 2018). "The International 2018: Results". blog.openai.com. Archived from the original on 24 August 2018. Retrieved 25 August 2018.
  15. Wiggers, Kyle (13 April 2019). "OpenAI Five defeats professional Dota 2 team, twice". VentureBeat. Archived from the original on 13 April 2019. Retrieved 13 April 2019.
  16. Statt, Nick (13 April 2019). "OpenAI's Dota 2 AI steamrolls world champion e-sports team with back-to-back victories". The Verge. Vox Media. Archived from the original on 15 April 2019. Retrieved 15 April 2019.
  17. Wiggers, Kyle (22 April 2019). "OpenAI's Dota 2 bot defeated 99.4% of players in public matches". VentureBeat. Retrieved 22 April 2019.
  18. "Dota 2 with Large Scale Deep Reinforcement Learning" (PDF). OpenAI. Archived (PDF) from the original on 26 September 2024. Retrieved 29 September 2024.
  19. OpenAI (25 June 2018). "OpenAI Five". blog.openai.com. Archived from the original on 25 June 2018. Retrieved 25 June 2018.
  20. "Why are AI researchers so obsessed with games?". Quartz. 4 August 2018. Archived from the original on 4 August 2018. Retrieved 4 August 2018.
  21. Schulman, John; Wolski, Filip; Dhariwal, Prafulla; Radford, Alec; Klimov, Oleg (2017). "Proximal Policy Optimization Algorithms". arXiv:1707.06347 [cs.LG].
  22. Gabbatt, Adam (17 February 2011). "IBM computer Watson wins Jeopardy clash". The Guardian. Archived from the original on 21 September 2013. Retrieved 17 February 2011.
  23. "Chess grandmaster Garry Kasparov on what happens when machines 'reach the level that is impossible for humans to compete'". Business Insider. Archived from the original on 29 December 2017. Retrieved 29 December 2017.
  24. "DeepMind's Go-playing AI doesn't need human help to beat us anymore". The Verge. 18 October 2017. Archived from the original on 18 October 2017. Retrieved 18 October 2017.
  25. Knight, Will (25 June 2018). "A team of AI algorithms just crushed humans in a complex computer game". MIT Technology Review. Retrieved 25 June 2018.
  26. "Bill Gates hails 'huge milestone' for AI as bots work in a team to destroy humans at video game 'Dota 2'". Business Insider. Archived from the original on 27 June 2018. Retrieved 27 June 2018.
  27. "Garry Kasparov's Twitter". 24 August 2018. Retrieved 24 August 2018.
  28. Park, Morgan (11 August 2018). "How the OpenAI Five tore apart a team of Dota 2 pros". PC Gamer. Retrieved 25 May 2020.
  29. Gault, Matthew (17 August 2018). "OpenAI Is Beating Humans at 'Dota 2' Because It's Basically Cheating". Vice. Retrieved 25 May 2020.
  30. Statt, Nick (30 October 2019). "DeepMind's StarCraft 2 AI is now better than 99.8 percent of all human players". The Verge. Retrieved 25 May 2020.
  31. OpenAI; Andrychowicz, Marcin; Baker, Bowen; Chociej, Maciek; Józefowicz, Rafał; McGrew, Bob; Pachocki, Jakub; Petron, Arthur; Plappert, Matthias; Powell, Glenn; Ray, Alex; Schneider, Jonas; Sidor, Szymon; Tobin, Josh; Welinder, Peter; Weng, Lilian; Zaremba, Wojciech (2019). "Learning Dexterous In-Hand Manipulation". arXiv:1808.00177v5 [cs.LG].
  32. OpenAI; Akkaya, Ilge; Andrychowicz, Marcin; Chociej, Maciek; Litwin, Mateusz; McGrew, Bob; Petron, Arthur; Paino, Alex; Plappert, Matthias; Powell, Glenn; Ribas, Raphael (2019). "Solving Rubik's Cube with a Robot Hand". arXiv:1910.07113v1 [cs.LG].