Last week, Google’s DeepMind reached a new milestone: its AI agent AlphaStar beat two professional human players at StarCraft II, a real-time strategy game known for its complexity. Let’s take a look at the mechanics of this match.
The Origins of AlphaStar
AlphaStar is an iteration of AlphaGo, a previous DeepMind system that defeated a professional human player at the game of Go. The team successfully applied its approach, based on deep neural networks and reinforcement learning, to a very different game. A core component of AlphaStar’s agents is the long short-term memory (LSTM) network.
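DeepMind has not published AlphaStar’s full code, but the role of an LSTM, which carries memory of the game across time steps, can be illustrated with a minimal single-unit cell in plain Python (the weights and input values below are arbitrary toy numbers, not anything from AlphaStar):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    # Single-unit LSTM cell: gates mix the current input x with the
    # previous hidden state h_prev; the cell state c carries long-term memory.
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate memory
    c = f * c_prev + i * g   # keep some old memory, add some new
    h = o * math.tanh(c)     # expose part of the memory as the hidden state
    return h, c

# Feed a short sequence of observations through the cell.
weights = {k: 0.5 for k in ("wf", "uf", "bf", "wi", "ui", "bi",
                            "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = 0.0, 0.0
for obs in [0.1, 0.9, -0.4]:
    h, c = lstm_step(obs, h, c, weights)
```

The point of the recurrence is that `h` and `c` summarize everything seen so far, which matters in StarCraft II because much of the map is hidden at any moment.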
Both AlphaGo and its successors (AlphaGo Zero and AlphaZero) were considered among the biggest AI achievements of 2016 and 2017. They are the strongest agents that have ever played Go and chess, and it is no surprise that the AlphaZero paper appeared in the prestigious scientific journal Science.
However, AlphaStar is not a fully polished agent yet:
- It currently specializes in only one flavor of StarCraft II and is trained to play only Protoss vs. Protoss matches.
- It applies a small cheat by processing the zoomed-out map with superhuman vision.
- It has won matches against strong pro gamers, but not the best human players.
- Although AlphaStar’s actions per minute are limited, it can use its tirelessness and precision in micromanaging units to beat humans.
Overall, it has gone much further than any previous AI agent for real-time strategy games.
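The APM cap mentioned in the limitations above can be pictured as a sliding-window budget: an action is allowed only if fewer than the maximum number of actions happened in the last 60 seconds. This is only a toy sketch, not DeepMind’s actual mechanism (the class name and numbers are illustrative):

```python
from collections import deque

class ApmLimiter:
    """Toy actions-per-minute cap using a sliding 60-second window."""
    def __init__(self, max_apm=300):
        self.max_apm = max_apm
        self.times = deque()  # timestamps of recent allowed actions

    def allow(self, now):
        # Drop timestamps older than 60 seconds, then check the budget.
        while self.times and now - self.times[0] >= 60.0:
            self.times.popleft()
        if len(self.times) < self.max_apm:
            self.times.append(now)
            return True
        return False

limiter = ApmLimiter(max_apm=3)
decisions = [limiter.allow(t) for t in [0.0, 1.0, 2.0, 3.0, 61.0]]
# -> [True, True, True, False, True]: the fourth action exceeds the
# budget, and the fifth is allowed once the window has slid forward.
```

A cap like this keeps the comparison with humans fair in quantity of actions, though, as noted above, it does not limit the precision of each action.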
AlphaGo and AlphaZero used convolutional neural networks, originally developed for image classification. Interestingly, AlphaStar’s neural network architecture is quite different, which suggests there is even more potential in DeepMind’s approach than previously thought.
Training the Challenger
AlphaStar was exposed to about 200 years of gameplay during training, which ran on about 50 powerful GPUs in Google Cloud. For comparison, a pro human player may have 10-20 years of experience, with roughly 10 years of deliberate training.
The gameplay styles were extremely different, and that was by design:
- AlphaStar first observed games played by average human players. That formed the first (bootstrap) training set: the team did not apply tabula rasa learning, as it did with AlphaZero.
- The team then forked AlphaStar into several competing agents, formed them into their own virtual league, and had them play a huge number of games against each other.
- The agents picked to play against the humans were the five top AlphaStars from the league, each with a different style of play.
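The league phase above can be pictured as a loop in which forked agents repeatedly play one another while a rating tracks their strength. This is a toy Elo-style sketch, not DeepMind’s actual league (which used far more sophisticated matchmaking and training); the match itself is stubbed out:

```python
import random

def play(a, b):
    # Placeholder for an actual StarCraft II match: the higher-rated
    # agent wins probabilistically, just to drive the loop.
    p_a = 1.0 / (1.0 + 10 ** ((b["elo"] - a["elo"]) / 400))
    return a if random.random() < p_a else b

def update(winner, loser, k=32):
    # Standard Elo update: the winner gains what the loser gives up.
    expected = 1.0 / (1.0 + 10 ** ((winner["elo"] - loser["elo"]) / 400))
    winner["elo"] += k * (1 - expected)
    loser["elo"] -= k * (1 - expected)

random.seed(0)
# Fork the bootstrap agent into a league of competing copies.
league = [{"name": f"agent-{i}", "elo": 1200.0} for i in range(20)]

for _ in range(1000):  # a (toy-scale) huge number of games
    a, b = random.sample(league, 2)
    w = play(a, b)
    l = b if w is a else a
    update(w, l)

# The strongest five agents would be the ones sent against humans.
top5 = sorted(league, key=lambda ag: ag["elo"], reverse=True)[:5]
```

In the real league, agents also diverged in strategy, which is why the five finalists had genuinely different play styles rather than just different ratings.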
Naturally, the first human player was confused, expecting AlphaStar to play (or at least open) the same way each game, because he did not know there were five different AlphaStar personas.
Surprisingly, the AI actually took fewer actions per minute than average human players.
StarCraft II players specialize by race: you can represent Terran, Zerg, or Protoss, and playing each race requires a specific skill set. AlphaStar is trained to play Protoss.
The first human player to face AlphaStar did not specialize in the race he had to play: Dario “TLO” Wünsch usually plays Zerg. Even though he is a professional, he is not among the best in this matchup.
The second human player, Grzegorz “MaNa” Komincz, is experienced at playing Protoss. However, according to GosuGamers, MaNa ranks #19 in the world, and he is #17 in the WCS 2019 Global standings. MaNa is a strong pro gamer, but he is definitely not at world-champion level.
AlphaStar could see the whole map at once. The DeepMind team claimed that AlphaStar had areas of attention similar to the movement of a camera, but unlike the AI, humans must move the camera to see different areas of the map.
In a sense, this detail is unfair. Humans are biologically unable to see every detail on a zoomed-out map, while the AI’s vision has no such limitation: it can register every pixel on the board.
AlphaStar lost the last game, probably due to a change related to visibility: in that game, AlphaStar had to move the camera to see the field, just like a human. This camera-based version initially performed worse during training, but by the end it had caught up and was just as good in DeepMind’s internal testing.
AlphaStar had both great high-level strategy and great low-level movement ability. The StarCraft II community calls this low-level activity “micromanagement,” or “micro.” Currently, AlphaStar is far superior to humans at the micro level: it can move units with perfect precision in a real-time setting, while humans inevitably make mistakes.
AlphaStar challenged some common human strategies very effectively. For example, most human players never engage in battle up a ramp, because they lose high-ground visibility; AlphaStar did this all the time, and very successfully. AlphaStar also rarely opened with a “walled” barrier around its base, something that all pro players do automatically.
Fifteen years ago, AI researchers dreamed of the day when an AI agent could beat humans at the game of Go. In 2016, AlphaGo defeated Lee Sedol. After that, the next big challenge was mastering complex real-time strategy games. Last week, DeepMind reached that milestone.
These achievements are believed to bring us closer to general AI. After all, humans also learn by playing.