DeepMind's new RL SOTA: MuZero
New from DeepMind, in Nature: Mastering Atari, Go, chess and shogi by planning with a learned model by Schrittwieser et al. (2020). The paper describes DeepMind’s new game-playing reinforcement learning algorithm MuZero, the latest evolution of the lab’s previous AlphaGo (2016), AlphaGo Zero (2017), and AlphaZero (2018) algorithms. The key improvement in MuZero is that it doesn’t need to be explicitly told the rules of the games it plays: rather than being given a perfect simulator, it learns its own model of the environment, one that “just models aspects that are important to the agent’s decision-making process.” This helps it achieve state-of-the-art (and superhuman) results on the Atari suite, Go, chess, and shogi.
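For intuition, here’s a minimal Python sketch of the three learned functions the paper plans with: a representation function that encodes an observation into a hidden state, a dynamics function that predicts the next hidden state and reward for an action, and a prediction function that outputs a policy and value. This is not DeepMind’s code; the dimensions and random linear “networks” are placeholder assumptions, and the greedy rollout stands in for the full Monte Carlo tree search.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; the real functions are deep residual networks.
OBS_DIM, STATE_DIM, N_ACTIONS = 8, 4, 3

# Placeholder "networks": random linear maps standing in for the learned parameters.
W_repr = rng.normal(size=(STATE_DIM, OBS_DIM))
W_dyn = rng.normal(size=(STATE_DIM, STATE_DIM + N_ACTIONS))
W_rew = rng.normal(size=(STATE_DIM + N_ACTIONS,))
W_pol = rng.normal(size=(N_ACTIONS, STATE_DIM))
W_val = rng.normal(size=(STATE_DIM,))


def representation(observation):
    """h: encode the raw observation into a hidden state."""
    return np.tanh(W_repr @ observation)


def dynamics(state, action):
    """g: predict the next hidden state and immediate reward.

    No game rules appear here -- the model only learns latent
    transitions that matter for its own decision-making.
    """
    x = np.concatenate([state, np.eye(N_ACTIONS)[action]])
    return np.tanh(W_dyn @ x), float(W_rew @ x)


def prediction(state):
    """f: predict a policy and a value from a hidden state."""
    logits = W_pol @ state
    policy = np.exp(logits) / np.exp(logits).sum()
    return policy, float(W_val @ state)


# Plan by unrolling the learned model a few steps in latent space
# (the real algorithm searches many such rollouts with MCTS).
obs = rng.normal(size=OBS_DIM)
state = representation(obs)
for step in range(3):
    policy, value = prediction(state)
    action = int(np.argmax(policy))
    state, reward = dynamics(state, action)
    print(f"step {step}: action={action}, reward={reward:.3f}, value={value:.3f}")
```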