DeepMind's new RL SOTA: MuZero
New from DeepMind, in Nature: Mastering Atari, Go, chess and shogi by planning with a learned model by Schrittwieser et al. (2020). The paper describes DeepMind’s new game-playing reinforcement learning algorithm MuZero, the latest evolution of the lab’s previous AlphaGo (2016), AlphaGo Zero (2017), and AlphaZero (2018) algorithms. The key improvement in MuZero is that it doesn’t need to be explicitly told the rules of the games it plays: rather than being given a perfect simulator, it learns its own model of the environment, one that “just models aspects that are important to the agent’s decision-making process.” This helps it achieve state-of-the-art (and superhuman) results on the Atari suite, Go, chess, and shogi.
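For intuition, here’s a minimal Python sketch of the three learned functions the paper plans with: a representation function that encodes an observation into a hidden state, a dynamics function that predicts the next hidden state and reward for an action, and a prediction function that outputs a policy and value. This is not DeepMind’s code; the dimensions and random linear “networks” are placeholder assumptions, and the greedy rollout stands in for the full Monte Carlo tree search.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; the real functions are deep residual networks.
OBS_DIM, STATE_DIM, N_ACTIONS = 8, 4, 3

# Placeholder "networks": random linear maps standing in for the learned parameters.
W_repr = rng.normal(size=(STATE_DIM, OBS_DIM))
W_dyn = rng.normal(size=(STATE_DIM, STATE_DIM + N_ACTIONS))
W_rew = rng.normal(size=(STATE_DIM + N_ACTIONS,))
W_pol = rng.normal(size=(N_ACTIONS, STATE_DIM))
W_val = rng.normal(size=(STATE_DIM,))


def representation(observation):
    """h: encode the raw observation into a hidden state."""
    return np.tanh(W_repr @ observation)


def dynamics(state, action):
    """g: predict the next hidden state and immediate reward.

    No game rules appear here -- the model only learns latent
    transitions that matter for its own decision-making.
    """
    x = np.concatenate([state, np.eye(N_ACTIONS)[action]])
    return np.tanh(W_dyn @ x), float(W_rew @ x)


def prediction(state):
    """f: predict a policy and a value from a hidden state."""
    logits = W_pol @ state
    policy = np.exp(logits) / np.exp(logits).sum()
    return policy, float(W_val @ state)


# Plan by unrolling the learned model a few steps in latent space
# (the real algorithm searches many such rollouts with MCTS).
obs = rng.normal(size=OBS_DIM)
state = representation(obs)
for step in range(3):
    policy, value = prediction(state)
    action = int(np.argmax(policy))
    state, reward = dynamics(state, action)
    print(f"step {step}: action={action}, reward={reward:.3f}, value={value:.3f}")
```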