Dynamically Typed

Google Translate 600B transformer

Today in gargantuan language models: Google’s new state-of-the-art model for translating from 100 languages to English has 600 billion parameters. Compare this to OpenAI’s GPT-3 at 175 billion parameters from June (see DT #42) and Microsoft’s Turing-NLG at 17 billion parameters from February (DT #33). Google’s 600-billion-parameter Transformer took four days to train on 2048 (!) TPUs, which is actually relatively little time for a model of that size. That training efficiency is also the focus of the paper describing the model: Lepikhin et al. (2020) introduce GShard, “an elegant way to express a wide range of parallel computation patterns with minimal changes to the existing model code.”
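
For a flavor of what annotation-driven parallelism looks like, here’s a minimal sketch in JAX rather than GShard’s actual TensorFlow API (which the paper describes): you annotate how one tensor should be sharded across devices, and the compiler’s SPMD partitioner splits the computation accordingly, without touching the model code itself. The mesh axis name, shapes, and toy layer below are purely illustrative.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1D device mesh from whatever accelerators are available
# (the paper used 2048 TPUs; any number of devices works for the sketch).
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("model",))

# A toy weight matrix, sharded along its first dimension across the mesh.
# This single annotation is the only "parallelism" code in the example.
weights = jax.device_put(
    jnp.ones((len(devices) * 4, 128)),
    NamedSharding(mesh, P("model", None)),
)

@jax.jit
def forward(x, w):
    # The compiler partitions this matmul across devices based on the
    # sharding annotation alone; the model code stays unchanged.
    return x @ w

x = jnp.ones((8, len(devices) * 4))
y = forward(x, weights)
print(y.shape)  # (8, 128), computed across the sharded weights
```

That’s the core idea GShard pushes to 600-billion-parameter scale: annotate how a handful of tensors are split, let the compiler generate the cross-device communication, and leave the rest of the model code alone.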