Dynamically Typed


Although increasingly enormous do-it-all language models like T5 and GPT-3 (DT #42, #44) have been getting a lot of attention (haha) lately, smaller and more parameter-efficient models are still improving a lot as well. An interesting recent one is REALM by Guu et al. (2020) at Google AI, which, unlike those larger models, separates the encoding of language from the encoding of knowledge. Instead of implicitly storing facts about the world in the language model's weights, it introduces a neural retriever that learns to find relevant snippets of Wikipedia text, which are then fed into the language model as context alongside the original query. As a result, it scores 40.4 on Natural Questions with just 300 million parameters, compared to T5's 36.6 with 11 billion parameters: about 10% better results with nearly 37x fewer parameters.
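To make the retrieve-then-read idea concrete, here's a toy sketch of it. This is not REALM itself: the real system learns a neural retriever (BERT-style dual encoders over a Wikipedia-scale index) jointly with the language model, whereas this stand-in "retriever" is just a token-overlap scorer, and the "reader" input is simply the query concatenated with the retrieved snippet.

```python
import string

def tokenize(text):
    # Lowercase, strip punctuation, split into a set of tokens.
    table = str.maketrans("", "", string.punctuation)
    return set(text.lower().translate(table).split())

def retrieve(query, corpus, k=1):
    # Rank snippets by how many tokens they share with the query.
    # (REALM trains this scoring function end-to-end; here it is fixed.)
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]

# Hypothetical miniature "knowledge corpus" standing in for Wikipedia.
corpus = [
    "Paris is the capital of France.",
    "The Transformer architecture relies on attention.",
    "Mount Everest is the tallest mountain on Earth.",
]

query = "What is the capital of France?"
snippets = retrieve(query, corpus, k=1)

# The reader model would see the query alongside the retrieved context,
# so factual knowledge lives in the corpus rather than in model weights.
reader_input = query + " [SEP] " + " ".join(snippets)
print(reader_input)
```

The payoff of this split is exactly what the numbers above show: the reader can stay small because it only has to understand language, while the facts themselves sit in an external, searchable corpus.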