OpenAI’s Sparse Transformers model set records at sequence prediction tasks in text, image and sound data. It’s an attention-based model:
In Transformers, every output element is connected to every input element, and the weightings between them are dynamically calculated based upon the circumstances, a process called attention. While it is believed that this allows Transformers to be more flexible than models with fixed connectivity patterns, in practice it requires the creation of an N×N attention matrix for every layer and attention head, which can consume large amounts of memory when applied to data types with many elements, like images or raw audio.
One of the biggest contributions by authors Rewon Child et al. is an O(N×sqrt(N)) reformulation of Transformer self-attention, compared to the previous O(N×N) formulation. This allowed them to attack problems with larger data sizes (like images and audio) and longer-distance dependencies within the data, beating the state of the art for the density estimation task on CIFAR-10, Enwik8, and ImageNet 64. Although this is an impressive improvement, the authors think it can be taken further in combination with multi-scale approaches.
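To get a feel for where the O(N×sqrt(N)) figure comes from, here is a minimal sketch that counts attention connections under a strided sparsity pattern, loosely modeled on the one described in the paper. It is an illustration under assumptions, not the authors' implementation: the function names are made up for this example, the stride is assumed to be roughly sqrt(N), and the local and strided patterns are merged into a single set for simplicity.

```python
import math

def strided_sparse_indices(i, stride):
    """Positions j <= i that position i may attend to (illustrative pattern)."""
    # Local window: the most recent `stride` positions plus position i itself.
    local = set(range(max(0, i - stride), i + 1))
    # Strided "summary" positions: every j <= i with (i - j) % stride == 0.
    strided = set(range(i % stride, i + 1, stride))
    return local | strided

def count_connections(n, stride=None):
    """Total causal attention connections for a sequence of length n."""
    if stride is None:
        # Dense causal attention: position i attends to all i + 1 positions.
        return n * (n + 1) // 2
    return sum(len(strided_sparse_indices(i, stride)) for i in range(n))

n = 1024
stride = math.isqrt(n)  # assumption: stride close to sqrt(N)

dense = count_connections(n)
sparse = count_connections(n, stride)

print(f"dense connections:  {dense}")
print(f"sparse connections: {sparse}")
print(f"sparse / (N * sqrt(N)): {sparse / (n * math.sqrt(n)):.2f}")
```

Each position attends to about sqrt(N) local positions and about sqrt(N) strided ones, so the total grows on the order of N×sqrt(N) rather than N×N.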
Google has open-sourced a TensorFlow implementation of MorphNet, a tool that “takes an existing neural network as input and produces a new neural network that is smaller, faster, and yields better performance tailored to a new problem.” MorphNet works in a cycle of two phases: a shrinking phase that prunes inefficient neurons from the network, and an expanding phase that uniformly grows all layers using a width multiplier. Together, these two phases result in computation (in terms of FLOPs or model size) being reallocated to places where it is most effective. When applied to the Inception V2 network trained on ImageNet, MorphNet reduces FLOPs per inference by 11-15% without degrading the accuracy.
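The shrink-then-expand cycle can be sketched with a toy example. This is not the MorphNet API: the helper names, the fully connected layer chain, and the "shrunk" widths are all hypothetical, chosen only to show how a uniform width multiplier hands the freed-up FLOPs back to the layers that survived pruning best. In the real tool, the shrinking phase comes out of training with a sparsifying regularizer rather than from hand-picked numbers.

```python
def flops(widths, n_in, n_out):
    """Multiply-accumulate count for a chain of fully connected layers."""
    dims = [n_in] + list(widths) + [n_out]
    return sum(a * b for a, b in zip(dims, dims[1:]))

def scale(widths, omega):
    """Apply one width multiplier to every hidden layer."""
    return [max(1, round(w * omega)) for w in widths]

def expand(widths, n_in, n_out, budget, step=0.01):
    """Find the largest uniform multiplier (in `step` increments) within budget."""
    if not widths:
        raise ValueError("need at least one hidden layer to expand")
    k = 0
    while flops(scale(widths, 1 + (k + 1) * step), n_in, n_out) <= budget:
        k += 1
    omega = 1 + k * step
    return scale(widths, omega), omega

n_in, n_out = 32, 10
seed = [64, 64, 64]                 # original hidden-layer widths
budget = flops(seed, n_in, n_out)   # keep the seed network's FLOP cost

# Hypothetical result of the shrinking phase: the middle layer lost the most neurons.
shrunk = [60, 24, 48]

expanded, omega = expand(shrunk, n_in, n_out, budget)

print("seed FLOPs:    ", budget)
print("shrunk FLOPs:  ", flops(shrunk, n_in, n_out))
print("expanded:      ", expanded, "with multiplier", round(omega, 2))
print("expanded FLOPs:", flops(expanded, n_in, n_out))
```

The expanded network costs no more than the seed network, but its layers keep the relative proportions found during shrinking, which is the sense in which computation gets reallocated.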
Adam King wrote an in-depth explanation of how GauGAN works. NVIDIA’s GauGAN tool, which can automatically transform sketches into photorealistic landscapes (see DT #10), is powered by a recent Generative Adversarial Network (GAN) architecture called SPADE. King’s excellent post explains everything from the original Goodfellow GAN and pix2pixHD to the problems with these methods and how SPADE solves them. Read it here: Photos from Crude Sketches: NVIDIA’s GauGAN Explained Visually.
Quick ML resource links ⚡️