Dynamically Typed

#12: OpenAI introduces Mozart to Lady Gaga, and Google takes your best duck-face selfies for you

Hey everyone, welcome to the 12th edition of Dynamically Typed! In the past two weeks, both Google and OpenAI showed off new research ranging from complex deep learning architectures to charming machine learning applications: Google open-sourced a network architecture shrinking tool called MorphNet, and automated taking selfies with Photobooth; OpenAI improved on the concept of attention with Sparse Transformers, and generated music with MuseNet. Other news includes remove.bg’s new Photoshop plugin, Benedict Evans’s notes on AI bias, and a few more useful ML resources I found.

Productized Artificial Intelligence 🔌

“Photobooth automatically captures group shots, when everyone in the photo looks their best.” (Google)

Google’s Pixel phones have a new feature called Photobooth that captures selfies at exactly the right time. The app tracks people in the frame of your selfie and makes sure that everyone is looking at the camera and that no one is blinking. It then uses an Image Content Model that looks for five expressions (smiles 😀, tongue-out 😝, duck face 😙, puffy-cheeks 🐡, and surprise 😮) and triggers an image capture when it sees one. This is just the latest of all the machine learning-powered photography features that Google has been adding to its Pixel phones (see also Night Sight), and it’s definitely the biggest thing I’m jealous of as an iPhone user. More details of how Photobooth works are on Google’s AI blog: Take Your Best Selfie Automatically, with Photobooth on Pixel 3.
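To make the capture logic above a bit more concrete, here is a toy sketch of the decision as I understand it from the description. It is not Google's implementation: the `Face` fields, the expression labels, and the `should_capture` function are all hypothetical names I made up, and the real system works on model scores rather than clean booleans.

```python
from dataclasses import dataclass

# Hypothetical labels for the five expressions named above.
TRIGGER_EXPRESSIONS = {"smile", "tongue_out", "duck_face", "puffy_cheeks", "surprise"}


@dataclass
class Face:
    """One tracked face in the frame (all fields are illustrative assumptions)."""
    looking_at_camera: bool
    eyes_open: bool
    expression: str  # e.g. "smile" or "neutral"


def should_capture(faces):
    """Return True when everyone is ready and at least one trigger expression is visible."""
    if not faces:
        return False
    # Gate 1: everyone looks at the camera and no one is blinking.
    everyone_ready = all(f.looking_at_camera and f.eyes_open for f in faces)
    # Gate 2: at least one face shows one of the five trigger expressions.
    expression_seen = any(f.expression in TRIGGER_EXPRESSIONS for f in faces)
    return everyone_ready and expression_seen
```

Calling `should_capture` once per camera frame would give the "shutter fires by itself" behavior; a single blinking face is enough to hold the shot back.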

Remove.bg, which automatically removes backgrounds from photos, now has a Photoshop plugin. The service launched with a demo site about four months ago (see DT #3), and soon after added a paid tier for processing high-resolution images (see DT #6). Previously, you had to go to remove.bg, upload an image, wait for it to process, and then download it again. Now, with the new plugin, Photoshop users can simply click a button inside the app to automatically remove the background of the current layer and generate a transparency mask. I’ve loved following this product’s evolution (it’s the perfect example of the kind of productized AI I try to cover here), and it’s great to see its creators continuing to ship cool features like this one. Read their blog post here: Announcing remove.bg for Photoshop.

Benedict Evans published some of his notes on AI bias. He lays out a clear perspective of where bias in productized machine learning systems comes from—“a system for finding patterns in data might find the wrong patterns, and you might not realise”—and what to do about it:

You can divide thinking in the field into three areas: (1) methodological rigour in the collection and management of the training data; (2) technical tools to analyse and diagnose the behavior of the model; and (3) training, education and caution in the deployment of ML in products.

He also argues that the big-name companies and research groups should probably worry us less than “third-tier vendors” that sell ML-powered technology to non-technical customers: the former group is full of people who know (and worry) about AI bias, while the latter’s customers may not even know the right questions to ask about the limitations of the systems they’re buying. Read Evans’s full post here: Notes on AI Bias.

Machine Learning Technology 🎛

Image completions by the Sparse Transformer model. (OpenAI)

OpenAI’s Sparse Transformers model set records at sequence prediction tasks in text, image and sound data. It’s an attention-based model:

In Transformers, every output element is connected to every input element, and the weightings between them are dynamically calculated based upon the circumstances, a process called attention. While it is believed that this allows Transformers to be more flexible than models with fixed connectivity patterns, in practice it requires the creation of an N × N attention matrix for every layer and attention head, which can consume large amounts of memory when applied to data types with many elements, like images or raw audio.

One of the biggest contributions by authors Rewon Child et al. is an O(N×sqrt(N)) reformulation of Transformer self-attention, compared to the previous O(N×N) formulation. This allowed them to attack problems with larger data sizes (like images and audio) and longer-distance dependencies within the data, beating the state of the art for the density estimation task on CIFAR-10, Enwik8, and ImageNet 64. Although this is an impressive improvement, the authors think it can be taken further in combination with multi-scale approaches.
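To get a feel for the difference between O(N×N) and O(N×sqrt(N)), here is a small counting sketch. It assumes a strided pattern in which each position attends to a local window of recent positions plus every stride-th earlier position, with the stride set to roughly sqrt(N). That pattern and the function names are my own simplification for illustration, not the authors' code.

```python
import math


def dense_causal_connections(n):
    """Full causal attention: position i attends to positions 0..i."""
    return n * (n + 1) // 2


def strided_connections(n, stride):
    """Count connections when position i attends to the previous `stride`
    positions (including itself) plus every `stride`-th earlier position."""
    total = 0
    for i in range(n):
        local = set(range(max(0, i - stride + 1), i + 1))
        strided = {j for j in range(i + 1) if (i - j) % stride == 0}
        total += len(local | strided)
    return total


n = 1024                  # e.g. a 32×32 image flattened to a sequence
stride = math.isqrt(n)    # roughly sqrt(N)
print(dense_causal_connections(n), strided_connections(n, stride))
```

Each position now has at most about 2×sqrt(N) connections instead of up to N, which is where the memory savings for long sequences like images and raw audio come from.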

Google has open-sourced a TensorFlow implementation of MorphNet, a tool that “takes an existing neural network as input and produces a new neural network that is smaller, faster, and yields better performance tailored to a new problem.” MorphNet works in a cycle of two phases: a shrinking phase that prunes inefficient neurons from the network, and an expanding phase that uniformly grows all layers using a width multiplier. Together, these two phases result in computation (in terms of FLOPs or model size) being reallocated to places where it is most effective. When applied to the Inception V2 network trained on ImageNet, MorphNet reduces FLOPs per inference by 11-15% without degrading the accuracy.
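The expanding phase is easy to sketch on a toy model. The snippet below does not use the MorphNet library. It assumes a plain chain of fully connected layers whose cost is the sum of products of consecutive layer widths, takes the already-pruned widths as given, and then searches for the largest uniform width multiplier that still fits a cost budget. The names `chain_cost` and `expand_uniformly` are hypothetical.

```python
def chain_cost(widths):
    """Toy FLOP proxy: multiply-adds of a chain of fully connected layers."""
    return sum(a * b for a, b in zip(widths, widths[1:]))


def expand_uniformly(widths, budget, step=0.01):
    """Grow all layers by one shared multiplier, as large as the budget allows.

    If the input is already over budget, the widths come back unchanged
    with a multiplier of 1.0.
    """
    def scaled(multiplier):
        return [max(1, round(w * multiplier)) for w in widths]

    k = 0
    # Keep stepping the multiplier up while the next candidate still fits.
    while chain_cost(scaled(1.0 + (k + 1) * step)) <= budget:
        k += 1
    multiplier = 1.0 + k * step
    return scaled(multiplier), multiplier


original = [64, 64, 64]
budget = chain_cost(original)   # stay within the original network's cost
pruned = [64, 16, 32]           # assumed output of a shrinking phase
expanded, multiplier = expand_uniformly(pruned, budget)
print(expanded, multiplier, chain_cost(expanded))
```

The expanded network keeps the relative layer sizes found by pruning but spends the freed-up budget on them, which is the "reallocation" idea described above.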

Adam King wrote an in-depth explanation of how GauGAN works. NVIDIA’s GauGAN tool, which can automatically transform sketches into photorealistic landscapes (see DT #10), is powered by a recent Generative Adversarial Network (GAN) architecture called SPADE. King’s excellent post explains everything from the original Goodfellow GAN and pix2pixHD, to the problems with these methods and how SPADE solves them. Read it here: Photos from Crude Sketches: NVIDIA’s GauGAN Explained Visually.

Quick ML resource links ⚡️

Cool Things ✨

My favorite clip generated by MuseNet: Lady Gaga’s Poker Face, continued in the style of Mozart. (OpenAI)

Christine Payne (OpenAI) released MuseNet, “a deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles.” It uses the same unsupervised technology as OpenAI’s GPT-2 language model (see DT #8) and it’s super fun to play around with.

@eukaryote trained OpenAI’s GPT-2 language model on r/ShowerThoughts, where Reddit users share odd realizations they’ve had in the shower, and set up a site that generates new such thoughts. Some of my favorites:

The last time you ate a hot dog was during dinner

The internet would be very different if all of us had a phone for our eyes.

If there was an AI/machine learning/machine learning and artificial intelligence, i bet most of the people i know were probably scared of it.

Check it out at BotThoughts. (Thanks for sharing, Steinar!)

Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get new issues in your inbox every second Sunday.

If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. 😁