Dynamically Typed

#37: OpenAI's neural network taxonomy, decoding text from brain implants, and models that don't exist

Hey everyone, welcome to Dynamically Typed #37! I’ve pushed the ML research section to the top of today’s newsletter because OpenAI’s new Distill article is one of the most exciting things I’ve read in a long time: they investigated the early layers of Google’s InceptionV1 vision network to an incredible level of detail, resulting in a first-of-its-kind taxonomy of “neuron groups.” It’s really cool stuff, so I’m covering it in depth.

Beyond that, I’ve got links to neurological work on decoding text from brain implant signals, and to Wayve’s new LIDAR data augmentation tech. For productized AI, I’m covering a startup that’s using GANs to synthesize fake models for ads, as well as links about AR acquisitions and more. Finally, for cool stuff, I found a paper that generates 2.5D-perspective images based on a single photo with depth information.

Machine Learning Research 🎛

The largest neuron groups in the mixed3a layer of InceptionV1. (Olah et al., 2020)

Chris Olah and his OpenAI collaborators published a new Distill article: An Overview of Early Vision in InceptionV1. This work is part of Distill’s Circuits thread, which aims to understand how convolutional neural networks work by investigating individual features and how they interact through the formation of logical circuits (see DT #35). In this new article, Olah et al. explore the first five layers of Google’s InceptionV1 network:

Over the course of these layers, we see the network go from raw pixels up to sophisticated boundary detection, basic shape detection (eg. curves, circles, spirals, triangles), eye detectors, and even crude detectors for very small heads. Along the way, we see a variety of interesting intermediate features, including Complex Gabor detectors (similar to some classic “complex cells” of neuroscience), black and white vs color detectors, and small circle formation from curves.

Each of these five layers contains dozens to hundreds of features (a.k.a. channels or filters) that the authors categorize into human-understandable groups, which consist of features that detect similar things for inputs with slightly different orientations, frequencies, or colors. This goes from conv2d0, the first layer where 85% of filters fall into two simple categories (detectors for lines and for contrasting colors, in various orientations), all the way up to mixed3b, the fifth layer where there are over a dozen complex categories (detectors for small heads, for circles/loops, and much more). We’ve known that there are line detectors in early network layers for a long time, but this detailed taxonomy of later-layer features is novel—and it must’ve been an enormous amount of work to create.
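
If you want to poke around these early layers yourself, here’s a quick sketch I put together (mine, not the article’s code) that captures per-channel activations from torchvision’s GoogLeNet, which is essentially InceptionV1. The mapping of torchvision’s conv1/conv2/conv3/inception3a/inception3b onto Distill’s conv2d0 through mixed3b layer names is my own approximation, and example.jpg is just a placeholder path for any test image.

```python
# Minimal sketch (not the article's code): capture early-layer activations
# from torchvision's GoogLeNet, a close relative of InceptionV1.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.googlenet(pretrained=True).eval()

activations = {}
def save_to(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# inception3a roughly corresponds to Distill's "mixed3a" layer.
model.inception3a.register_forward_hook(save_to("mixed3a"))

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "example.jpg" is a placeholder path for any test image.
img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    model(img)

# Rank mixed3a's 256 channels by mean activation to see which features
# (and hence which of Olah et al.'s neuron groups) fire for this image.
per_channel = activations["mixed3a"][0].mean(dim=(1, 2))
print("Most active mixed3a channels:", per_channel.topk(10).indices.tolist())
```

Feeding in a few different images and comparing which channels light up is a nice low-effort way to see the taxonomy in action.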

A circuits-based visualization of the black & white detector neuron group in layer mixed3a of InceptionV1. (Olah et al., 2020)

For a few of the categories, like black & white and small circle detectors in mixed3a, and boundary and fur detectors in mixed3b, the article also investigates the “circuits” that formed them. Such circuits show how strongly the presence of a feature in the input positively or negatively influences (“excites” or “inhibits”) different regions of the current feature. One of the most interesting aspects of this research is that some of these circuits—which were learned by the network, not explicitly programmed!—are super intuitive once you think about them for a bit. The black & white detector above, for example, consists mostly of negative weights that inhibit colorful input features: the more color features in the input, the less likely it is to be black & white.
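
To get a feel for what reading off a circuit looks like at the weight level, here’s a rough sketch (again mine, not the authors’ tooling): it pulls out the raw 1×1 convolution weights connecting one mixed3a channel to one mixed3b channel in torchvision’s GoogLeNet and checks whether the connection is net excitatory or inhibitory. The channel indices are arbitrary placeholders, and since a batch norm follows the convolution, the raw sign is only a crude proxy for the kind of analysis in the article.

```python
# Rough sketch (not the authors' tooling): inspect the raw conv weights that
# connect one mixed3a channel to one mixed3b channel in torchvision's GoogLeNet.
import torchvision.models as models

model = models.googlenet(pretrained=True).eval()

# inception3b's 1x1 branch reads all 256 mixed3a channels directly;
# its conv weight has shape (128, 256, 1, 1).
w = model.inception3b.branch1.conv.weight

in_ch, out_ch = 41, 7          # arbitrary placeholder feature indices
net_weight = w[out_ch, in_ch].sum().item()

# Batch norm follows this conv, so the sign is only a rough proxy for
# excitation vs. inhibition in the circuits sense.
print(f"mixed3a:{in_ch} -> mixed3b:{out_ch} net weight {net_weight:+.4f}",
      "(excites)" if net_weight > 0 else "(inhibits)")
```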

The simplicity of many of these circuits suggests, to me at least, that Olah et al. are currently exploring one of the most promising paths in AI explainability research. (Although there is an alternate possibility, as pointed out by the authors: that they’ve found a “taxonomy that might be helpful to humans but [that] is ultimately somewhat arbitrary.”)

Anyway, An Overview of Early Vision in InceptionV1 is one of the most fascinating machine learning papers I’ve read in a long time, and I spent a solid hour zooming in on different parts of the taxonomy. The groups for layer mixed3a are probably my favorite. I’m also curious about how much these early-layer neuron groups generalize to other vision architectures and types of networks—to what extent, for example, do these same neuron categories show up in the first layers of binarized neural networks?

If you read the article and have more thoughts about it that I didn’t cover here, I’d love to hear them. :)

Quick ML research + resource links 🎛 (see all 59 resources)

Productized Artificial Intelligence 🔌

None of these models exist. (Rosebud AI)

Rosebud AI uses generative adversarial networks (GANs) to synthesize photos of fake people for ads. We’ve of course seen a lot of GAN face generation in the past (see DT #6, #8, #23), but this is one of the first startups I’ve come across that’s building a product around it. Their pitch to advertisers is simple: take photos from your previous photoshoots, and we’ll automatically swap out the model’s face with one better suited to the demographic you’re targeting. The new face can either be GAN-generated or licensed from real models on the generative.photos platform. But either way, Rosebud AI’s software takes care of inserting the face in a natural-looking way.

This raises some obvious questions: is it OK to advertise using nonexistent people? Do you need models’ explicit consent to reuse their body with a new face? How does copyright work when your model is half real, half generated? I’m sure Rosebud AI’s founders spend a lot of time thinking about these questions; as they do, you can follow along with their thoughts on Twitter and Instagram.

Quick productized AI links 🔌

Cool Things ✨

Layered depth inpainting. (Shih et al., 2020)

Here’s another cool AI art piece that can’t be done justice using just the static screenshot above: Shih et al. (2020) published 3D Photography using Context-aware Layered Depth Inpainting at this year’s CVPR conference. Here’s what that means:

We propose a method for converting a single RGB-D input image into a 3D photo, i.e., a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view.

Based on a single image (plus depth information), they can generate a 2.5-dimensional representation, realistically re-rendering the scene from perspectives slightly different from the one it was originally captured from. Contrast that with recent work on neural radiance fields, which requires on the order of 20–50 images to work (see DT #36).
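
To make the 2.5D idea concrete, here’s a toy sketch (mine, not Shih et al.’s method) that naively forward-warps an RGB-D image to a slightly shifted camera. The naive warp exposes exactly the occluded “holes” that the paper’s layered depth inpainting fills in with hallucinated content; the focal lengths and camera shift below are made-up placeholder values.

```python
# Toy illustration (not Shih et al.'s method): naive depth-based reprojection
# of an RGB-D image to a slightly shifted viewpoint.
import numpy as np

def reproject(rgb, depth, fx=500.0, fy=500.0, shift_x=0.02):
    """rgb: (H, W, 3) uint8 image, depth: (H, W) depth map in meters.
    fx/fy and shift_x are made-up placeholder camera parameters."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = w / 2.0, h / 2.0
    z = np.maximum(depth, 1e-6)          # avoid division by zero

    # Back-project pixels to 3D camera coordinates.
    x = (xs - cx) * z / fx
    y = (ys - cy) * z / fy

    # Shift the camera to the right, then project back to pixel coordinates.
    u = np.round((x - shift_x) * fx / z + cx).astype(int)
    v = np.round(y * fy / z + cy).astype(int)

    # Splat pixels into the new view (no z-buffering, for simplicity).
    out = np.zeros_like(rgb)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out[v[valid], u[valid]] = rgb[ys[valid], xs[valid]]
    return out   # black holes appear where newly exposed regions need inpainting
```

The black holes in the output of this naive warp are where the paper’s contribution kicks in: hallucinating plausible color and depth for the regions the original camera never saw.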

Shih et al. set up a website with some fancy demos, which is definitely worth a look; see these gifs on Twitter too. One of the authors also works at Facebook, so I wonder if we’ll one day see Instagram filters with this effect—or if it’ll be a part of Facebook’s virtual reality ambitions. Since the next generation of iPhones will likely have a depth sensor on the back too, I expect we’ll see a lot of this 2.5D photography stuff in the coming years.

Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get a new issue in your inbox every second Sunday.

If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. 🌞