#63: Three times Distill: Multimodal neurons, branch specialization, and weight banding
Hey everyone, welcome to Dynamically Typed #63. I took today to finally sit down and read three recent articles from Distill, the machine learning journal that focuses on AI interpretability with the help of clear writing and great, often interactive, visuals. (It’s also my favorite AI publication.) The articles are all quite long, and I’ve done my best to summarize them concisely in the ML Research section below, but — as always — I highly recommend reading them in full as well. I’m also saving the other stories I found for DT over the past two weeks for the next issue, so check back then for more productized AI, ML research, climate AI, and cool stuff!
Machine Learning Research 🎛
A CLIP neuron that responds to the concept of Spider-Man — in the form of photos, comic drawings, or text.
- 🕸 Distill #1: Multimodal Neurons in Artificial Neural Networks by Goh et al. (2021), which investigates CLIP, OpenAI’s multimodal neural network that learned to match images on the internet to the text snippets that surround them. (Probably) unlike older image classification models, CLIP has neurons that “light up” for high-level concepts that were never explicitly part of any classification dataset. “These neurons don’t just select for a single object. They also fire (more weakly) for associated stimuli, such as a Barack Obama neuron firing for Michelle Obama or a morning neuron firing for images of breakfast.” The article has deep dives into three neuron families: (1) the Region Neurons family (like neurons for the USA or for Europe; these links take you to the neurons’ pages on OpenAI Microscope); (2) the Person Neurons family (including Lady Gaga and Ariana Grande); and (3) the Emotion Neurons family (including sleepy and happy). It also highlights a baker’s dozen other families, from holidays and religions to brands and fictional universes. There’s even an LGBTQ+ neuron that responds to things like rainbow flags and the word “pride”! Beyond this exploration, the article looks at how these abstractions in CLIP can be used: for understanding language, emotion composition, and typographic attacks. The authors also note that “CLIP makes it possible for end-users to ‘roll their own classifier’ by programming the model via intuitive, natural language commands — this will likely unlock a broad range of downstream uses of CLIP-style models.” Sound familiar? I wonder how long it’ll take until OpenAI launches a v2 of their API that uses CLIP (+ DALL·E?) for image processing and generation the way v1 uses GPT-3 for text.
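If you want to see what “rolling your own classifier” looks like in practice, here’s a minimal zero-shot classification sketch using OpenAI’s open-source CLIP package. The checkpoint, labels, and image path are placeholders I picked for illustration, not anything from the article:

```python
import torch
import clip                      # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

# Load a CLIP checkpoint (ViT-B/32 here; other variants also work).
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "Roll your own classifier": the classes are just natural-language descriptions.
labels = ["a photo of Spider-Man", "a drawing of a spider web", "a photo of breakfast"]
text = clip.tokenize(labels).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder image

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

for label, p in zip(labels, probs[0]):
    print(f"{label}: {p:.3f}")
```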
Neurons in different branches of InceptionV1 “specialize” in different kinds of concepts, like all curve-related neurons being grouped in mixed3b_5x5
- ⛓ Distill #2: Branch Specialization by Voss et al. (2021), a chapter in the Circuits thread, which includes previous work like Zoom in on Circuits, Early Vision in CNNs, Curve Detectors, Equivariance, High-Low Frequency Detectors, and Curve Circuits. In this article, the authors find that similar circuit-level functions tend to group themselves in network branches, which are “sequences of layers which temporarily don’t have access to ‘parallel’ information which is still passed to later layers.” For example, all 30 curve-related features in InceptionV1’s mixed3b_5x5 layer are concentrated in just one of the layer’s four branches. The authors hypothesize that this is because of a positive feedback loop during training, where the earlier layer in a branch is incentivized to form low-level features that the later layer uses as primitives for higher-level features. One cool thing about Distill is that it also invites non-AI researchers to provide commentary on its articles. In this case, Matthew Nolan and Ian Hawes, neuroscientists at the University of Edinburgh, see a “striking parallel” with the separation of cortical pathways in the human brain.
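For readers who haven’t looked inside InceptionV1: a “branch” is one of several parallel paths inside a mixed layer that only re-merge at a concatenation. The toy PyTorch module below sketches that structure under assumed channel sizes (they’re illustrative, not InceptionV1’s actual mixed3b configuration):

```python
import torch
import torch.nn as nn

# Sketch of an Inception-style "mixed" layer: four parallel branches that can't
# see each other's activations until they're concatenated at the end.
class InceptionBranchBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch_1x1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.branch_3x3 = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=1),            # bottleneck
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
        )
        self.branch_5x5 = nn.Sequential(                    # the kind of branch where curve detectors cluster
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # Each branch processes x independently; information only mixes again here.
        return torch.cat([
            self.branch_1x1(x),
            self.branch_3x3(x),
            self.branch_5x5(x),
            self.branch_pool(x),
        ], dim=1)

block = InceptionBranchBlock(in_ch=192)
print(block(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 192, 28, 28])
```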
Weight banding kind of resembles muscle tissue.
- 🌈 Distill #3: Weight Banding by Petrov et al. (2021), another chapter in the Circuits thread. This article explores why weights in some layers display a very distinct banding pattern when visualized using Nonnegative Matrix Factorization (NMF), with the following process: “For each neuron, we take the weights connecting it to the previous layer. We then use NMF to reduce the number of dimensions corresponding to channels in the previous layer down to 3 factors, which we can map to RGB channels.” This pattern occurs in the final convolutional layer across InceptionV1, ResNet50, and VGG19 (but not AlexNet, which does not use global average pooling). The authors hypothesize that this horizontal banding pattern “is a learned way to preserve [vertical] spatial information as it gets lost through various pooling operations,” which is supported by an experiment in which they rotate the input images by 90 degrees and the bands rotate by 90 degrees as well, becoming vertical. The article concludes that banding is an example of emergent structure in vision models, but that we can’t say much about whether this structure is “good” or “bad” or how its presence should influence architectural decisions; not the most significant conclusion, but a very interesting observation nonetheless.
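If you want to poke at this yourself, here’s a rough sketch of that visualization recipe using PyTorch, torchvision, and scikit-learn, applied to the last conv layer of a pretrained VGG19. How Distill handles negative weights before NMF is an assumption on my part (I take absolute values, since NMF needs nonnegative input), and the plotting step is left to you:

```python
import numpy as np
import torch
from sklearn.decomposition import NMF
from torchvision.models import vgg19

# Grab the last conv layer of a pretrained VGG19 (assumes a recent torchvision).
model = vgg19(weights="IMAGENET1K_V1")
last_conv = [m for m in model.features if isinstance(m, torch.nn.Conv2d)][-1]
w = last_conv.weight.detach().abs().numpy()         # (out_ch, in_ch, kH, kW); abs() is my assumption
out_ch, in_ch, kh, kw = w.shape

rgb_kernels = np.empty((out_ch, kh, kw, 3))
for i in range(out_ch):
    # For each neuron, reduce the previous-layer channel dimension to 3 NMF factors.
    flat = w[i].reshape(in_ch, kh * kw)              # channels x spatial positions
    nmf = NMF(n_components=3, init="nndsvda", max_iter=500)
    factors = nmf.fit_transform(flat.T)              # (kh*kw, 3)
    rgb_kernels[i] = factors.reshape(kh, kw, 3)

# Normalize each neuron's factors to [0, 1]; tiling these small RGB kernels into a
# grid (e.g. with matplotlib's imshow) is how you'd look for the banding pattern.
rgb_kernels /= rgb_kernels.max(axis=(1, 2, 3), keepdims=True) + 1e-8
print(rgb_kernels.shape)                             # (512, 3, 3, 3): (neurons, kH, kW, RGB)
```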
Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe to get a new issue in your inbox every second Sunday.
If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. 🤓