For a few of the categories, like the black & white and small circle detectors in `mixed3a` and the boundary and fur detectors in `mixed3b`, the article also investigates the “circuits” that formed them. Such circuits show how strongly the presence of an earlier-layer feature in the input positively or negatively influences (“excites” or “inhibits”) different regions of the current feature. One of the most interesting aspects of this research is that some of these circuits (which were learned by the network, not explicitly programmed!) are super intuitive once you think about them for a bit. The black & white detector above, for example, consists mostly of negative weights that inhibit color-selective input features: the more color features present in the input, the less likely the input is to be black & white.
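To make that inhibition idea concrete, here’s a toy sketch (the feature names, activations, and weights are entirely made up, not the actual InceptionV1 circuit) of how a unit wired mostly with negative weights to color-selective inputs ends up firing only when the input has little color:

```python
import numpy as np

# Hypothetical activations of three earlier-layer features for two inputs:
#   [luminance_texture, red_green_contrast, blue_yellow_contrast]
grayscale_patch = np.array([0.9, 0.05, 0.02])   # strong texture, almost no color
colorful_patch  = np.array([0.7, 0.80, 0.60])   # texture plus lots of color

# Hypothetical circuit weights into a "black & white" unit: a small positive
# weight on the luminance feature, strongly negative weights on the color ones.
weights = np.array([0.3, -1.2, -1.0])

def bw_unit(inputs: np.ndarray) -> float:
    """ReLU of the weighted sum -- the unit only fires when color is absent."""
    return max(0.0, float(weights @ inputs))

print(bw_unit(grayscale_patch))  # fires (~0.19): little color to inhibit it
print(bw_unit(colorful_patch))   # silent (0.0): the color features inhibit it
```

The real circuit of course involves full convolutional kernels over many earlier-layer channels rather than three scalars, but the sign pattern is the same basic idea.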
The simplicity of many of these circuits suggests, to me at least, that Olah et al. are currently exploring one of the most promising paths in AI explainability research. (Although there is an alternate possibility, as pointed out by the authors: that they’ve found a “taxonomy that might be helpful to humans but [that] is ultimately somewhat arbitrary.”)
Anyway, An Overview of Early Vision in InceptionV1 is one of the most fascinating machine learning papers I’ve read in a long time, and I spent a solid hour zooming in on different parts of the taxonomy.
The groups for layer `mixed3a` are probably my favorite. I’m also curious about how much these early-layer neuron groups generalize to other vision architectures and types of networks—to what extent, for example, do these same neuron categories show up in the first layers of binarized neural networks?
If you read the article and have more thoughts about it that I didn’t cover here, I’d love to hear them. :)