Dynamically Typed

Transformers are graph neural networks

Chaitanya K. Joshi wrote an essay for The Gradient in which he argues that Transformers are Graph Neural Networks, equating the former’s attention mechanism to the latter’s aggregation functions. It’s a great introduction to both model types, and Joshi posits that these two subfields of machine learning can learn a lot from each other. (Also, he represents nodes in a GNN using emojis instead of letters, and references them as such in the text, which I love.) Great weekend read.
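The mapping Joshi draws is easy to sketch in code: single-head self-attention is attention-weighted aggregation over a fully connected graph of tokens, where the attention weights act as soft edge weights. The function and variable names below are illustrative, not taken from the essay:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_as_aggregation(H, Wq, Wk, Wv):
    """Single-head self-attention read as GNN message passing:
    every token (node) aggregates messages from every other token,
    i.e. aggregation on a fully connected graph."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    d = Q.shape[-1]
    # Attention weights play the role of learned, soft edge weights;
    # each row is a probability distribution over "neighbors".
    A = softmax(Q @ K.T / np.sqrt(d))  # shape (n, n), rows sum to 1
    # Each node's update is a weighted sum of its neighbors' messages.
    return A @ V

rng = np.random.default_rng(0)
n, d = 4, 8  # 4 "nodes" (tokens), feature dimension 8
H = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = attention_as_aggregation(H, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated feature vector per node
```

A GNN on a sparse graph would simply mask `A` to each node’s actual neighbors before the weighted sum; the Transformer keeps the graph fully connected and lets attention learn which edges matter.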