Communicating a complex topic like AI research is not easy, and we’re grateful that this series is hosted by the brilliant Dr Hannah Fry, a mathematician and broadcaster with a talent for making technical topics accessible and interesting. We’ve worked together over the last 12 months to choose topics that we hope will convey the excitement of AI research, whilst also highlighting some of the questions and challenges the whole field is wrestling with today. The result is an eight-part series that explores topics such as the link between neuroscience and AI, why we use games in our research, building safe AI and how AI can be used to solve scientific problems.
Distill experimented with a new format for research discussions. Distill is a web-first machine learning journal that focuses on visualizations and explanations of ML models, like the Activation Atlas for image classification networks (see DT #9). All articles published in the journal are responsive, interactive, open-source web pages (they even accept pull requests with updates after publication!), which is a welcome change from the usual process, in which papers are written, discussed once at a conference, and then left to live statically in a repository.
The paper was received with intense interest and discussion on social media, mailing lists, and reading groups around the world. How should we interpret these experiments? Would they replicate? … To explore these questions, Distill decided to run an experimental “discussion article.” We invited a number of researchers to write comments on the paper and organized discussion and responses from the original authors.
In this discussion, other researchers provided critiques and ran additional experiments, to which the original authors also responded. The topic itself—transferring models trained on adversarial examples to real datasets—is a bit out of my area of expertise, but I find it exciting that Distill is continuing to innovate on the process of machine learning science. More:
Rebecca Vickery wrote about Python libraries for interpretable machine learning. Her post is a brief guide to several visualization techniques that researchers and practitioners can use to inspect their models, for example to check whether they are picking up problematic biases. Vickery covers yellowbrick, ELI5, LIME, and MLxtend, each with installation instructions and code examples. Read the post on Medium: Python Libraries for Interpretable Machine Learning.
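Many of the techniques these libraries implement boil down to the same idea: perturb a model's inputs and watch how its outputs change. As a library-free illustration of that idea (a generic sketch in plain Python, not code from any of the libraries above), here is permutation feature importance, which scores each feature by how much accuracy drops when that feature's column is shuffled:

```python
import random

def accuracy(model, X, y):
    """Fraction of rows the model classifies correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(X)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Score each feature by the average drop in accuracy when
    that feature's column is shuffled, over n_repeats shuffles.
    A near-zero score means the model barely uses the feature."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy model that only looks at feature 0, so feature 1 should score 0.
model = lambda row: int(row[0] > 0.5)
X = [[0.1, 0.9], [0.9, 0.2], [0.3, 0.3], [0.8, 0.7]]
y = [0, 1, 0, 1]
print(permutation_importance(model, X, y))
```

The libraries in the post wrap far more sophisticated versions of this perturb-and-measure loop (and, in LIME's case, fit a local surrogate model to the perturbations), but the core inspection idea is the same.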
The Allen Institute for Artificial Intelligence published a best practices guide for crowdsourcing data labels. The post covers ethical pricing rates for different worker regions and types of labeling tasks, as well as notes on privacy, transparency, and tooling design. Read the post on Medium: Crowdsourcing: Pricing Ethics and Best Practices.
- Andrej Karpathy’s arXiv Sanity is an interface for browsing, searching, and filtering recent arXiv submissions. Link: karpathy/arxiv-sanity-preserver
- TabNine adds deep-learning-powered autocompletion to code editors like VSCode (see DT #18). Link: TabNine
- Deeplearning.ai has a set of detailed, interactive AI Notes on things like initializing neural networks and parameter optimization in NNs. Link: AI Notes
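One of the topics those AI Notes cover, weight initialization, fits in a few lines of code. As a generic illustration (a sketch of the standard Glorot/Xavier uniform scheme, not code taken from the AI Notes themselves): weights are drawn from a uniform distribution whose range shrinks as the layer gets wider, which keeps activation variance roughly constant from layer to layer.

```python
import math
import random

def glorot_uniform(fan_in, fan_out, seed=0):
    """Sample a (fan_in x fan_out) weight matrix from
    U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
    the Glorot/Xavier uniform initialization scheme."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

# A 256-in, 128-out layer: every weight lies within +/- sqrt(6/384) ~ 0.125.
W = glorot_uniform(256, 128)
```

Initializing with values that are too large or too small instead causes activations (and gradients) to explode or vanish as depth grows, which is exactly the failure mode the interactive notes let you explore.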