#21: DeepMind's ML for drug discovery, security leaks in neural networks, and Green AI
Hey everyone, welcome to Dynamically Typed #21! The past two summer weeks have been light on productized AI, so today’s newsletter is a bit more on the technical side than usual.
A few articles came out about DeepMind’s push into using machine learning for medical drug discovery, which is also becoming a hot area for startups. On the research side, a new neural network optimizer called RAdam has been making the rounds, and researchers at UC Berkeley showed that it’s possible to recover sensitive training data, like credit card numbers, from trained natural language processing models. For climate-related AI, I’m covering some meta news about Green AI and carbon offsets that are making machine learning research itself more sustainable.
Productized Artificial Intelligence 🔌
A folded-up protein. (DeepMind)
DeepMind “stunned” scientists with their neural network-based protein folding results. In computational biology, protein folding is the task of predicting a protein’s three-dimensional structure from its amino acid sequence. Knowing how a protein folds is important for medical drug development, but for larger proteins this structure is difficult to predict. Robert Langreth for Bloomberg:
Artificial intelligence is a chic catchphrase in health care, often trotted out as a cure-all for whatever ails the industry. It has been held up as a potential solution to fix cumbersome electronic medical records, speed up diagnosis and make surgery more precise. DeepMind’s victory points to a possible practical application for the technology in one of the most expensive and failure-prone parts of the pharmaceutical business.
Machine learning-based drug discovery is becoming a big deal in Silicon Valley, with medical AI startups like Recursion Pharmaceuticals, Insitro, BenevolentAI and others raising over $1 billion in 2018 alone. Big tech labs, like Google’s and Facebook’s, have also started publishing papers on protein folding recently. All this work is going toward simulating the effects of drugs without having to test them clinically:
An aerospace company “won’t build and fly a plane without building it on the computer first and simulating it under many conditions,” said Colin Hill of GNS Healthcare, a startup using AI to model disease, whose investors include Amgen Inc. In the future, drugmakers won’t begin clinical trials without a virtual dry run, Hill said.
Although this practical AI application is still mostly in the research phase, these recent articles are a good indication that we’ll probably see it productized (or, in this case, used to develop medicine) in the coming years:
- Robert Langreth for Bloomberg: AI Drug Hunters Could Give Big Pharma a Run for Its Money
- DeepMind blog post by the lab’s team dedicated to protein folding: AlphaFold: Using AI for scientific discovery
- Greg Williams wrote a deep dive for Wired: Inside DeepMind’s epic mission to solve science’s trickiest problem
Machine Learning Research 🎛
“Performance of RAdam, Adam and SGD with different learning rates on CIFAR10,” showing RAdam’s robustness to learning rate selection. (Liu et al.)
Liu et al. introduced RAdam, a new rectified variant of Adam. Adam is a gradient-based optimizer for neural networks that uses estimates of the gradient’s lower-order moments to beat standard stochastic gradient descent (SGD) in terms of accuracy and epochs to convergence. It was originally developed by Kingma and Ba (2014), and it has since become a go-to optimizer for ML researchers and practitioners alike (and for students too: I had to implement Adam for a practical ML course).
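For reference, the core of Adam fits in a few lines. Here’s a minimal NumPy sketch of one update step (my own paraphrase of Kingma and Ba’s algorithm, not any particular library’s implementation):

```python
import numpy as np

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient's
    first moment (m) and second moment (v), with bias correction."""
    m = beta1 * m + (1 - beta1) * grads        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grads ** 2   # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction; t starts at 1
    v_hat = v / (1 - beta2 ** t)
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v
```

The division by the square root of v_hat is the adaptive part: parameters with noisy gradients get smaller effective learning rates.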
A new paper by Liu et al. (2019) has been making the rounds in the machine learning community because it investigates and extends Adam. The authors’ key insight is that Adam’s adaptive learning rate has problematically large variance in the early stages of training. As a solution, they propose RAdam, which adds a term to rectify this variance, and they experimentally show that it is more robust to learning rate choices across different types of neural network architectures. It’s getting a lot of buzz on Twitter, and it has the potential to become the go-to optimizer: having to worry less about fine-tuning the learning rate hyperparameter is a big win when training models. (I’ve sketched the rectification step in code after the links below.) More on RAdam:
- Paper by Liu et al. on arXiv: On the Variance of the Adaptive Learning Rate and Beyond
- Implementation on GitHub: LiyuanLucasLiu/RAdam
- Less Wright’s writeup of RAdam on Medium: New State of the Art AI Optimizer: Rectified Adam (RAdam). Improve your AI accuracy instantly versus Adam, and why it works.
- Fast.ai forums discussion, including comparisons with other state-of-the-art optimizers like Lookahead, Novograd, and Ranger (RAdam + Lookahead): Meet RAdam
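As promised, here’s the rectification step in code, adapted from Algorithm 2 in the paper (a sketch to show the logic, not the reference implementation linked above):

```python
import numpy as np

def radam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam update, following Liu et al. (2019), Algorithm 2."""
    m = beta1 * m + (1 - beta1) * grads        # same moment estimates as Adam
    v = beta2 * v + (1 - beta2) * grads ** 2
    m_hat = m / (1 - beta1 ** t)

    rho_inf = 2 / (1 - beta2) - 1              # max length of the approximated SMA
    rho_t = rho_inf - 2 * t * beta2 ** t / (1 - beta2 ** t)

    if rho_t > 4:   # variance is tractable: take a rectified adaptive step
        v_hat = np.sqrt(v / (1 - beta2 ** t))
        r = np.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                    / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        params = params - lr * r * m_hat / (v_hat + eps)
    else:           # too early in training: fall back to SGD with momentum
        params = params - lr * m_hat
    return params, m, v
```

Early on (small t), rho_t dips below 4 and the adaptive term is skipped entirely; as training progresses, r approaches 1 and the update converges to plain Adam.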
Carlini et al. showed that neural networks can extensively memorize their training data. This is a concern from a security perspective: a generative machine learning model trained on sensitive user data might, prompted with the right input, accidentally spit out users’ secrets. (Relevant XKCD.) Indeed:
[Given] access to a language model trained on the Penn Treebank with one credit card number inserted, it is possible to completely extract this credit card number from the model.
Carlini’s wonderfully-written post goes into the details of how the authors quantify memorization, when memorization happens during training, and how hyperparameters affect its severity. Finally, he explains how they were able to extract secrets like the one above using a novel combination of beam search and Dijkstra’s algorithm, and how training with differentially-private SGD prevents memorization. Read the full post here: Evaluating and Testing Unintended Memorization in Neural Networks.
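To make the quantification part concrete: the post measures memorization with an “exposure” metric, which ranks an inserted canary (e.g. “my credit card number is ____”) against every other possible fill-in by how likely the model finds it. Here’s a rough sketch of the idea, where model_log_likelihood is a stand-in for scoring a sequence with your trained model:

```python
import math

def exposure(canary, candidates, model_log_likelihood):
    """Exposure of a canary as defined by Carlini et al.: log2 of the
    candidate space size minus log2 of the canary's likelihood rank.
    An exposure near log2(len(candidates)) means the canary is extractable."""
    ranked = sorted(candidates, key=model_log_likelihood, reverse=True)
    rank = ranked.index(canary) + 1   # 1 = the model's single most likely candidate
    return math.log2(len(candidates)) - math.log2(rank)
```

If the model ranks the one canary it saw during training above almost all of the random alternatives it never saw, that’s memorization rather than generalization.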
Quick ML resource links ⚡️ (see all 39):
- OpenSpiel is DeepMind’s framework for reinforcement learning in games, with 25+ games and 20+ algorithms built in, along with visualization and evaluation tools. GitHub link: deepmind/open_spiel
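Getting a game running takes just a few lines. This random-rollout snippet is adapted from the examples in the repo (assuming the Python bindings are built and importable as pyspiel; check the README for exact setup):

```python
import random
import pyspiel

# Load a built-in game and play one episode with uniformly random moves.
game = pyspiel.load_game("tic_tac_toe")
state = game.new_initial_state()
while not state.is_terminal():
    action = random.choice(state.legal_actions())
    state.apply_action(action)
print(state.returns())  # final return for each player, e.g. [1.0, -1.0]
```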
Artificial Intelligence for the Climate Crisis 🌍
Today’s news in this section is a bit meta: instead of projects that are using AI to battle the climate crisis directly, I’m covering projects that make AI research itself more climate-friendly. (Via the CCAI newsletter.)
Green AI is a movement, originating from the Allen Institute for AI (AI2), to make AI both greener and more inclusive. Schwartz et al. (2019) show that the amount of computational power required to do machine learning research has been doubling every few months, with the cost of training cutting-edge ML models rising to hundreds of thousands of dollars: an environmentally-unfriendly and uninclusive trend that the authors refer to as Red AI.
Green AI is meant to oppose this trend by encouraging researchers to build their models with efficiency in mind, focusing on carbon emissions, electricity usage, training time, parameter counts, and floating point operation counts. The authors advocate for major conferences to require that researchers report these efficiency measures in their papers. I think that’s a great idea, and I hope to see conferences implement it. (To show how lightweight some of these measures are, I’ve added a small parameter-count snippet after the links below.) More:
- Position paper by Schwartz et al. (2019): Green AI
- A recent example of Green AI by Hugging Face that’s making the rounds on Twitter: achieving 95% of BERT’s GLUE performance, with a fraction of the parameters: Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT
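As promised above, reporting at least one of these measures is nearly free: a parameter count is a one-liner in PyTorch. (FLOPs and energy use need a profiler, so this sketch covers model size only.)

```python
import torch.nn as nn

def report_parameters(model: nn.Module) -> None:
    """Print total and trainable parameter counts, one of the efficiency
    measures Schwartz et al. suggest reporting alongside results."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"{total:,} parameters ({trainable:,} trainable)")

report_parameters(nn.Linear(768, 768))  # 590,592 parameters (590,592 trainable)
```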
ICLR and other ML conferences are going to buy carbon offsets for their participants’ flights. Yoshua Bengio posted about it on Facebook:
We in the ICLR executive (which manages the ICLR conference) just voted to buy offsets for the estimated air travel of participants to the conference. I do not think that the increased registration fees which will eventually result from this will deter much travel, but at least we can invest in planting trees and other means to neutralize the impact somewhat.
His last point has been a source of debate in the climate activism community for a long time now: if we merely offset the emissions of our flights by paying 10-20% more for our tickets, this may do more to soothe our flygskam (“flight shame”) than to actually help the planet. However, I think doing this at an organizational level is a good step toward raising the awareness needed for international governmental change (kerosene and emissions taxes). More:
- SIGPLAN blog: ACM Conferences and the Cost of Carbon. All ACM conferences are now required to publicly report their carbon footprint, and SIGPLAN is exploring carbon offsets.
- I still fly quite a lot more than I’m comfortable with (I’m writing today’s DT at the airport), so this topic is something I plan to think and write more about in the coming months.
Cool Things ✨
Illustration of how the nanophotonic neural medium (NNM) recognizes handwritten digits. (Khoram et al.)
Researchers at the University of Wisconsin, MIT, and Columbia developed a glass-based neural network. Simply by shining an image of a handwritten digit from the MNIST dataset through their “nanophotonic” pane of glass, they can detect which digit it is with 79% accuracy by looking at how the light propagates through the glass. See their paper here.
Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get a new issue in your inbox every second Sunday.
If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. 🚲