Dynamically Typed

#55: DeepMind's structural biology breakthrough with AlphaFold 2

Hey everyone, welcome to Dynamically Typed #55. In today’s issue I’m mostly focusing on DeepMind’s new AlphaFold 2 (AF2) model, which I waited to dive into until last month’s hype settled down a bit; that story is right below this, in the ML research section. It took pretty long to understand and boil down everything around AF2, so I’ve only got a few other quick links in the other sections.

This is also the last DT of 2020 — happy holidays everyone! 🎄

Machine Learning Research 🎛

AlphaFold’s predictions vs. the experimentally-determined shapes of two CASP14 proteins.

AlphaFold’s predictions vs. the experimentally-determined shapes of two CASP14 proteins.

DeepMind’s AlphaFold 2 is a major protein folding breakthrough. Protein folding is a problem in structural biology where, given the one-dimensional RNA sequence of a protein, a computational model has to predict what three-dimensional structure the protein “folds” itself into. This structure is much more difficult to determine experimentally than the RNA sequence, but it’s essential for understanding how the protein interacts with other machinery inside cells. In turn, this can give insights into the inner workings of diseases — “including cancer, dementia and even infectious diseases such as COVID-19” — and how to fight them.

Biennially since 1994, the Critical Assessment of Techniques for Protein Structure Prediction (CASP) has determined the state of the art in computational models for protein folding using a blind test. Research groups are presented (only) with the RNA sequences of about 100 proteins whose shapes have recently been experimentally determined. They blindly predict these shapes using their computational models and submit them to CASP to be evaluated with a Global Distance Test (GDT) score, which roughly corresponds to how far each bit of the protein is from where it’s supposed to be. GDT scores range from 0 to 100, and a model that scores at least 90 across different proteins would be considered good enough to be useful to science (“competitive with results obtained from experimental methods”).

Before CASP13 in 2018, no model had ever scored significantly above 40 GDT. That year, the first version of AlphaFold came in at nearly 60 GDT — already “stunning” at the time (see DT #21). At CASP14 this year, AlphaFold 2 blew its previous results out of the water and achieved a median score of 92.4 GDT across all targets. This was high enough for CASP to declare the problem as “solved” in their press release and to start talking about new challenges for determining the shape of multi-protein complexes.

I’ve waited a bit to write about AlphaFold 2 until the hype died down because, oh boy, was there a lot of hype. DeepMind released a slick video about the team’s process, their results were covered with glowing features in Nature and The New York Times, and high praise came even from the leaders of DeepMind’s biggest competitors, including OpenAI’s Ilya Sutskever and Stanford HAI’s Fei-Fei Li. It was a pretty exciting few days on ML twitter.

Columbia University’s Mohammed AlQuraishi, who has been working on protein folding for over a decade, was one of the first people to break the CASP14 news. His blog post about CASP13 and AlphaFold 1 was also widely circulated back in 2018, so a lot of people in the field were interested in what he’d have to say this year. Last week, after the hype died down a bit, AlQuraishi published his perspective on AlphaFold 2. He summarized it by saying “it feels like one’s child has left home:” AF2 got results he did not expect to see until the end of this decade, even when takin into account AF1 — bittersweet for someone whose lab has also been working on this same problem for a long time.

AlQuraishi is overall extremely positive about DeepMind’s results here, but he does express disappointment at their “falling short of the standards of academic communication” — the lab has so far been much more secretive about AF2 than it was about AF1 (which is open-source). AlQuraishi’s post is very long and technical, but if you want to know exactly how impressive AlphaFold 2 is, learn the basics of how it works, read about its potential applications in broader biology, or see some of the hot takes against it debunked, the post is definitely worth the ~75 minutes of your time. (I always find it energizing to see someone excitedly explain a big advancement in their field that they did not directly work on; here’s the link again.)

I personally also can’t wait to see the first practical applications of AlphaFold, which I expect we’ll start to see DeepMind talk about in the coming years. (Hopefully!) For one, they’ve already released AlphaFold’s predictions for some proteins associated with COVID-19.

Quick ML research + resource links 🎛

I’ve also collected all 75+ ML research tools previously featured in Dynamically Typed on a Notion page for quick reference. ⚡️

Artificial Intelligence for the Climate Crisis 🌍

Cool Things ✨

Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get a new issues in your inbox every second Sunday.

If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. 🎅