#18: Runway ML's "app store" for AI, Google's new YouTube dataset, and a trippy GAN journey
Welcome to the 18th issue of Dynamically Typed—an exciting one, because this edition marks the first time that I’m sending it out to more than 100 people! 🎉
There were lots of cool releases in the past two weeks, so today’s DT is also a bit longer than usual. On the productized AI side, there was Descript’s new Speaker Detective feature, TabNine’s deep learning code autocomplete, and Runway ML’s “app store” for pretrained AI models. On the research side, GitHub has a neural network model for classifying programming languages, Google released a new YouTube dataset, and Vi Hart wrote an insightful essay about the value of our data. Beyond that, I found new machine learning resources, climate-related ML initiatives, and two cool AI art projects. Let’s dive in.
Productized Artificial Intelligence 🔌
Editing an audio file transcript with Descript’s new speaker labels feature. (Descript)
Audio transcription tool Descript launched a new feature called Speaker Detective. Descript is a great example of a specialized AI product: it takes an audio file (like a podcast or conference talk recording) as input and transcribes it using machine learning. Then, it lets you edit the transcript and audio in synchrony, automatically moving audio clips around as you cut, paste, and shuffle around text. Speaker Detective asks you to identify who’s talking in a few clips of your recording, and then uses AI to automatically apply speaker labels to the rest of the audio. The interface to edit and correct these labels also looks super slick, as you can see in the GIFs in Andy Anderegg’s post on Descript’s Medium blog: New in Descript: AI-Powered Speaker Labels.
TabNine is an autocompleter that helps you write code faster. Trained on 2 million files from GitHub, it tries to predict the next code token given all preceding tokens. Since it’s not just looking at the source code like existing autocomplete systems do, it can pick up subtle hints from natural-language comments and documentation in the code, helping it make better predictions of what you’ll type next. Using this approach, it populates autocomplete pop-ups for languages like Python, Java, C++, and Haskell. TabNine offers a cloud code editor for individual developers and a self-hosted model for enterprises that must ensure the security of their code. More:
- TabNine’s blog post: Autocompletion with deep learning
- Demo tweets by Andrej Karpathy and Gerald de Melo (Thanks for the link, Steinar!)
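To make "predict the next code token given the preceding tokens" a bit more concrete, here's a toy sketch in Python. It is a simple bigram counter that only looks at the single previous token, not TabNine's actual deep learning model, and the tiny corpus and all function names are made up for illustration. Comments are tokenized together with the code, which hints at how natural-language text can influence suggestions.

```python
from collections import Counter, defaultdict
import re


def tokenize(source):
    # Split source text into identifiers, numbers, and single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def train_bigram(files):
    # Count which token follows each token across all training files.
    following = defaultdict(Counter)
    for source in files:
        tokens = tokenize(source)
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev][nxt] += 1
    return following


def suggest(model, prev_token, k=3):
    # Return up to k of the most frequent tokens seen after prev_token.
    return [tok for tok, _ in model[prev_token].most_common(k)]


# Hypothetical training corpus; comments are tokenized just like code.
corpus = [
    "# open the file\nf = open(path)\n",
    "with open(path) as f:\n    data = f.read()\n",
    "for line in open(path):\n    print(line)\n",
]
model = train_bigram(corpus)
print(suggest(model, "open"))
```

A real system would condition on far more context than one token, but the interface is the same: context in, ranked suggestions out.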
Runway ML is an “app store” of easy-to-use machine learning models for creators. It allows you to browse through a directory of pre-trained models, experiment with giving your own inputs to the models, integrate the outputs into popular creative tools, and (soon) even retrain models on your own data, all in a user interface that requires zero coding. Some models available on Runway ML include:
- Motion capture: Posenet (skeletal tracking), Face Landmarks (identifying facial key points), Mask RCNN (cutting out image backgrounds), and DensePose (generative 3D surface models from 2D photos of people)
- Image synthesis: StyleGAN + BigGAN (generating faces, landscapes, etc.), Fast Style Transfer (stylizing images after famous painters), and AttnGAN (synthesizing images from text descriptions)
- Object recognition: DenseCap + Im2Txt (generating image descriptions), COCO (finding bounding boxes), and Face Recognition
- Other: GPT-2 (text generation), DeOldify (image colorization and restoration), and more.
I’ve been following Runway for a while, and they seem to be adding new models every few weeks to what is by now already a very full-featured suite. More:
- Runway ML’s website
- James Vincent for The Verge: Runway ML puts AI tools in the hands of creators everywhere
Machine Learning Research 🎛
Kavita Ganesan and Romano Foti wrote about OctoLingua, a machine learning model GitHub uses to classify programming languages. Figuring out which language a piece of code is written in may seem like a trivial problem, but GitHub deals with lots of files with missing, incorrect, and overlapping file extensions. In the first of these scenarios, missing extensions, their previous heuristics-based approach, Linguist, has an F1-score of 0.05. This is not surprising, since Linguist mostly looks at file extensions to make its predictions. The new approach, OctoLingua, uses some interesting hand-engineered features (such as the top five special characters and top 20 code tokens per file), which it feeds into a two-layer fully-connected neural network with dropout. Using this approach, and without having to jump straight to more complex sequence-to-sequence models like LSTMs, OctoLingua achieves an F1-score of up to 0.95 in the missing file extension setting. More details on the GitHub Blog: C# or Java? TypeScript or JavaScript? Machine learning based classification of programming languages.
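For a feel of what hand-engineered features like "top five special characters" and "top 20 code tokens per file" could look like, here's a rough Python sketch. The exact definitions, the tokenization, and the way GitHub encodes these features for the network are not given here, so everything below (including the function names) is an assumption for illustration, and the neural network itself is left out.

```python
from collections import Counter
import re


def top_special_characters(source, n=5):
    # Most frequent characters that are not alphanumeric, underscore, or whitespace.
    specials = [
        ch for ch in source
        if not (ch.isalnum() or ch == "_" or ch.isspace())
    ]
    return [ch for ch, _ in Counter(specials).most_common(n)]


def top_tokens(source, n=20):
    # Most frequent word-like tokens (identifiers and keywords).
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    return [tok for tok, _ in Counter(tokens).most_common(n)]


def extract_features(source):
    # Bundle both feature groups for a single file.
    return {
        "special_chars": top_special_characters(source),
        "tokens": top_tokens(source),
    }


snippet = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))\n"
features = extract_features(snippet)
print(features)
```

Features like these would still need to be turned into a fixed-length numeric vector (for example, against a shared vocabulary) before a fully-connected network could consume them.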
Google released YouTube-8M Segments, a new video classification dataset. Unlike the original YouTube-8M dataset, which had machine-generated labels for eight million YouTube videos, Segments has human-labeled annotations of five-second clips within these videos. Google is also hosting a Kaggle challenge for the new dataset and organizing a related workshop at ICCV'19. More on the Google AI blog: Announcing the YouTube-8M Segments Dataset.
Vi Hart wrote an essay about AI, Universal Basic Income, and the value of data. She frames machine learning models as “collages” that “glue together” the knowledge of the thousands (or millions) of people whose data was used to train them:
That cancer-screening AI might get it right more often than any individual doctor, but that’s not because it’s “smarter” but because it uses the knowledge of thousands of doctors all collaged together. – M Eifler
Hart argues that we should value data creators much more than we do today. Many traditional jobs are being replaced by machine learning systems that can figure out 99% of cases, and then fall back on 10-cents-a-task Mechanical Turk gig workers for the 1% they can’t handle. These people (like the annotators for YouTube-8M Segments) are doing critically important work—without them, ML systems can’t “learn” anything—but they are woefully underpaid and underprotected by current labor laws that were designed for non-gig work. Hart:
33% of US adults have a bachelor’s degree. A 2016 Pew Research Center study that focused on US workers on [Amazon’s Mechanical Turk platform (MTurk)] found that 51% of US workers on MTurk have a bachelor’s degree. The same Pew Research Center study also found that the majority of US workers on MTurk were making under $5 an hour. Federal minimum wage in the US is $7.25.
That’s people who went to college and got a degree, working online for less than minimum wage, with zero benefits.
This “ghost work” problem is just one small part of Hart’s essay; she also dives deeper into the influences of the history of science and cultural evolution, the “AI purity culture” prevalent in Silicon Valley, and data markets. It’s a long post, but it’s incredibly insightful. I highly recommend taking an hour out of your Sunday to sit down and give it a read. If you do, please let me know your thoughts! Read Vi Hart’s essay at The Art of Research here: Changing my Mind about AI, Universal Basic Income, and the Value of Data
Quick ML resource links ⚡️ (see all 31)
- IBM’s differential privacy library, which includes a Jupyter notebook to explore the effect of differential privacy on machine learning accuracy using basic classification and clustering models. Link: IBM/differential-privacy-library
- PyTorch Transformers is a library of state-of-the-art pre-trained models for natural language processing, including BERT, GPT, GPT-2, Transformer-XL, XLNet, and XLM. Link: huggingface/pytorch-transformers
- I made a repository of BibTeX citations for common Python packages, to make it easier to give credit to the software we use for machine learning research in papers. Link: leonoverweel/bibtex-python-package-citations
- fast.ai has several high-quality free online machine learning courses, from practical deep learning for coders, to cutting edge deep learning from the foundations, to NLP. Link: fast.ai
Artificial Intelligence for the Climate Crisis 🌍
The Gringgo Indonesia Foundation is using computer vision to improve recycling rates. They launched several apps, for both waste workers and the public, that use image recognition to “analyze and classify waste items, and quantify their value.” The foundation hopes to motivate waste workers to collect more recyclable waste by educating them on what can and can’t be recycled through the app. In their first pilot village, Sanur Kaja in Bali, they saw recycling rates increase by 35%. Mollie Javerbaum for Google’s The Keyword blog: To reduce plastic waste in Indonesia, one startup turns to AI.
Solar Forecast Arbiter is an open source evaluation framework for comparing solar forecast models. Solar forecast models are important for predicting the real-time electricity yield of solar panels, and by extension for making solar power more efficient and affordable. The University of Arizona’s project consists of a set of metrics, data models and policies, benchmark forecasts, and reference data, all useful for researchers working on solar forecasting. Check out the tool’s website: Solar Forecast Arbiter.
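As a rough illustration of what comparing a forecast against a benchmark can involve, the Python sketch below scores two forecasts with mean absolute error and root mean squared error. The numbers are invented, and both the generic metrics and the simple "persistence" baseline are assumptions for illustration, not necessarily what Solar Forecast Arbiter itself computes.

```python
import math


def mae(observed, forecast):
    # Mean absolute error between observations and a forecast.
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)


def rmse(observed, forecast):
    # Root mean squared error; penalizes large misses more than MAE does.
    squared = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    return math.sqrt(squared / len(observed))


# Invented hourly power output in kW (not real data).
observed = [0.0, 1.2, 3.4, 4.1, 3.0, 0.8]
model_forecast = [0.1, 1.0, 3.0, 4.5, 2.8, 1.0]

# Persistence baseline: predict that each hour equals the previous observation.
persistence = [observed[0]] + observed[:-1]

for name, forecast in [("model", model_forecast), ("persistence", persistence)]:
    print(name, round(mae(observed, forecast), 3), round(rmse(observed, forecast), 3))
```

Scoring every model against the same reference data and the same baseline is what makes results from different research groups comparable.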
Cool Things ✨
5,000 book covers, arranged by visual similarity using t-distributed stochastic neighbor embedding (t-SNE). (Jess Peter, The Pudding)
Jess Peter arranged the covers of 5,000 top-selling books by visual similarity, resulting in the mesmerizing interactive map above. The website also has filters to show/hide different groups of books, which you can use to find correlations. One of the clearest ones I found is that almost all the books with white covers (on the right side of the screenshot above) are non-fiction, while the darker and more colorful ones (on the left) are mostly fiction. It’s cool that patterns like this emerge even though the algorithm was looking at the covers purely visually. Try it out at The Pudding: 11 Years of Top-Selling Book Covers, Arranged by Visual Similarity.
StyleGAN antipodes is a trippy journey through “opposite” images in the latent space of a generative adversarial network. Gene Kogan describes his video as “a random traversal through the latent space of a StyleGAN trained on 100,000 paintings from WikiArt, where each frame contains two images whose latent codes are antipodes (right Z = -1 * left Z).” Take a look on Vimeo: StyleGAN antipodes.
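The "right Z = -1 * left Z" relationship is easy to sketch in Python. The snippet below only produces pairs of latent codes along a random path. It does not include a StyleGAN generator, and the 512-dimensional Gaussian latent, the linear interpolation between keyframes, and all function names are assumptions for illustration rather than Kogan's actual method.

```python
import random


def random_latent(dim, rng):
    # Sample a latent code from a standard normal distribution.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def antipode(z):
    # The "opposite" latent code: right Z = -1 * left Z.
    return [-x for x in z]


def interpolate(z_a, z_b, t):
    # Linear interpolation between two latent codes, with t in [0, 1].
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def traversal(dim=512, keyframes=3, steps=4, seed=0):
    # Yield (left, right) latent pairs along a path through random keyframes.
    rng = random.Random(seed)
    points = [random_latent(dim, rng) for _ in range(keyframes)]
    for z_a, z_b in zip(points, points[1:]):
        for i in range(steps):
            left = interpolate(z_a, z_b, i / steps)
            yield left, antipode(left)


frames = list(traversal())
left, right = frames[0]
print(len(frames), len(left), left[0], right[0])
```

Feeding each pair through a trained generator would give the two images shown side by side in a single video frame.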
Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get new issues in your inbox every second Sunday.
If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. 🚀