The deepfake detection rat race
Microsoft is launching Video Authenticator, an app that helps organizations “involved in the democratic process” detect deepfakes: videos that make people look like they’re saying things they’ve never said by superimposing automatically generated voice tracks and face movements over real videos. Deepfakes are usually made using generative adversarial networks (GANs) like those in Samsung AI’s neural avatars project (see DT #15) and in the popular open-source DeepFaceLab app.
Because of all the obvious ways in which deepfakes can be abused, this has been a popular research area for technology platform companies: a bit over a year ago, Facebook launched their deepfake detection challenge and Google contributed to TU Munich’s FaceForensics benchmark (#23). Microsoft has now productized these research efforts with Video Authenticator. The app checks photos and videos for the “subtle fading or greyscale elements” that may occur at a deepfake’s blending boundary — where the fake facial movements mix in with the real background media — and gives users a confidence score for whether a face is manipulated. This happens in real-time and frame-by-frame for videos, which I imagine will be particularly useful for detecting subtle fakery, like a mostly-real video with a few small tweaks that change its message.
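To illustrate why frame-by-frame scores help with a mostly-real video, here is a minimal Python sketch that groups consecutive high-scoring frames into suspicious segments. It is not Microsoft's method: the function name `flag_segments`, the 0.8 threshold, and the example scores are all assumptions made up for illustration.

```python
from typing import List, Tuple


def flag_segments(scores: List[float], threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Group consecutive frames whose score meets the threshold into (start, end) ranges.

    `scores` holds one hypothetical manipulation-confidence value per frame;
    the threshold is an arbitrary assumption, not a documented default.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new suspicious run begins here
        elif start is not None:
            segments.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))  # run lasts until the final frame
    return segments


# Made-up scores: a mostly-real clip with two short manipulated stretches.
scores = [0.05, 0.1, 0.92, 0.95, 0.1, 0.07, 0.85, 0.06]
print(flag_segments(scores))  # [(2, 3), (6, 6)]
```

A single video-level average of these scores would be about 0.39 and could easily pass as real, while the per-frame view points a reviewer at the exact frames worth checking.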
Video Authenticator initially won’t be made publicly available. Instead, Microsoft is privately distributing it to news outlets, political campaigns, and media companies through the AI Foundation’s Reality Defender 2020 program, “which will guide organizations through the limitations and ethical considerations inherent in any deepfake detection technology.” This makes sense, since deepfakes represent a typical cat-and-mouse AI security game — new models will surely be trained specifically to fool Video Authenticator, which this limited release approach attempts to slow down.
I’d be interested to learn how organizations integrate Video Authenticator into their existing workflows for verifying newsworthy videos. I haven’t really come across any examples of big-name news organizations getting fooled by deepfakes yet, but I imagine it’s much more common on social media, where videos aren’t vetted by journalists before being shared.
Snapchat's platform for creative ML models
SnapML is a software stack for building Lenses that use machine learning models to interact with the Snapchat camera. You can build and train a model in any ONNX-compatible framework (like TensorFlow or PyTorch) and drop it straight into Snapchat’s Lens Studio as a SnapML component. SnapML can then apply some basic preprocessing to the camera feed, run it through the model, and format the outputs in a way that other Lens Studio components can understand. A segmentation model outputs a video mask, an object detection model outputs bounding boxes, a style transfer model outputs a new image, etc. You even have control over how the model runs: once every frame, in the background, or triggered by a user action. (More details in the docs.)
Matthew Moelleman has written some great in-depth coverage on SnapML for the Fritz AI Heartbeat blog, including a technical overview and a walkthrough of making a pizza segmentation Lens. As he notes, SnapML has the potential to be super interesting as a platform:
Perhaps most importantly, because these models can be used directly in Snapchat as Lenses, they can quickly become available to millions of users around the world.
Indeed, if you create an ML-powered Snapchat filter in Lens Studio, you can easily publish and share it using a Snapcode, which users can scan to instantly use the Lens in their snaps. I don’t think any other platform has such a streamlined (no-code!) system for distributing trained ML models directly to a user base of this size. Early SnapML user Hart Woolery, also speaking to Heartbeat:
It’s a game-changer. At least within the subset of people working on live-video ML models. This now becomes the easiest way for ML developers to put their work in front of a large audience. I would say it’s analogous to how YouTube democratized video publishing. It also lowers the investment in publishing, which means developers can take increased risks or test more ideas at the same cost.
Similar to YouTube, the first commercial applications of SnapML have been marketing-related: there’s already a process for submitting sponsored Lenses, which can of course include an ML component. It’s not too hard to imagine that some advertising agencies will specialize in building SnapML models that, for example, segment Coke bottles or classify different types of Nike shoes in a Lens. I bet you can bootstrap a pretty solid company around that pitch.
Another application could be Lenses that track viral challenges: count and display how many pushups someone does, or whether they’re getting all the steps right for a TikTok dance. Snapchat is building many of these things itself, but the open platform leaves lots of room for creative ML engineers to innovate—and even get a share of this year’s $750,000 Official Lens Creators fund. (See some of the creations that came out of the fund here.)
The big question for me is whether and how Snapchat will expand these incentives for creating ML-powered Lenses. The Creators fund tripled in size from 2019 to 2020; will we see it grow again next year? Or are we going to get an in-app Snapchat store for premium Lenses with revenue sharing for creators? In any case, I think this will be a very exciting space to follow over the next few years.
Al Gore launches Climate TRACE
Former vice president Al Gore and Gavin McCormick of WattTime launched Climate TRACE, a project for Tracking Real-time Atmospheric Carbon Emissions. From the coalition’s launch post:
Our first-of-its-kind global coalition will leverage advanced AI, satellite image processing, machine learning, and land- and sea-based sensors to do what was previously thought to be nearly impossible: monitor GHG emissions from every sector and in every part of the world. Our work will be extremely granular in focus — down to specific power plants, ships, factories, and more. Our goal is to actively track and verify all significant human-caused GHG emissions worldwide with unprecedented levels of detail and speed.
Extracting information from satellite imagery is shaping up to be the killer app for climate change AI: we’ve previously seen it used for predicting electrical grid resilience (see DT #14), locating solar panels (#29), tracking deforestation (#25, #28, #39), and classifying farming land use (#41). At the NeurIPS 2019 panel on AI for climate change research (#30), former head of Google Brain Andrew Ng also mentioned that the ability to train models on small satellite datasets is one of the machine learning advances he was most excited about for climate projects.
All this is to say: I’m extremely excited to see such a broad coalition—its founding members include “Blue Sky Analytics, CarbonPlan, Carbon Tracker, Earthrise Alliance, Hudson Carbon, Hypervine, OceanMind, and Rocky Mountain Institute”—launch as an independent observer of greenhouse gas emissions. Their goals are certainly ambitious:
Through Climate TRACE, we will equip business leaders and investors, NGOs and climate activists, as well as international, domestic, and local policy leaders with an essential tool to fully realize the economic and societal benefits of a clean energy future, while ensuring that no one — corporation, country, or otherwise — will ever again have the ability to hide or fake their emissions data. Next year, every country in the world will gather in Glasgow, Scotland, to enhance their commitments to the Paris Agreement and raise collective ambition in line with what the world’s scientists tell us is necessary. We at the Climate TRACE coalition hope to support these COP26 climate talks with the most thorough and reliable data on emissions the world has ever seen.
The rest of the launch post goes a bit into how their GHG emissions observation will work, but beyond mentioning that they’ll do sensor fusion on visible + infrared imagery and satellite + radar measurements, Gore and McCormick don’t go into much technical detail yet. They mention that this will follow in future posts, which I’ll be sure to link to here when they come out.
Methods is Papers with Code's machine learning knowledge graph
Papers with Code’s Methods page for the residual block (cropped).
We are now tracking 730+ building blocks of machine learning: optimizers, activations, attention layers, convolutions and much more! Compare usage over time and explore papers from a new perspective.
I’ve started using Methods as my go-to reference for many things at work. Sitting at a more abstracted level than the documentation for your ML library of choice, it’s an incredibly useful resource for anyone doing ML research or engineering. Each Methods page contains the following sections:
- A concise description of what the method is and how it works, including math and a diagram where relevant
- A chronological list of papers that use the method
- A breakdown of tasks from the site’s State-of-the-Art leaderboards for which the method is used
- A graph of how the method’s use changed over time, compared to other methods of the same category (for example, Adam vs. SGD for optimizers)
- A list of components: other methods that contribute to this method (for example, 1x1 convolutions and ReLUs are components of residual blocks)
- A list of categories for the method
I’ve found the last of those sections to be especially handy for answering those hard-to-Google “what’s the name of that other thing that’s kind of like this thing again?” questions. (Also see this Twitter thread by the project’s co-creator Ross Taylor for a few example uses of the other sections.) Methods launched just a month ago, and given how useful it already is, I’m very excited to see how it grows in the future.
One additional feature I’d find useful is the inverse of the components section: I also want to know which methods build on top of the method I’m currently viewing. Another thing I’d like to see is an expansion of code links for methods to also include TensorFlow snippets—but since Facebook AI Research bought Papers with Code late last year, I’m guessing that keeping these snippets exclusive to (FAIR-controlled) PyTorch may be a strategic decision rather than a technical one.
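The inverse lookup I have in mind is easy to sketch in Python. This is only an illustration: the `components` dictionary is hand-written, "Hypothetical Block" is a made-up entry, and nothing here reflects how Papers with Code actually stores or serves its data.

```python
from collections import defaultdict
from typing import Dict, List

# Hand-written example data. Only the residual block entry comes from the
# Methods page described above; "Hypothetical Block" is invented.
components: Dict[str, List[str]] = {
    "Residual Block": ["1x1 Convolution", "ReLU"],
    "Hypothetical Block": ["ReLU"],
}


def invert_components(components: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each component to the sorted list of methods that build on it."""
    used_by: Dict[str, List[str]] = defaultdict(list)
    for method, parts in components.items():
        for part in parts:
            used_by[part].append(method)
    return {part: sorted(methods) for part, methods in used_by.items()}


used_by = invert_components(components)
print(used_by["ReLU"])             # ['Hypothetical Block', 'Residual Block']
print(used_by["1x1 Convolution"])  # ['Residual Block']
```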
GPT-3 demos: one month in
OpenAI is expanding access to its API powered by GPT-3, the lab’s latest gargantuan language model. As I wrote in last month’s DT #42, what makes GPT-3 special is that it can perform a wide variety of language tasks straight out of the box, making it much more accessible than its predecessor, GPT-2:
For example, if you feed it several questions and answers prefixed with “Q:” and “A:” respectively, followed by a new question and “A:”, it’ll continue the passage by answering the question, without ever having to update its weights! Other examples include parsing unstructured text data into tables, improving English-language text, and even turning natural language into Bash terminal commands (but can it do git?).
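To make that prompt format concrete, here is a minimal Python sketch that assembles a "Q:"/"A:" few-shot prompt as a plain string. The function name `build_qa_prompt` and the example questions are assumptions for illustration, and the sketch deliberately stops before any API call, since the request format isn't covered here.

```python
from typing import List, Tuple


def build_qa_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Format solved Q/A pairs followed by a new question and an open 'A:'."""
    lines: List[str] = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    lines.append(f"Q: {question}")
    lines.append("A:")  # left open so the model continues with the answer
    return "\n".join(lines)


# Made-up example pairs.
examples = [
    ("What is the capital of France?", "Paris"),
    ("How many days are in a week?", "Seven"),
]
prompt = build_qa_prompt(examples, "What color is the sky on a clear day?")
print(prompt)
```

The resulting string is what gets sent as the prompt; the model's continuation after the final "A:" is the answer.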
At the time, only a few companies (like Casetext, MessageBird and Quizlet) and researchers (like Janelle Shane) had access to the API. But the rest of us could sign up for a waitlist, and over the past few weeks OpenAI has started sending out invites. I’ve collected some of the coolest demos here, roughly grouped by topic. I know it’s a lot of links, but many of these are definitely worth a look! They’re all very impressive, very funny, or both.
A big group of projects generate some form of code.
- Sharif Shameem prompted the API with a description of a website layout, and his app then generated and rendered the layout as JSX code. A little while later, he got it to generate functioning React apps.
- Harley Turan also used GPT-3 to generate React components based on variable names. Sonny Lazuardi did something similar, integrating React component generation into Figma.
- Components AI showed GPT-3 a few examples of words and emojis with corresponding hexadecimal color scales. It could then generate new color scales based on emoji. My favorites are “smoke” and “🦋”.
Two other projects imitate famous writers.
- AI|Writer by Andrew Mayne “creates simulated hypothetical correspondence with famous personalities, both real and fictitious.” You email an address that the app provides with “Dear…” to a historical figure, and a little while later it emails back with “their” response. My favorite example so far is a conversation between the Hulk and Bruce Banner.
- Nick Cammarata had GPT-3 generate poetry: Richard Feynman in the style of Robert Frost.
Another set of projects restructures text into new forms.
- Another experiment by Andrew Mayne can transform a movie script into a story (and the reverse). I found this demo particularly impressive: the story also includes a lot of relevant and interesting details that were not in the original script.
- Francis Jervis had GPT-3 turn plain language into legal language. For example, “My apartment had mold and it made me sick” became “Plaintiff’s dwelling was infested with toxic and allergenic mold spores, and Plaintiff was rendered physically incapable of pursuing his or her usual and customary vocation, occupation, and/or recreation.” (More here.)
- Mckay Wrigley built a site called Learn From Anyone, where you can ask Elon Musk to teach you about rockets, or Shakespeare to teach you about writing.
Some projects are about music.
- Arram Sabeti used GPT-3 for a bunch of different things, including generating songs: he had both Lil Wayne and Taylor Swift write songs called “Harry Potter,” with great results. (The blog post also contains a fake user manual for a flux capacitor and a fake essay about startups on Mars by Paul Graham.)
- Sushant Kumar got the API to write vague but profound-sounding snippets about music. For example, “Innovation in rock and roll was often a matter of taking a pop melody and playing it loudly.” And, “You can test your product by comparing it to a shitty product it fixes. With music, you can’t always do that.” (It also generates tweets for blockchain, art, or any other word.)
And finally, some projects did more of the fun prompt-and-response text generation we saw from GPT-2 earlier:
- Mario Klingemann, creator of the Memories of Passersby I AI art project (see DT #9), has been playing with GPT-3 a lot. He asked it “why did the chicken cross the road?”, had it fill in the blanks in stories, and asked it to make boring sentences sound more interesting.
- Sid Bharath asked GPT-3 what the purpose of life is. “Life is a beautiful miracle. Life evolves through time into greater forms of beauty. In that sense, the purpose of life is to increase the beauty of universe.” (The conversation goes on from there.)
- Janelle Shane had GPT-3 generate fake facts about whales, in the form of bullet points and Wikipedia auto-completes.
- Kevin Lacker gave GPT-3 a Turing test.
GPT-3 generating episode titles and summaries for the Connected podcast.
I also got my own invite to try GPT-3 for This Episode Does Not Exist!, my project to generate fake episode titles and summaries for my favorite podcasts, like Connected and Hello Internet. It used to work by fine-tuning GPT-2 on metadata of all previous episodes of the show for 600 to 1,000 epochs, a process that took about half an hour on a P100 GPU on Colab. Now, with GPT-3, I can simply paste 30ish example episodes into the playground (more would exceed the input character limit), type “Title:”, and GPT-3 generates a few new episodes, with no retraining required! Once I get a chance to wrap this into a Python script, it’ll become so much easier for me to add new podcasts and episodes to the website.
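As a first step toward that script, here is a rough Python sketch of the prompt-building part: it packs as many recent episodes as fit under a character budget and ends with an open "Title:" line. The "Summary:" label, the `max_chars` value, and the placeholder episodes are assumptions for illustration; the playground's real limit isn't stated above, and the API call itself is left out.

```python
from typing import List, Tuple


def build_episode_prompt(episodes: List[Tuple[str, str]], max_chars: int) -> str:
    """Pack the most recent episodes into a prompt that ends with an open 'Title:'.

    `episodes` is a chronological list of (title, summary) pairs.
    `max_chars` is an assumed budget, not the playground's actual limit.
    """
    suffix = "Title:"
    blocks: List[str] = []
    used = len(suffix)
    # Walk backwards so the newest episodes are kept when the budget runs out.
    for title, summary in reversed(episodes):
        block = f"Title: {title}\nSummary: {summary}\n\n"
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    # Restore chronological order before appending the open-ended cue.
    return "".join(reversed(blocks)) + suffix


# Placeholder episodes, not real show metadata.
episodes = [("A", "x"), ("B", "y"), ("C", "z")]
print(build_episode_prompt(episodes, max_chars=50))
```

Keeping the newest episodes is a design choice on my part; sampling a random subset on each request would be another reasonable option.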