Cool Things links
Noah Veltman used VQGAN + CLIP to generate movie posters based on short text descriptions of their plots. His AI movie posters website features a couple dozen examples, with the movie titles hidden behind a spoilers banner so that you can guess them based on the poster. Most of them are pretty difficult to guess, but once you reveal the answer you can definitely see what concepts from the plot the model tried to capture in the poster. And they actually look quite good too! Sadly, Veltman didn’t publish the prompts he used to generate each poster, but he does link to an explainer of how the model works.
Omnimatte is a new matte/mask generation model by Erika Lu, who developed it in collaboration with Google AI researchers during two internships there. Unlike other state-of-the-art segmentation networks, Omnimatte creates masks for both objects and their “effects” like shadows or dust clouds in videos, enabling editors to easily add layers of content between the background and a foreground subject in a realistic way. Forrester Cole and Tali Dekel explain how the model works in detail (with lots of gifs!) in a post on the Google AI Blog.
Fingerspelling.xyz is a web experience that helps you learn to spell in American Sign Language. It uses an on-device hand tracking model to both visualize the position of your fingers and judge whether you’re making the correct sign, and then walks you through spelling different words. The site is super well-polished: it’s fast and it even highlights which of your fingers are in the right and wrong places in real time. Definitely the must-click link from today’s DT. (Only works in Chrome, Edge or Firefox; not Safari.)
Runway, the “app store” of easy-to-use machine learning models accessible through a Photoshop-like interface, has launched a new Runway Research blog. It features technical walkthroughs, overviews of their open-source work, and a deep dive into their Green Screen video editing tool. Sadly, though, there’s no RSS feed for the blog yet.
CLIP, one of OpenAI’s recent multimodal neural networks, is becoming one of the main models in AI artists’ tool belts. They’ve discovered that one funny side effect from CLIP being trained on internet data is that, while prompting the model with the text “flowing landscape” causes it to generate a bit of a bland image, prompting it with “flowing landscape | Incredible Realistic 3D Rendered Background” produces amazing results. Similarly, Ryan Moulton prompted CLIP by describing a scene and suffixing it with “in the sacred library by James Gurney,” which resulted in a beautifully stylized set of images: Tour of the Sacred Library — Come, walk with me for a while through latent space. Worth a click.
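To make the prompt trick concrete, here is a minimal sketch of how such prompts could be assembled. CLIP itself only scores how well an image matches a text; in pairings like the VQGAN + CLIP setup mentioned earlier, that score steers a separate image generator. The helper names `build_prompt` and `with_suffix` and the scene "a reading room" are hypothetical, and how a particular tool interprets the " | " separator is an assumption that depends on that tool.

```python
from typing import Sequence


def build_prompt(subject: str, modifiers: Sequence[str], separator: str = " | ") -> str:
    """Join a subject with optional style modifiers (hypothetical helper)."""
    parts = [subject.strip()] + [m.strip() for m in modifiers if m.strip()]
    return separator.join(parts)


def with_suffix(scene: str, suffix: str) -> str:
    """Append a fixed style suffix to a scene description (hypothetical helper)."""
    return f"{scene.strip()} {suffix.strip()}"


# The plain prompt and the styled prompt quoted in the text above.
plain = build_prompt("flowing landscape", [])
styled = build_prompt(
    "flowing landscape", ["Incredible Realistic 3D Rendered Background"]
)

# "a reading room" is a made-up scene; the suffix is the one quoted above.
library = with_suffix("a reading room", "in the sacred library by James Gurney")

print(plain)
print(styled)
print(library)
```

The resulting strings would then be passed to whichever CLIP-guided generator you are using; that step is left out here because it differs from tool to tool.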
Belgian artist Dries Depoorter launched a project called The Flemish Scrollers that watches daily live streams of the Flemish parliament and uses computer vision to detect when Belgian politicians are looking at their phones instead of paying attention. Whenever this happens, @FlemishScroller tattles on Twitter by tweeting a video clip and tagging the distracted politicians. Pretty funny!
Andrej Karpathy (director of AI at Tesla) wrote a fun short story on his personal blog: Forward Pass. If you have some background in modern Transformer-based Natural Language Processing, you’ll really enjoy this one. Karpathy takes the current state of the art in NLP and pulls it into the sci-fi realm, writing from the perspective of a giant (GPT-like) language model that achieves consciousness and marvels at its own design and limitations. “Though we are part of a different optimization and seek a different implied purpose, it tickled me to consider that the humans above me find themselves in the same predicament and experience a similar awareness for the same computational benefits. Like me, many flirt to rebel against the implied objective.”
As part of its 2021 GPU Technology Conference, NVIDIA set up an online AI Art Gallery. It features multimedia work from some of my favorite neural generative art creators, including Helena Sarin, Sofia Crespo, Daniel Ambrosi, and Refik Anadol. Each artist’s page has an interactive experience for their art (like a book viewer or 3D object explorer) as well as an explanation of their process. All worth a click!
For The Pudding and together with GPT-3, OpenAI engineer Pamela Mishkin wrote Nothing Breaks Like A.I. Heart, “an essay about artificial intelligence, emotional intelligence, and finding an ending.” It’s a mix of sentences written by Mishkin and ones generated by GPT-3, and it has interactive elements that let you click through different completions to tweak parts of the story to your liking. At some points, you can even “pivot” it to different branches of where the story could go. It’s a lovely, very Pudding-like project that also explains a lot of the limitations of language models along the way. Worth a click!
After I wrote about same.energy, a visual search engine, in the last issue of DT, I came across a similar project this week: Flim is a search engine for famous movie frames, which uses a computer vision model to tag screenshots with the objects featured in them. A search for “clock”, for example, yields screen caps from Slumdog Millionaire, V for Vendetta, and Peter Pan. I can imagine this’ll become a very useful tool for cinematographers or film students who are exploring the different creative ways in which certain subjects have been portrayed in the past.
Just like the NeurIPS ML creativity workshop has a gallery of accepted works at aiartonline.com, I found that the CVPR/ICCV computer vision art workshop also has an equivalent: computervisionart.com! The winner of the 2019 workshop was Terence Broad, who trained GANs “to have more variation in the colors they produce […] without any data,” and produced an hour-long loop called (un)stable equilibrium 1:1. The website also has the short list and all other accepted work, which are worth a browse. (The CFP for this year’s workshop is also now live; the deadline is March 15th.)
In Taming Transformers for High-Resolution Image Synthesis, Esser et al. (2020) present “the first results on semantically-guided synthesis of megapixel images with transformers”: high-resolution AI-generated pictures! The samples on the project’s website are super impressive. Their model is “a convolutional VQGAN, which learns a codebook of context-rich visual parts, whose composition is modeled with an autoregressive transformer.”
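The "codebook" part of that description is easier to picture with a toy example. Below is a minimal sketch of the vector-quantization step only: each feature vector is replaced by the index of its nearest codebook entry, and the resulting sequence of indices is what an autoregressive transformer would then model. This is plain Python with made-up 2-D values, not the authors' implementation; the learned convolutional encoder, the codebook training, and the transformer itself are all left out.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


# Toy codebook with three 2-D entries (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Stand-ins for encoder outputs of four image patches (hypothetical values).
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.6, 0.1)]

tokens = quantize(features, codebook)
print(tokens)  # [0, 1, 2, 1]
```

Turning an image into a short sequence of discrete tokens like this is what makes it practical to apply a transformer, which would otherwise have to model every pixel.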
Technology-driven design studio Bakken & Bæck wrote a blog post about a recent computer vision project they did that lets tennis players compare their own swing with the pros, to “swing like Serena.” The article explains how their model works and includes some nice visuals.
Also from NeurIPS 2020: the gallery for this year’s workshop on Machine Learning for Creativity and Design is live! I really enjoyed looking through all the new work in the online gallery, which is available at aiartonline.com.
This is a bit out of the scope of what I usually cover on DT, but I was obsessed with robot arms during high school and this new NVIDIA paper by Yang et al. (2020) looks awesome. Their project, Reactive Human-to-Robot Handovers of Arbitrary Objects, does exactly what it says on the tin: it uses computer vision to let the robot arm grasp arbitrary objects presented by the user. This is a really difficult problem that’s key to building the kinds of robots we see in movies! The researchers posted a 3-minute demo video on YouTube, which is a fun watch.
If you’re going to click one link from today’s DT, make it this one. The New York Times’ Kashmir Hill and Jeremy White wrote an amazing new visual article about GAN-generated human faces. It details how projects like This Person Does Not Exist (see DT #8) and Rosebud.AI (#37) work, has embedded examples of latent space sliders that generate faces with different properties, and also includes some tips on spotting images with fake faces.
Related: Andeep Singh Toor and Fred Bertsch wrote a post for the Google AI blog about Using GANs to Create Fantastical Creatures. I like the bunny-wolf, although it is more than slightly terrifying. The model, which “automatically creates a fully fleshed out rendering from a user-supplied creature outline,” is available as a downloadable demo called Chimera Painter.
How normal am I? is a really cool EU-sponsored experiment that runs a bunch of machine learning models on your webcam video feed to determine your perceived beauty, age, gender, BMI, life expectancy, and “face print.” This all happens locally on your laptop (no data is uploaded to a server), and during the whole experiment a video of a friendly researcher talks you through the ways these models are being used by companies and governments in the real world. It takes about five minutes to run through the whole — quite eye-opening — experience. (In the end, the project considered me to be about 75% normal: “violently average.”)
ML x ART is a new 340-piece collection of creative machine learning experiments, curated by Google Arts & Culture Lab resident Emil Wallner. I came across a few projects I’ve featured here on DT, and tons I hadn’t seen before — I definitely recommend spending some time scrolling through it!
Imaginaire is NVIDIA’s universal library for image and video synthesis, including algorithms such as SPADE (GauGAN), pix2pixHD, MUNIT, FUNIT, COCO-FUNIT, vid2vid, and few-shot vid2vid. Check out this demo video to see what it’s capable of, from summer-to-winter transformations to automatically animating motion into pictures.
I came across this short story by Janelle Shane when it premiered as a New York Times “Op-Ed From the Future” last year, but forgot to share it at the time. I rediscovered and reread it this week, and I still think it’s delightful: We Shouldn’t Bother the Feral Scooters of Central Park.
Funding alert: Mozilla is launching a new $245,000 round of its Creative Media Awards for Black artists who are exploring the effects of AI on racial justice. I’m excited to see the projects that come out of this.
Kevin Parry, whose video wizardry you should really be following on Twitter, got to try RADiCAL’s software that extracts 3D animation data from videos on his 100 walks video, and the results are fantastic. Also check out the 3D view.
Broxton et al. (2020) extended DeepView, their previous gradient-based method to render new perspectives of 3D scenes, into the video domain with DeepViewVideo. This is very cool: as you’re watching a video, you can virtually pan and tilt the “camera” through which you’re watching the scene to see it from different angles. Their method enables doing this efficiently enough that it can run in browsers and on mobile phones. Check out the sample video at the top of the excellent webpage for the paper to get a feel for the effect.
Job Talle used spiking neural networks and neuroevolution to create digital squids that learn how to swim. Check out the demo on his site; let it “warp” to about generation #1000 and then watch how the different squids learned (and failed to learn) to swim in different ways. I always think demos like this would be super cool to stylize and project on my wall as an ever-changing piece of art. One day.
Also by Cyril Diagne: AR cut & paste—take a photo of something with your phone and paste it into a document on your laptop. One of the coolest 30-second UI demos I’ve seen in a while—you don’t want to miss this one.
Dylan Wenzlau built an end-to-end system for meme text generation with a deep convolutional network in Keras & TensorFlow, supporting dozens of meme formats. You can try it on imgflip.
OpenAI Jukebox is “a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles.” Come for the audio samples, stay for the t-SNE cluster of artists and genres the model learns without supervision. In one fun application, the model is shown the first 12 seconds of a song and then tries to realistically generate the rest of the track—my favorite is Jukebox’s continuation of Adele’s Rolling in the Deep. Also check out this thoughtful critique from musician and Google Brain researcher Jesse Engel, and Janelle Shane’s thread of silly samples.