Cool Things links
Together with GPT-3, OpenAI's Pamela Mishkin wrote Nothing Breaks Like A.I. Heart for The Pudding, "an essay about artificial intelligence, emotional intelligence, and finding an ending." It's a mix of sentences written by Mishkin and ones generated by GPT-3, with interactive elements that let you click through different completions and tweak parts of the story to your liking.
At some points, you can even "pivot" the story onto a different branch of where it could go.
It's a lovely, very Pudding-like project that also explains a lot of the limitations of language models along the way — worth a click!
After writing about the visual search engine same.energy in the last issue of DT, I came across a similar project this week: Flim is a search engine for famous movie frames, which uses a computer vision model to tag screenshots with the objects featured in them.
A search for “clock”, for example, yields screen caps from Slumdog Millionaire, V for Vendetta, and Peter Pan.
I can imagine this’ll become a very useful tool for cinematographers or film students who are exploring the different creative ways in which certain subjects have been portrayed in the past.
Just like the NeurIPS ML creativity workshop has a gallery of accepted works at aiartonline.com, I found that the CVPR/ICCV computer vision art workshop also has an equivalent: computervisionart.com!
The winner of the 2019 workshop was Terence Broad, who trained GANs "to have more variation in the colors they produce […] without any data," and produced an hour-long loop called (un)stable equilibrium 1:1.
The website also has the shortlist and all other accepted works, which are worth browsing through.
(The CFP for this year's workshop is also now live; the deadline is March 15th.)
In Taming Transformers for High-Resolution Image Synthesis, Esser et al.
(2020) present “the first results on semantically-guided synthesis of megapixel images with transformers” — high-resolution AI-generated pictures!
The samples on the project’s website are super impressive.
Their model is “a convolutional VQGAN, which learns a codebook of context-rich visual parts, whose composition is modeled with an autoregressive transformer.”
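To make that one-liner a bit more concrete, here's a toy sketch of the vector-quantization step such a model relies on: continuous encoder features are snapped to their nearest entry in a learned codebook, and the resulting grid of indices is what the autoregressive transformer models as a sequence. All shapes and sizes below are made up for illustration.

```python
# Toy vector-quantization step in the spirit of a VQGAN codebook lookup.
import torch

codebook = torch.nn.Embedding(1024, 256)   # 1024 learned "visual parts", 256-dim each
features = torch.randn(1, 256, 16, 16)     # pretend encoder output for one image

flat = features.permute(0, 2, 3, 1).reshape(-1, 256)   # one 256-dim vector per spatial position
distances = torch.cdist(flat, codebook.weight)         # distance to every codebook entry
indices = distances.argmin(dim=1)                      # index of the nearest "visual part"
quantized = codebook(indices).reshape(1, 16, 16, 256).permute(0, 3, 1, 2)

# The 16x16 grid of integer `indices` is the compact sequence a transformer
# can then be trained to predict autoregressively.
print(indices.shape, quantized.shape)
```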
Technology-driven design studio Bakken & Bæck wrote a blog post about a recent computer vision project they did that lets tennis players compare their own swing with the pros, to “swing like Serena.” The article explains how their model works and includes some nice visuals.
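The post has the details, but as a hedged guess at the general shape of such a comparison: represent each swing as a sequence of 2D pose keypoints (from any pose estimator) and score how closely the normalized sequences match. The snippet below is my own sketch of that idea, not Bakken & Bæck's actual method.

```python
# Sketch of comparing two swings as sequences of (num_joints, 2) pose keypoints.
import numpy as np

def normalize(pose):
    """Center a (num_joints, 2) keypoint array and scale it to unit size."""
    centered = pose - pose.mean(axis=0)
    return centered / (np.linalg.norm(centered) + 1e-8)

def swing_similarity(player_frames, pro_frames):
    """Both arguments: lists of (num_joints, 2) arrays, one per video frame."""
    # Resample both swings to the same length so frames can be compared 1:1.
    n = min(len(player_frames), len(pro_frames))
    idx_a = np.linspace(0, len(player_frames) - 1, n).astype(int)
    idx_b = np.linspace(0, len(pro_frames) - 1, n).astype(int)
    distances = [
        np.linalg.norm(normalize(player_frames[i]) - normalize(pro_frames[j]))
        for i, j in zip(idx_a, idx_b)
    ]
    return 1.0 / (1.0 + np.mean(distances))  # higher means closer to the pro's swing
```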
Also from NeurIPS 2020: the gallery for this year’s workshop on Machine Learning for Creativity and Design is live!
I really enjoyed looking through all the new work in the online gallery, which is available at aiartonline.com.
This is a bit out of the scope of what I usually cover on DT, but I was obsessed with robot arms during high school and this new NVIDIA paper by Yang et al.
(2020) looks awesome.
Their project, Reactive Human-to-Robot Handovers of Arbitrary Objects, does exactly what it says on the tin: it uses computer vision to let the robot arm grasp arbitrary objects presented by the user.
This is a really difficult problem that’s key to building the kinds of robots we see in movies!
The researchers posted a 3-minute demo video on YouTube, which is a fun watch.
If you’re going to click one link from today’s DT, make it this one.
The New York Times’ Kashmir Hill and Jeremy White wrote an amazing new visual article about GAN-generated human faces.
It details how projects like This Person Does Not Exist (see DT #8) and Rosebud.AI (#37) work, has embedded examples of latent space sliders that generate faces with different properties, and also includes some tips on spotting images with fake faces.
Related: Andeep Singh Toor and Fred Bertsch wrote a post for the Google AI blog about Using GANs to Create Fantastical Creatures.
I like the bunny-wolf, although it is more than slightly terrifying.
The model, which “automatically creates a fully fleshed out rendering from a user-supplied creature outline,” is available as a downloadable demo called Chimera Painter.
How normal am I? is a really cool EU-sponsored experiment that runs a bunch of machine learning models on your webcam video feed to determine your perceived beauty, age, gender, BMI, life expectancy, and "face print." This all happens locally on your laptop (no data is uploaded to a server), and throughout the experiment a friendly researcher, on video, talks you through the ways these models are being used by companies and governments in the real world.
It takes about five minutes to run through the whole — quite eye-opening — experience.
(In the end, the project considered me to be about 75% normal: “violently average.”)
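The local-only aspect is the technically interesting part. As a rough Python analogue of "everything stays on your machine" inference (the actual site runs different models, for age, BMI, and so on, in your browser rather than anything like this), here's webcam face detection with OpenCV's bundled detector.

```python
# Local-only webcam inference: frames are processed in this process and never uploaded.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
capture = cv2.VideoCapture(0)  # default webcam

while True:
    ok, frame = capture.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("local-only demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

capture.release()
cv2.destroyAllWindows()
```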
ML x ART is a new 340-piece collection of creative machine learning experiments, curated by Google Arts & Culture Lab resident Emil Wallner.
I came across a few projects I’ve featured here on DT, and tons I hadn’t seen before — I definitely recommend spending some time scrolling through it!
Imaginaire is NVIDIA's universal library for image and video synthesis, including algorithms such as SPADE (GauGAN), pix2pixHD, MUNIT, FUNIT, COCO-FUNIT, vid2vid, and few-shot vid2vid.
Check out this demo video to see what it’s capable of, from summer-to-winter transformations to automatically animating motion into pictures.
I came across this short story by Janelle Shane when it premiered as a New York Times “Op-Ed From the Future” last year, but forgot to share it at the time.
I rediscovered and reread it this week, and I still think it’s delightful: We Shouldn’t Bother the Feral Scooters of Central Park.
Funding alert: Mozilla is launching a new $245,000 round of its Creative Media Awards for Black artists who are exploring the effects of AI on racial justice.
I’m excited to see the projects that come out of this.
Kevin Parry—whose video wizardry you should really be following on Twitter—got to try RADiCAL's software, which extracts 3D animation data from video, on his 100 walks video, and the results are fantastic.
Also check out the 3D view.
Broxton et al.
(2020) extended DeepView, their previous gradient-based method for rendering new perspectives of 3D scenes, into the video domain with DeepViewVideo.
This is very cool: as you’re watching a video, you can virtually pan and tilt the “camera” through which you’re watching the scene to see it from different angles.
Their method enables doing this efficiently enough that it can run in browsers and on mobile phones.
Check out the sample video at the top of the excellent webpage for the paper to get a feel for the effect.
Job Talle used spiking neural networks and neuroevolution to create digital squids that learn how to swim.
Check out the demo on his site; let it “warp” to about generation #1000 and then watch how the different squids learned (and failed to learn) to swim in different ways.
I always think demos like this would be super cool to stylize and project on my wall as an ever-changing piece of art.
One day.
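If you're curious what neuroevolution looks like in code, here's a deliberately tiny sketch in its spirit: each "squid" is a small neural net whose weights are mutated across generations, and the best performers under a stand-in fitness function seed the next generation. The real demo evolves spiking networks; this toy uses plain dense ones and an invented fitness.

```python
# Toy neuroevolution loop: evolve small "brains" by mutation and selection.
import numpy as np

rng = np.random.default_rng(0)

def random_brain():
    return {"w1": rng.normal(size=(4, 8)), "w2": rng.normal(size=(8, 2))}

def act(brain, sensors):
    hidden = np.tanh(sensors @ brain["w1"])
    return np.tanh(hidden @ brain["w2"])  # e.g. two tentacle contraction signals

def fitness(brain):
    # Stand-in for "distance swum": reward output that varies strongly over time.
    sensors = rng.normal(size=(50, 4))
    return float(np.abs(np.diff(act(brain, sensors), axis=0)).sum())

def mutate(brain, scale=0.1):
    return {k: v + rng.normal(scale=scale, size=v.shape) for k, v in brain.items()}

population = [random_brain() for _ in range(32)]
for generation in range(1000):
    population.sort(key=fitness, reverse=True)
    parents = population[:8]  # the best "swimmers" survive
    population = parents + [mutate(parents[rng.integers(len(parents))]) for _ in range(24)]
```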
Also by Cyril Diagne: AR cut & paste—take a photo of something with your phone and paste it into a document on your laptop.
One of the coolest 30-second UI demos I’ve seen in a while—you don’t want to miss this one.
Dylan Wenzlau built an end-to-end system for meme text generation with a deep convolutional network in Keras & TensorFlow, supporting dozens of meme formats.
You can try it on imgflip.
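For a sense of what a character-level convolutional text generator looks like, here's a scaled-down Keras sketch: given a window of previous caption characters, predict the next one. The vocabulary size, sequence length, and layer sizes are placeholders, and Wenzlau's real system also conditions on the meme format.

```python
# Miniature character-level CNN next-character model (placeholder sizes).
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 96   # printable characters
SEQ_LEN = 128     # context window of previous characters

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN,)),
    layers.Embedding(VOCAB_SIZE, 16),
    layers.Conv1D(64, kernel_size=5, padding="causal", activation="relu"),
    layers.Conv1D(64, kernel_size=5, padding="causal", activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(VOCAB_SIZE, activation="softmax"),  # distribution over the next character
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```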
OpenAI Jukebox is “a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles.” Come for the audio samples, stay for the t-SNE cluster of artists and genres the model learns without supervision.
In one fun application, the model is shown the first 12 seconds of a song and then tries to realistically generate the rest of the track—my favorite is Jukebox’s continuation of Adele’s Rolling in the Deep.
Also check out this thoughtful critique from musician and Google Brain researcher Jesse Engel, and Janelle Shane’s thread of silly samples.
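That t-SNE map comes from projecting the artist and genre embeddings Jukebox learns down to 2D. Generically, that projection step looks like the sketch below; the embeddings here are random placeholders, not Jukebox's.

```python
# Project (n_artists, embedding_dim) vectors to 2D with t-SNE and plot them.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

artists = ["Adele", "Elvis Presley", "Ella Fitzgerald", "2Pac"]
embeddings = np.random.default_rng(42).normal(size=(len(artists), 64))  # placeholder vectors

coords = TSNE(n_components=2, perplexity=2, init="random").fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1])
for name, (x, y) in zip(artists, coords):
    plt.annotate(name, (x, y))
plt.title("Artist embeddings projected with t-SNE (placeholder data)")
plt.show()
```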