#28: OCR for LaTeX equations, Night Sight for astrophotography, and a GPT-2-powered text adventure
Hey everyone, welcome to Dynamically Typed #28! Here’s what I’ve got for you in today’s edition of the newsletter.
On the productized AI side, there’s automatic LaTeX generation from screenshots or photos of handwritten equations using Mathpix, and an update to Night Sight on Google Pixel phones that allows for mobile astrophotography; and for ML research, there’s OpenAI’s new Procgen Benchmark. On the climate change AI side, coinciding with last week’s COP25, ETH Zurich wrote up David Dao et al.’s work on predicting deforestation based on “fish bone” patterns in rainforest satellite imagery. And finally, I found AI Dungeon 2, a super fun text adventure game powered by the GPT-2 language model.
Productized Artificial Intelligence 🔌
As input for its OCR pipeline, Mathpix can use either a camera photo (on mobile) or a screenshot (on desktop).
Mathpix is a tool to extract LaTeX from PDFs and handwritten notes, available across macOS, Windows, Linux, iOS and Android. I’ve used it extensively during my MSc and continue to use it at Plumerai when referencing equations from literature in writeups. Mathpix has saved me a ton of time that I’d otherwise spend tediously rewriting LaTeX math, and in my experience, it always outputs syntactically correct LaTeX source and has never gotten an equation wrong.
Mathpix is a great example of productized artificial intelligence because it takes a hard problem, namely joint optical character recognition (OCR) and presentational markup generation, solves it well, and then wraps it in a smooth user experience. They monetize using a freemium model for consumers (50 free snips per month, or unlimited snips for $5 per month) and a paid API for third-party developers. Find out more on the company’s website: Mathpix Snip (consumer app) and Mathpix OCR (developer API).
Side note: I had some trouble tracking down exactly what machine learning algorithms power Mathpix, so I went down a bit of a research rabbit hole. Their website doesn’t mention any details of how the OCR works, but an old Hacker News comment points toward some Harvard NLP research that used a neural encoder-decoder model to achieve similar results, and at least one of their employees has the job title Senior Deep Learning R&D Engineer on LinkedIn. So even though their website doesn’t shout it from the rooftops (which is also nice for a change), I think Mathpix definitely qualifies as a “real” AI application.
Examples of photos taken on a Google Pixel 4 phone with Night Sight enabled. (Google)
Florian Kainz and Kiran Murthy wrote about using the Night Sight camera mode on Google Pixel phones for astrophotography. Night Sight, which has been on Pixel phones since last year, “allows phone photographers to take good-looking handheld shots in environments so dark that the normal camera mode would produce grainy, severely underexposed images.” The updated version allows exposures of up to 4 minutes and automatically takes care of removing image artefacts like hot pixels that start to occur at such long exposure times. As an iPhone user, this is definitely the biggest feature that makes me jealous of people with Pixel phones. Read more about Night Sight on the Google AI Blog:
- Marc Levoy and Yael Pritch: Night Sight: Seeing in the Dark on Pixel Phones
- Florian Kainz and Kiran Murthy: Astrophotography with Night Sight on Pixel Phones
Machine Learning Research 🎛
OpenAI has released the Procgen Benchmark, a set of 16 reinforcement learning environments. In what seems to be a trend in RL research this year, the benchmark is very specifically designed to require agents to have two types of generalization:
- Generalization within one game, which is lacking in the common Arcade Learning Environment because agents may just be memorizing specific trajectories for each game. Procgen aims to solve this by procedurally generating levels for each game.
- Generalization between games, which is lacking in previous procedurally generated environments. Procgen aims to solve this by having 16 different environments.
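To make the two kinds of generalization concrete, here’s a minimal Python sketch. It does not use the actual Procgen package: `generate_level` is a hypothetical toy generator, the game names are only used as labels, and the seed ranges are arbitrary assumptions. The point is just that an agent trained on one set of seeds is evaluated on levels it has never seen, across several games.

```python
import random

def generate_level(game: str, seed: int, width: int = 8) -> list[int]:
    """Toy stand-in for a procedural level generator (hypothetical).

    The same (game, seed) pair always yields the same level layout.
    """
    rng = random.Random(f"{game}-{seed}")
    return [rng.randint(0, 3) for _ in range(width)]

# Labels only; the real benchmark has 16 environments.
GAMES = ["coinrun", "maze", "bigfish"]
TRAIN_SEEDS = range(0, 200)   # assumed training levels per game
TEST_SEEDS = range(200, 300)  # assumed held-out levels per game

def make_split(games, train_seeds, test_seeds):
    """Return (train, test) sets of (game, seed) pairs."""
    train = {(g, s) for g in games for s in train_seeds}
    test = {(g, s) for g in games for s in test_seeds}
    return train, test

train_levels, test_levels = make_split(GAMES, TRAIN_SEEDS, TEST_SEEDS)

# No test level was available during training, so memorizing
# trajectories for specific levels cannot help at test time.
print(len(train_levels), len(test_levels), train_levels & test_levels)
```

Because the test seeds never overlap with the training seeds, a good test score has to come from something other than memorized trajectories.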
I haven’t been covering a lot of RL recently, but this stuck out to me because it’s broadly addressing the same concerns that are central to Chollet’s Measure of Intelligence paper and Abstraction and Reasoning Corpus benchmark (see DT #26). Over the past year or two a lot of cutting-edge RL research has been in beating specific video games (like Dota and StarCraft), and the field has been criticized for not having many real-world applications; so I’m excited by this shift toward generalizability. It’d be cool to see some more applications research too.
Read more on OpenAI’s blog: Procgen Benchmark.
Quick ML resource links ⚡️ (see all 49)
- Mathpix Snip lets you automatically extract LaTeX equations from screenshots; very handy for writing papers.
- Mathematics for Machine Learning is a free 400+ page book that covers math concepts relevant to ML engineers and researchers.
- nbdev allows you to create complete Python packages, including tests and a rich documentation system, all in Jupyter Notebooks.
Artificial Intelligence for the Climate Crisis 🌍
Predicted areas of rainforest deforestation by GainForest. (Dao et al.)
David Dao and his collaborators at ETH Zurich’s AI for Climate Action research team presented GainForest at ICML 2019. The machine learning system uses video prediction and semantic segmentation of satellite imagery to predict how forests evolve and which parts are likely to disappear next. Similar to 20tree.ai’s approach (see DT #25), their algorithm looks for tell-tale “fish bone” structures in tree coverage to find areas that are likely to be deforested soon. Florian Meyer for ETH news:
As Dao explains, the algorithms read sequences in order to recognise which areas are forested and whether these areas are shrinking. These sequences are individual images strung together in chronological succession – much like old film reels or comic strips. So when a new road is built through the rainforest, for instance, numerous smaller roads form off it over time. It is along these roads that the forest coverage is destroyed.
From a bird’s-eye view, the resulting pattern resembles the skeleton of a fish, with its spine and small bones – thus the moniker “fish bones”.
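As a rough illustration of “reading sequences” of images, here’s a toy Python sketch. This is not GainForest’s method (which uses video prediction and semantic segmentation); it assumes a segmentation step has already turned each satellite image into a binary grid where 1 means forest, and the grids and function names are made up for the example.

```python
def forest_fraction(mask):
    """Fraction of cells marked as forest (1) in a 2D binary mask."""
    cells = [cell for row in mask for cell in row]
    return sum(cells) / len(cells)

def is_shrinking(masks):
    """True if forest cover never increases and ends lower than it started."""
    fractions = [forest_fraction(m) for m in masks]
    never_grows = all(b <= a for a, b in zip(fractions, fractions[1:]))
    return never_grows and fractions[-1] < fractions[0]

# Three made-up frames in chronological order: untouched forest,
# then a main road (the "spine"), then side roads branching off it.
frames = [
    [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]],
    [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]],
    [[1, 0, 1, 1], [0, 0, 0, 0], [1, 0, 1, 0], [1, 0, 1, 0]],
]

print([forest_fraction(f) for f in frames])  # [1.0, 0.75, 0.4375]
print(is_shrinking(frames))                  # True
```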
This week, Dao presented the work at the 2019 UN Climate Change Conference (COP25) and he’s working with parties like the Chilean forestry authority to see if GainForest can help them detect and prevent illegal logging activities. More here:
- Florian Meyer for ETH news: Rainforest preservation through machine learning
- ICML Paper by Dao et al. (2019): GainForest: Scaling Climate Finance for Forest Conservation using Interpretable Machine Learning on Satellite Imagery (PDF)
- Website for the project, which visually explains how GainForest works: Komorebi: Predicting Deforestation
- Website for the research group: AI for Climate Action
Climate Change AI is going to be at NeurIPS 2019. The workshop will open with a keynote from Google AI’s Jeff Dean (Chuck Norris of computer science) and have spotlights “on routing aircraft efficiently, predicting wildfire risks, understanding cloud properties, designing next-gen solar panels, and more.” Here’s the full pitch of the workshop:
Many in the ML community wish to take action on climate change, yet feel their skills are inapplicable. This workshop aims to show that in fact the opposite is true: while no silver bullet, ML can be an invaluable tool both in reducing greenhouse gas emissions and in helping society adapt to the effects of climate change. Climate change is a complex problem, for which action takes many forms - from designing smart electrical grids to tracking deforestation in satellite imagery. Many of these actions represent high-impact opportunities for real-world change, as well as being interesting problems for ML research.
I wish I could go to this—Yoshua Bengio, Andrew Ng, Carla Gomes, Lester Mackey, and Jeff Dean on a Climate change and AI panel sounds amazing!—but luckily they’re planning to record the talks and panels. I’ll do my best to write them up here in the next edition of this newsletter. You can see the full schedule on the workshop website.
Cool Things ✨
Playing AI Dungeon 2 in the fantasy setting as the wizard character.
Nick Walton released AI Dungeon 2: “infinitely generated text adventures with deep learning models.” The game is built on top of OpenAI’s GPT-2 XL language model (see DT #27), which means that, unlike in traditional text adventures, the game’s inputs and responses are completely open-ended. Gameplay is very simple: based on some seed information (a genre, character, and name), AI Dungeon 2 generates a scenario; the player then responds with a command (text after the “>”); and the game generates a new scenario in response. The process then repeats.
Here’s a fun snippet that captures how well the GPT-2 backend understands the meaning of commands and keeps track of context, from a playthrough by Janelle Shane on her AI Weirdness blog:
As you walk through the ruins, you find a small wooden door with a few skeletons guarding it.
> Ask the skeletons to please let me through the door
You knock on the door and ask them if they can open it for you. They immediately comply.
> Transform the ruin into a palace
You immediately change the area into a palace. You have a throne made with gold and marble. You also make several statues of yourself. You have a large bed made with silk sheets and hang a portrait of yourself on it. You even have a fountain made with pearls and diamonds.
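The gameplay loop described above (seed, scenario, command, new scenario, repeat) can be sketched in a few lines of Python. This is my own guess at the shape of the loop, not Nick Walton’s code: `echo_model` is a hypothetical stand-in for GPT-2 that just reports how much context it was given, and the prompt template and the name “Merlin” are made up for the example.

```python
from typing import Callable, Iterable

def play(seed_info: dict, commands: Iterable[str],
         generate: Callable[[str], str]) -> str:
    """Build a story by alternating player commands and model continuations."""
    # Assumed prompt template; the real game's wording will differ.
    prompt = "You are {name}, a {character} in a {genre} world.".format(**seed_info)
    story = prompt + " " + generate(prompt)
    for command in commands:
        # The whole story so far is fed back in, which is how the
        # model can keep track of context between turns.
        story += "\n> " + command + "\n"
        story += generate(story)
    return story

def echo_model(context: str) -> str:
    """Hypothetical stand-in for GPT-2: reports the context length it saw."""
    return f"[continuation conditioned on {len(context)} characters]"

story = play(
    {"genre": "fantasy", "character": "wizard", "name": "Merlin"},
    ["look around", "cast a spell"],
    echo_model,
)
print(story)
```

Swapping `echo_model` for a function that calls an actual language model would leave the loop itself unchanged.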
You can play AI Dungeon 2 online in a Google Colab notebook (free access to the beefy GPU you need to run GPT-2!) and read more about it on Nick Walton’s blog: AI Dungeon 2: Creating Infinitely Generated Text Adventures with Deep Learning Language Models.
Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get new issues in your inbox every second Sunday.
If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. 🌱