Dynamically Typed

#41: Black Lives Matter: highlighting ML/AI products, research and climate projects by Black creators


Like many around the world, I watched in horror last week as George Floyd was murdered by a police officer while three other officers stood by and watched, the latest in far too long a list of such incidents. It was tragic to see the massive peaceful #BlackLivesMatter marches that followed be marred by unprovoked police violence against protestors and journalists and by opportunist looters. I’ve since read a lot about how I can personally best contribute to the #BlackLivesMatter movement. I think that’s within the AI/ML community—my community, our community—in two ways.

First, I’ve donated $50 to Black in AI, an organization that works to “increase the presence of Black individuals in the field of AI.” This money goes towards supporting their BAI Workshops at major ML conferences like NeurIPS. If you’re looking for a way to support the #BLM movement within our community, Black in AI’s donation button is a great place to start—three readers have already let me know that they’re matching my donation, and I’d love to see that number increase.

Second, I’m covering the same sections as usual in this edition of Dynamically Typed; but today, each section will highlight recent work only from Black creators. I found much of this content thanks to the great work done by BAI, whose Twitter feed I’ll continue to use as a source for content to cover in future editions of DT as well.

With that said, let’s dive into today’s newsletter.

Productized Artificial Intelligence 🔌

Fireflies.ai turns meetings into notes.

Fireflies.ai records and transcribes meetings, and automatically turns them into searchable, collaborative notes. The startup’s virtual assistant, adorably named Fred, hooks into Google Calendar so that it can automatically join an organization’s Zoom, Meet or Skype calls. As it listens in, it extracts useful notes and information which it can forward to appropriate people in the organization through integrations like Slack and Salesforce. Zach Winn for MIT News:

“[Fred] is giving you perfect memory,” says [Sam] Udotong, who serves as Fireflies’ chief technology officer. “The dream is for everyone to have perfect recall and make all their decisions based on the right information. So being able to search back to exact points in conversation and remember that is powerful. People have told us it makes them look smarter in front of clients.”

As someone who externalizes almost everything I need to remember into an (arguably overly) elaborate system of notes, calendars and to-do apps, I almost feel like this pitch is aimed directly at me. I haven’t had a chance to try it out yet, but I’m hoping to give it a shot on my next lunchclub.ai call (if my match is up for it, of course).

Fireflies is not alone, though. It looks like this is becoming a competitive space in productized AI, with Descript (DT #18, #24), Microsoft’s Project Denmark (#23), and Otter.ai (#40) all currently working on AI-enabled smart transcription and editing of long-form audio data. Exciting times!

Quick productized AI links 🔌

Machine Learning Research 🎛

Google’s model card for their face detection model. (Google)

Datasheets for Datasets and Model Cards for Model Reporting. These two papers aim to improve transparency and accountability in machine learning models and the datasets that were used to create them.

From the abstract of the first paper by Gebru et al. (2018):

The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied with a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied with a datasheet that documents its motivation, composition, collection process, recommended uses, and so on.

The paper goes on to provide a set of questions and a workflow to properly think through and document each of these aspects of a dataset in a datasheet. It also has example datasheets for two standard datasets: Labeled Faces in the Wild and the Movie Review Data.
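To make the idea concrete, here is a minimal sketch of a datasheet as a small Python record. It assumes only the four aspects named in the abstract (motivation, composition, collection process, recommended uses); the paper's actual question list is much longer. The class, its field names, and the placeholder dataset are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Datasheet:
    """Hypothetical, heavily simplified datasheet with four free-text sections."""
    dataset_name: str
    motivation: str
    composition: str
    collection_process: str
    recommended_uses: str

    def missing_sections(self) -> List[str]:
        """Return the names of sections that are empty or whitespace-only."""
        return [name for name, value in asdict(self).items() if not value.strip()]

    def render(self) -> str:
        """Format the datasheet as plain text, one line per section."""
        lines = [f"Datasheet for {self.dataset_name}"]
        for name, value in asdict(self).items():
            if name == "dataset_name":
                continue
            title = name.replace("_", " ").capitalize()
            # An empty section is shown as TODO so the gap stays visible.
            lines.append(f"{title}: {value or 'TODO'}")
        return "\n".join(lines)


# Placeholder content for an imaginary dataset, only to show the shape.
sheet = Datasheet(
    dataset_name="example-dataset",
    motivation="Why the dataset was created and by whom.",
    composition="",
    collection_process="How the instances were gathered and labeled.",
    recommended_uses="Tasks the dataset is (and is not) suited for.",
)
print(sheet.render())
print("Still to fill in:", sheet.missing_sections())
```

Keeping the sections as required fields and flagging empty ones is one way to make documentation gaps hard to overlook before a dataset is shared.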

From the abstract of the second paper by Mitchell et al. (2019):

Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information.

This is essentially the same principle, but now applied to a trained model instead of a dataset. The paper also includes details on how to fill in each part of a model card, as well as two examples: a smile detection model and a text toxicity classifier. I’ve also seen some model cards in the wild recently: Google has them for their face detection and object detection APIs and OpenAI has one for their GPT-2 language model (but not yet for GPT-3, as far as I can tell).
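The core of a model card is evaluation broken down by group and by intersections of groups. A minimal sketch of that bookkeeping, assuming predictions are already available as a list of records: the function name, the record layout, and the toy data below are all made up for illustration, and accuracy stands in for whatever metric a real model card would report.

```python
from collections import defaultdict


def accuracy_by_group(records, keys):
    """Compute accuracy separately for each group.

    records: list of dicts with 'label', 'prediction', and group attributes.
    keys: tuple of attribute names; passing more than one name yields
          intersectional groups such as (age, sex).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = tuple(record[key] for key in keys)
        total[group] += 1
        correct[group] += int(record["label"] == record["prediction"])
    return {group: correct[group] / total[group] for group in total}


# Toy, made-up records purely to show the mechanics.
records = [
    {"age": "young", "sex": "f", "label": 1, "prediction": 1},
    {"age": "young", "sex": "m", "label": 1, "prediction": 0},
    {"age": "old", "sex": "f", "label": 0, "prediction": 0},
    {"age": "old", "sex": "f", "label": 1, "prediction": 0},
]

print(accuracy_by_group(records, ("age",)))
print(accuracy_by_group(records, ("age", "sex")))
```

In this toy data both age groups score the same overall, while the age-and-sex breakdown shows very different results for the two "young" subgroups, which is the kind of gap intersectional reporting is meant to surface.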

I’m excited to try creating a dataset datasheet and a model card at work—which also makes me think: practicing making these should really have been part of my AI degree. I’ve also added both papers to my machine learning resources list.

Quick ML research + resource links 🎛 (see all 65 resources)

Artificial Intelligence for the Climate Crisis 🌍

“Sample fields (color coded with their crop class) overlayed on Google basemap from Western Kenya.” (Radiant Earth)

The Radiant Earth Foundation announced the winners of their Crop Detection in Africa challenge. The competition was hosted on Zindi, a platform similar to Kaggle that connects African data scientists to organizations with “the world’s most pressing challenges.” Detecting crops from satellite imagery comes with extra challenges in Africa due to limited training data and the small size of farms.

A total of 440 data scientists across the world participated in building a machine learning model for classifying crop types in farms across Western Kenya using training data hosted on Radiant MLHub. The training data contained crop types for a total of more than 4,000 fields (3,286 in the training and 1,402 in the testing datasets). Seven different crop classes were included in the dataset, including: 1) Maize, 2) Cassava, 3) Common Bean, 4) Maize & Common Bean (intercropping), 5) Maize & Cassava (intercropping), 6) Maize & Soybean (intercropping), 7) Cassava & Common Bean (intercropping). Two major challenges with this dataset were class imbalance and the intercropping classes that are a common pattern in smallholder farms in Africa.

As climate change will make farming more difficult in many regions across the world, this type of work is vital for protecting food production capacities. Knowing what is being planted where is an important first step in this process. Last year I covered the AI Sowing App from India (DT #20), another climate resilience project that helps farmers decide when to plant which crop using weather and climate data; better data on crop types and locations can certainly help initiatives like that as well.

Quick climate AI links 🌍

Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get a new issue in your inbox every second Sunday.

If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. ✊🏾