#41: Black Lives Matter: highlighting ML/AI products, research and climate projects by Black creators
Like many around the world, I watched in horror last week as George Floyd was murdered by a police officer while three other officers stood by and watched, the latest in far too long a list of such incidents. It was tragic to see the massive peaceful #BlackLivesMatter marches that followed be marred by unprovoked police violence against protesters and journalists and by opportunist looters. I’ve since read a lot about how I can personally best contribute to the #BlackLivesMatter movement, and I think I can do that within the AI/ML community (my community, our community) in two ways.
First, I’ve donated $50 to Black in AI, an organization that works to “increase the presence of Black individuals in the field of AI.” This money goes towards supporting their BAI Workshops at major ML conferences like NeurIPS. If you’re looking for a way to support the #BLM movement within our community, Black in AI’s donation button is a great place to start—three readers have already let me know that they’re matching my donation, and I’d love to see that number increase.
Second, I’m covering the same sections as usual in this edition of Dynamically Typed, but today each section highlights recent work only from Black creators. I found much of this content thanks to the great work done by BAI, whose Twitter feed I’ll continue to use as a source for content to cover in future editions of DT as well.
With that said, let’s dive into today’s newsletter.
Productized Artificial Intelligence 🔌
Fireflies.ai turns meetings into notes.
Fireflies.ai records and transcribes meetings, and automatically turns them into searchable, collaborative notes. The startup’s virtual assistant, adorably named Fred, hooks into Google Calendar so that it can automatically join an organization’s Zoom, Meet or Skype calls. As it listens in, it extracts useful notes and information which it can forward to appropriate people in the organization through integrations like Slack and Salesforce. Zach Winn for MIT News:
“[Fred] is giving you perfect memory,” says [Sam] Udotong, who serves as Fireflies’ chief technology officer. “The dream is for everyone to have perfect recall and make all their decisions based on the right information. So being able to search back to exact points in conversation and remember that is powerful. People have told us it makes them look smarter in front of clients.”
As someone who externalizes almost everything I need to remember into an (arguably overly) elaborate system of notes, calendars and to-do apps, I almost feel like this pitch is aimed directly at me. I haven’t had a chance to try it out yet, but I’m hoping to give it a shot on my next lunchclub.ai call (if my match is up for it, of course).
Fireflies is not alone, though. It looks like this is becoming a competitive space in productized AI, with Descript (DT #18, #24), Microsoft’s Project Denmark (#23), and Otter.ai (#40) all currently working on AI-enabled smart transcription and editing of long-form audio data. Exciting times!
Quick productized AI links 🔌
- 🧫 BenchSci helps life science companies reduce failed experiments by curating reagent catalogs and experiments from the literature, decoding them using ML models, and wrapping the resulting data in an easy-to-use interface for researchers. This is the classic productized AI model of (1) automating graduate-student-level work, (2) applying it across the corpus of literature in some niche, and then (3) selling access to the extracted info as a service. I’m personally a big fan of this model and think it has the potential to make many industries more efficient; VCs seem to agree, since BenchSci recently raised a $22 million round of funding.
- 👩🏾💻 Andrea Lewis Åkerman interviewed Tiffany Deng, Tulsee Doshi and Timnit Gebru on their work at Google to make the company’s AI products more inclusive. “Why is it that some products and services work better for some than others, and why isn’t everyone represented around the table when a decision is being made?” They emphasize the importance of tooling and resources, the difficulty of even defining fairness, and the necessity of diversity in both data and teams. I found their journeys toward their positions at Google—each noticing inequalities in tech and wanting to help fix them—especially eye-opening.
- ⚡️ DeepQuest’s DeepStack AI Servers offer a different twist on machine learning APIs: instead of just being available as endpoints in the cloud (like Google’s, Microsoft’s and Amazon’s ML APIs), DeepStack’s servers and pretrained models can be installed as Docker containers. This way, DeepStack combines the ease-of-use of cloud APIs with the data privacy of self-hosting—a cool idea I hadn’t heard of before.
Machine Learning Research 🎛
Google’s model card for their face detection model. (Google)
Datasheets for Datasets and Model Cards for Model Reporting. These two papers aim to improve transparency and accountability in machine learning models and the datasets that were used to create them.
From the abstract of the first paper by Gebru et al. (2018):
The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied with a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied with a datasheet that documents its motivation, composition, collection process, recommended uses, and so on.
The paper goes on to provide a set of questions and a workflow to properly think through and document each of these aspects of a dataset in a datasheet. It also has example datasheets for two standard datasets: Labeled Faces in the Wild and the Movie Review Data.
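As a concrete sketch of the idea, a datasheet can live as structured data right next to the dataset it documents, so its completeness can even be checked programmatically. The section names below follow the paper’s question categories; the dataset and answers are entirely made up for illustration:

```python
# Illustrative sketch: a datasheet stored as structured data alongside the
# dataset it documents. Section names follow Gebru et al. (2018); the dataset
# and its answers are hypothetical.

datasheet = {
    "motivation": "Created to benchmark sentiment classifiers on product reviews.",
    "composition": "10,000 English reviews, labeled positive/negative by two annotators.",
    "collection_process": "Scraped from a public review site between 2017 and 2018.",
    "preprocessing": "HTML stripped; reviews shorter than 10 tokens removed.",
    "uses": "Sentiment benchmarking; not suitable for demographic inference.",
    "distribution": "CC BY 4.0, shipped in the same archive as the data.",
    "maintenance": "Errata tracked in the project issue tracker.",
}

REQUIRED_SECTIONS = ("motivation", "composition", "collection_process",
                     "preprocessing", "uses", "distribution", "maintenance")

def missing_sections(sheet, required=REQUIRED_SECTIONS):
    """Return the names of required datasheet sections that are absent or empty."""
    return [s for s in required if not sheet.get(s, "").strip()]

assert missing_sections(datasheet) == []  # this datasheet is complete
```

A check like this could run in CI whenever a dataset is updated, so the documentation can never silently fall out of step with the data.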
From the abstract of the second paper by Mitchell et al. (2019):
Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information.
This is essentially the same principle, but now applied to a trained model instead of a dataset. The paper also includes details on how to fill in each part of a model card, as well as two examples: a smile detection model and a text toxicity classifier. I’ve also seen some model cards in the wild recently: Google has them for their face detection and object detection APIs and OpenAI has one for their GPT-2 language model (but not yet for GPT-3, as far as I can tell).
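The distinctive part of a model card is the disaggregated evaluation: reporting a metric per subgroup instead of a single aggregate number that can hide poor performance on one group. A minimal sketch of that computation, with made-up group labels and predictions:

```python
# Sketch of a model-card-style disaggregated evaluation (Mitchell et al., 2019):
# compute accuracy per subgroup rather than one aggregate number.
# The groups, labels and predictions below are made up for illustration.

from collections import defaultdict

def disaggregated_accuracy(records):
    """records: iterable of (group, y_true, y_pred) triples.
    Returns a dict mapping each group to its accuracy."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}

records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 0), ("group_a", 0, 0),
    ("group_b", 1, 1), ("group_b", 1, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]
print(disaggregated_accuracy(records))
# {'group_a': 0.75, 'group_b': 1.0}
```

The aggregate accuracy here would be 87.5%, which looks fine; the per-group breakdown is what reveals that group_a is underserved by the model.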
I’m excited to try creating a dataset datasheet and a model card at work—which also makes me think: practicing making these should really have been part of my AI degree. I’ve also added both papers to my machine learning resources list.
Quick ML research + resource links 🎛 (see all 65 resources)
- 🗃 Related: Jo and Gebru (2019) point out that many AI fairness problems are rooted in the data collection and annotation process, and offer “five key approaches in document collection practices in archives that can inform data collection in sociocultural ML.” These can be summarized as consent, inclusivity, power, transparency, and ethics & privacy, with details in Table 1 of their paper: Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning.
- 💱 “Oun yìn wàn nouwé” means “I love you” in Fon, an African language spoken by approximately two million people across Benin, Nigeria and Togo. Aiming to translate texts from his mother, Bonaventure Dossou worked with Chris Emezue to scrape data from a Jehovah’s Witness Bible and create a basic Fon-to-French machine translation model. Since the language is “mostly spoken and rarely documented,” this is a low-resource neural machine translation problem, which presents a number of additional challenges. (See also Sennrich and Zhang (2019), by my former NLP/NMT professor in Edinburgh.)
- 📲 Free online talk + meetup on June 18th from the PyData Boston group: Causal Modeling in Machine Learning by AI research engineer Robert Osazuwa Ness.
Artificial Intelligence for the Climate Crisis 🌍
“Sample fields (color coded with their crop class) overlayed on Google basemap from Western Kenya.” (Radiant Earth)
The Radiant Earth Foundation announced the winners of their Crop Detection in Africa challenge. The competition was hosted on Zindi, a platform that connects African data scientists to organizations with “the world’s most pressing challenges”—similar to Kaggle. Detecting crops from satellite imagery comes with extra challenges in Africa due to limited training data and the small size of farms.
A total of 440 data scientists across the world participated in building a machine learning model for classifying crop types in farms across Western Kenya, using training data hosted on Radiant MLHub. The data contained crop types for a total of more than 4,000 fields (3,286 in the training set and 1,402 in the test set). Seven crop classes were included: 1) Maize, 2) Cassava, 3) Common Bean, 4) Maize & Common Bean (intercropping), 5) Maize & Cassava (intercropping), 6) Maize & Soybean (intercropping), 7) Cassava & Common Bean (intercropping). Two major challenges with this dataset were class imbalance and the intercropping classes that are a common pattern in smallholder farms in Africa.
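One standard remedy for the class imbalance mentioned above is to weight each class inversely to its frequency, so that rare classes (like the intercropping ones) count for more in the training loss. A minimal pure-Python sketch, with made-up class counts rather than the competition’s actual label distribution:

```python
# Sketch of inverse-frequency class weighting for an imbalanced classification
# problem. The label counts below are hypothetical, not the real Radiant Earth
# competition distribution.

from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    the same heuristic as scikit-learn's class_weight='balanced'."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}

labels = ["maize"] * 70 + ["cassava"] * 20 + ["maize_bean_intercrop"] * 10
weights = inverse_frequency_weights(labels)
# Rare classes get larger weights: maize ≈ 0.48, intercrop ≈ 3.33
```

These weights can then be passed to most training frameworks (e.g. as per-class loss weights) so the model is penalized more for misclassifying the underrepresented intercropping fields.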
As climate change will make farming more difficult in many regions across the world, this type of work is vital for protecting food production capacities. Knowing what is being planted where is an important first step in this process. Last year I covered the AI Sowing App from India (DT #20), another climate resilience project that helps farmers decide when to plant which crop using weather and climate data; better data on crop types and locations can certainly help initiatives like that as well.
Quick climate AI links 🌍
- Omdena, another platform “where AI engineers and domain experts collaborate to build solutions to real-world problems,” hosted a competition together with the UN Refugee Agency (UNHCR) to predict forced displacement in Somalia driven by violent conflict and climate change. The resulting models can be used to help optimize the allocation of resources. The competition has finished and the results are available here.
Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get a new issue in your inbox every second Sunday.
If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. ✊🏾