#66: Google's controversial dermatology app, Twitter's AI feature removal, and a Dropbox image search deep-dive
Hey everyone, welcome to Dynamically Typed #66. I’ve got three productized AI links and two machine learning research links for you today. For the former, I wrote about Google’s new dermatology assist tool and its potential problems; Dropbox’s image search technical deep dive; and Twitter’s decision to remove an AI-powered feature from its app. For the latter, I linked a thread of cool visual dataset aggregators, and Google’s new tool for finding biases in ML datasets.
Productized Artificial Intelligence 🔌
- 🩺 Google previewed its AI-powered dermatology assist tool at I/O, its yearly developer conference. Integrated with Search, the app guides you through taking photos of your skin at different angles, and then uses a deep learning model published in Nature Medicine to potentially detect one of 288 skin conditions. (See how it works in this GIF.) The tool is explicitly not intended to provide a diagnosis or to be a substitute for medical advice. Although this sounds incredible in theory (internet-scale access to early-stage detection of e.g. skin cancer could be an amazing global DALY booster), experts have raised some serious concerns. Google Ethical AI researcher Dr. Alex Hanna, Stanford Medicine dermatologist Roxanna Daneshjou MD/PhD and Vice journalist Todd Feathers have pointed out that, although Google claims to have tested the app across all demographics, it has not sufficiently tested it across all (Fitzpatrick) skin types: the darkest types, V and VI, where skin conditions are already misdiagnosed relatively often, were severely underrepresented in the dataset. The app isn’t live yet, and Google Health spokesperson Johnny Luu told Vice that the dataset has been expanded since the Nature paper was published, but this issue must be properly addressed before the app can responsibly be launched. I’d be disappointed to see it go live without at the very least a Datasheet and a Model Card explaining its limitations.
- 🔦 Thomas Berg wrote about How image search works at Dropbox for the company’s blog. Their algorithm combines image classification, which extracts relevant ImageNet-style labels from photos (like “beach” or “hotdog”), with word vectors, which match non-exact search terms (e.g. “shore” or “sandwich”) to those labels. The rest of the post goes into quite some depth on the production architecture and scalability optimizations in the algorithm’s deployment. Always nice to see these technical deep dives on AI-powered features from product companies!
- 🐦 A bit different from usual on DT: the following is a good example of removing an AI-powered feature from a product. Late last year, Twitter users began to notice that the app’s photo cropping algorithm (which decides what portion of an image to show as a preview in the timeline) seemed to favor white faces over Black faces. The simple saliency algorithm doesn’t look for faces specifically but rather tries to predict what part of an image a user would look at first, and no one thought to check it for this bias. Twitter has now solved the problem by no longer cropping images at all, instead displaying standard aspect ratio images in full (which I think is better anyway). Director of Software Engineering Rumman Chowdhury wrote an excellent blog post about how the company handled this issue, including details of its own (open-source) study that confirmed the algorithm’s biases. “One of our conclusions is that not everything on Twitter is a good candidate for an algorithm, and in this case, how to crop an image is a decision best made by people.”
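To make the Dropbox item above a bit more concrete, here is a minimal sketch of the general idea: classifier labels per image, plus word vectors to match a non-exact search term to those labels. This is not Dropbox's production code. The three-dimensional vectors, the file names, the confidence values, and the helper names (`cosine`, `score_image`, `search`) are all made up for illustration; a real system would use learned embeddings with hundreds of dimensions and a proper index.

```python
import math

# Toy 3-dimensional "word vectors". The values are invented for this
# example; real word vectors are learned and much higher-dimensional.
WORD_VECTORS = {
    "beach": (0.9, 0.1, 0.0),
    "shore": (0.8, 0.2, 0.1),
    "hotdog": (0.0, 0.9, 0.2),
    "sandwich": (0.1, 0.8, 0.3),
}

# Hypothetical image classifier output: label -> confidence, per image.
IMAGE_LABELS = {
    "vacation.jpg": {"beach": 0.95},
    "lunch.jpg": {"hotdog": 0.90},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_image(query, labels):
    """Best (classifier confidence x word similarity) over an image's labels."""
    query_vec = WORD_VECTORS[query]
    return max(
        confidence * cosine(query_vec, WORD_VECTORS[label])
        for label, confidence in labels.items()
    )


def search(query):
    """Return image names ranked from most to least relevant to the query."""
    return sorted(
        IMAGE_LABELS,
        key=lambda name: score_image(query, IMAGE_LABELS[name]),
        reverse=True,
    )


if __name__ == "__main__":
    print(search("shore"))
    print(search("sandwich"))
```

With these toy numbers, a search for “shore” ranks the beach photo first even though no image carries that exact label, which is the whole point of adding word vectors on top of the classifier.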
Machine Learning Research 🎛
- ⚡️ Google launched Know Your Data, a new tool that “helps researchers, engineers, product teams, and decision makers understand datasets with the goal of improving data quality, and helping mitigate fairness and bias issues.” It includes 70+ existing image datasets for which the tool can find corrupted data, sensitive subjects, coverage gaps, and balance problems. This looks like a solid technical step towards more equitable and reliable machine learning.
- ⚡️ In response to the announcement that NeurIPS 2021 will have a datasets track (cool!), Cyril Diagne wrote a Twitter thread covering some of his favorite sources of publicly available visual datasets, including Kaggle (646 computer vision datasets), Visual Data (527 datasets) and Bifrost (1900 datasets). A great source of project inspiration!
I’ve also collected all 80+ ML research tools previously featured in Dynamically Typed on a Notion page for quick reference. ⚡️
Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get new issues in your inbox every second Sunday.
If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. 🌧