Productized AI links
Last September, I wrote that autonomous trucks will be the first big self-driving market.
A detailed new report by Deloitte's Rasheq Zarif has now come to the same conclusion: Autonomous trucks lead the way.
"Driverless trucks are already heading out to the highway, as shipping companies increasingly look to autonomous technology to meet rising demand for goods.
The focus now: determining the best way to hand off trailers from machine to human." In related news, self-driving car company Waymo, which has been developing autonomous heavy-duty trucks since 2017, invited a few journalists along for a (virtual) test ride.
Exciting few years ahead here.
After we saw GPT-3 (OpenAI's gargantuan language model that doesn't need finetuning) used for lots of cool demos, the model's API now powers 300+ apps and outputs an average of 4.5 billion (!) words per day.
OpenAI published a blog post describing some of these apps, including Viable, which summarizes and answers questions about survey responses, and Algolia, a website plugin for semantically searching through content.
Cool stuff!
As the OpenAI API scales up to power more products, though, one thing to keep a close eye on will be how often it outputs problematic responses in production systems.
Abid et al. (2021) have shown that GPT-3 has a persistent anti-Muslim bias, and TNW's Tristan Greene got a GPT-3-powered chatbot to spit out racist and anti-LGBT slurs.
The OpenAI API runs a content filter on top of the raw GPT-3 model to prevent such responses from reaching end users (a filter that's pretty strict in my experience: when I was playing around with the beta, I couldn't get it to say bad things without the filter labeling them as potentially problematic), but no filter is ever perfect.
We'll see what happens in the coming few years, but I do expect that the good and useful products will outweigh the occasional bad response.
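For a concrete sense of what "GPT-3 powering an app" looks like on the developer side, here is a minimal sketch of a completion request against the circa-2021 OpenAI Python client; the engine name, prompt, and parameters are placeholder assumptions, not taken from any of the apps mentioned above.

```python
# Minimal sketch of calling the OpenAI API as it looked around 2021
# (the Completion endpoint with a named engine). Engine name, prompt,
# and parameters are assumptions; check OpenAI's docs for specifics.
import openai

openai.api_key = "sk-..."  # your API key

response = openai.Completion.create(
    engine="davinci",  # a GPT-3 base engine (assumed name)
    prompt="Summarize the following survey response:\n\n...",
    max_tokens=64,
    temperature=0.3,
)
print(response.choices[0].text)
```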
This was making the rounds on Twitter: online genealogy platform MyHeritage launched a tool called Deep Nostalgia that animates faces in old family photos.
According to the company's blog, it was used to animate over 10 million faces in its first week of being live.
As with many visual deep-learning-powered features, there is a free version with watermarks, as well as a premium version as part of a paid MyHeritage subscription.
The model behind Deep Nostalgia is licensed from D-ID, a startup that makes live portrait, talking heads, and video anonymization products.
OpenAI CEO Sam Altman wrote Moore's Law for Everything, an essay in which he discusses economic implications of the exponential rate at which AI is improving.
As AI replaces more labor and makes goods and services cheaper, he argues that we must shift the focus of taxation away from income and toward capital, to prevent extreme inequality from destabilizing democracies.
See his full essay for details of the (US-specific) implementation, in the form of an American Equity Fund and broad land taxes.
This reminds me of a discussion we had in my undergrad CS ethics class on "taxing robots" because they replace labor (and taxable income with it).
At the time, I argued against this idea because it seems impossible to implement in any sane way: should we tax email (which is free!) because there are no more telegram operator jobs left?
Altman's proposal is a different solution to this same problem, and a pretty interesting one at that, right up there with a Universal Basic Income (UBI).
Cade Metz and Kashmir Hill at The New York Times wrote about how old Flickr photos became a part of facial recognition datasets.
The story centers around Exposing.AI, a tool that can show you whether your face is featured in any popular facial recognition datasets like VGG Face, MegaFace, and FaceScrub, based on your Flickr username or a photo URL.
Beyond that, it's a good read that goes into how, five to ten years ago when AI was not yet very influential, commercial and university labs were building lots of different facial recognition datasets and, in the spirit of open science, sharing them publicly on the internet.
Only now that it's becoming clear that facial recognition systems are biased (as I covered last summer in Is it enough for only big tech to pull out of facial recognition? and Facial recognition false arrest) are some of these datasets being taken offline.
But these systems exist now, and taking down the datasets won't stop them from being used; only regulation will.
Apparently, Google's Pixel phones can detect car crashes.
This was making the rounds on Twitter after a Reddit user wrote on r/GooglePixel that car crash detection saved them from hours of suffering because they had an accident on their own property, where no one would otherwise have found them for a long time.
When the phone detects a crash, it calls local emergency services and says, "You are being contacted by an automated emergency voice service on behalf of a caller.
The caller's phone detected a possible car crash, and they were unresponsive.
Please send help," followed by the phone's latest location.
Pretty amazing stuff that's being built into more and more products; Apple Watches have a similar fall detection feature.
Dave Burke, Google's VP of engineering for Android, noticed the story and tweeted a photo of the setup they used to train the ML model powering this feature.
Worth a click.
Google is adding camera-based vitals measurement to its Fit app on Android.
Initially rolling out to Pixel phones, the new feature can measure your respiratory (breathing) rate by looking at your face and upper torso through the selfie camera, something that, judging from a cursory Scholar search, was only becoming a mainstream research topic two years ago!
The rate at which computer vision research makes it from an idea to deployment on millions of phones remains pretty astonishing.
The Fit app can also read your heart rate when you place your finger on the back-facing camera, though I don't think this is as new: I've used iPhone apps that did this years ago, but one big difference is that Google has actually done clinical studies to validate these features.
Jingwen Lu, Jidong Long and Rangan Majumder wrote a blog post about Speller100, Microsoft's zero-shot spelling correction models that now collectively work across 100+ languages.
Speller100 is currently live in production as part of Microsoft's Bing search engine, where it corrects typos in search queries; it's what powers the "did you mean..." prompt.
Although this feature has been around for English-language search queries for a very long time, Speller100 newly enables it for a whole host of smaller languages.
It's also an interesting case study of how an AI-powered refinement step of user input can significantly improve a product's overall experience.
By A/B testing Speller100 against not having spelling correction, the researchers found that it reduced the number of pages with no results by 30%, and manual query reformatting by 5%; and that it increased the number of clicks on spelling suggestions by 67%, and clicks on any item on the page by 70%.
Win Suen wrote about a machine learning system running in production at Dropbox that decides for which files previews should be rendered: Cannes: How ML saves us $1.7M a year on document previews.
She goes into two design considerations for building a highly performant AI system: the cost-benefit tradeoff of ML-powered infrastructure savings (rendering fewer previews to save compute vs. hurting user experience by not having previews) and the model complexity tradeoff (prediction accuracy vs. interpretability and cost of deployment).
The final model is a gradient-boosted classifier that can "predict previews up to 60 days after time of pre-warm with >70% accuracy."
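For readers who haven't trained one before, here is a minimal sketch of the kind of gradient-boosted classifier described, using scikit-learn and made-up features; Dropbox's actual features and training setup aren't public beyond what's in their post.

```python
# A toy sketch of a gradient-boosted classifier predicting whether a file's
# preview will be viewed, so that only likely-to-be-viewed files get
# pre-warmed. Feature names and data below are invented, not Dropbox's.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 10_000
X = np.column_stack([
    rng.integers(0, 5, n),     # hypothetical: file type category
    rng.exponential(2.0, n),   # hypothetical: file size (MB)
    rng.integers(0, 100, n),   # hypothetical: owner's recent activity
])
y = (X[:, 2] + rng.normal(0, 10, n) > 50).astype(int)  # fake "was previewed" label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```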
Facebook has launched a significantly improved version of its automatic alternative text (AAT) feature, which helps blind or visually impaired people understand the contents of images in their Facebook feed.
As explained in Facebook's tech blog post, this new version of AAT can recognize over 1,200 distinct concepts.
Interestingly, the model was trained on weakly supervised data, using the hashtags on billions of public Instagram images as labels.
So if you've ever posted a picture of your latte and tagged it #latte on Instagram, you may have had a tiny impact on this feature.
The blog post also details the user research that went into improving AAT, something I think we usually don't hear enough about (or do enough of!) in productized AI, so make sure to give it a read.
(I wish I could credit the person who wrote this post, but sadly Facebook keeps these posts anonymous, which seems a bit out of character for the company.)
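On the weak-supervision point above, here is a toy sketch of how hashtags can be turned into (noisy) training labels; the hashtag-to-concept mapping and data are made up, not Facebook's.

```python
# Toy sketch of weak supervision: use hashtags attached to public images as
# noisy labels for the concepts a classifier should recognize.
CONCEPTS = {"#latte": "coffee", "#goldenretriever": "dog", "#sunset": "sunset"}

posts = [
    {"image": "img_001.jpg", "hashtags": ["#latte", "#cozy"]},
    {"image": "img_002.jpg", "hashtags": ["#goldenretriever", "#sunset"]},
]

dataset = []
for post in posts:
    labels = {CONCEPTS[tag] for tag in post["hashtags"] if tag in CONCEPTS}
    if labels:  # keep only images with at least one mapped concept
        dataset.append((post["image"], sorted(labels)))

print(dataset)
# [('img_001.jpg', ['coffee']), ('img_002.jpg', ['dog', 'sunset'])]
```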
Naveen Arivazhagan and Colin Cherry wrote a post for the Google AI Blog about how they solved a problem with the live speech translation feature in Google Translate: translations would frequently get updated as more of the transcribed text became available, which users found distracting.
It's a cool glimpse into all the stuff besides just model accuracy and speed that's important to get right for a successful AI-powered product, and into how engineers think about turning these nonfunctional requirements into measurable performance metrics they can optimize for.
Creative Commons photo sharing site Unsplash (where I also have a profile!) has launched a new feature: Visual Search, similar to Google's search by image.
If you've found a photo you'd like to include in a blog post or presentation, for example, but the image is copyrighted, this new Unsplash feature will help you find similar-looking ones that are free to use.
The launch post doesn't go into detail about how Visual Search works, but I'm guessing some (convolutional) classification model extracts features from all images on Unsplash to create a high-dimensional embedding; the same happens to the image you upload, and the site can then serve you photos that are close together in this embedding space.
(Here's an example of how you'd build that in Keras.)
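To make that guess a bit more concrete, here is a minimal sketch of embedding-based visual search with a pretrained Keras model and a nearest-neighbor index; the image paths are placeholders, and this is just one plausible construction, not Unsplash's actual pipeline.

```python
# Embedding-based visual search: a pretrained CNN produces a feature vector
# per image, and similar images are found via nearest-neighbor search in
# that embedding space. Image paths below are placeholders.
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.neighbors import NearestNeighbors

model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]  # 2048-dim feature vector

# Index the photo library once.
library_paths = ["photo_001.jpg", "photo_002.jpg", "photo_003.jpg"]
index = NearestNeighbors(n_neighbors=2, metric="cosine")
index.fit(np.stack([embed(p) for p in library_paths]))

# Query with the uploaded image and return the closest library photos.
_, neighbors = index.kneighbors(embed("query.jpg").reshape(1, -1))
print([library_paths[i] for i in neighbors[0]])
```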
Business Insiderâs Mathias Döpfner did a long new interview with Elon Musk.
It covers a lot, and most of it isn't too relevant to DT, but this Musk quote is: "I'm extremely confident that Tesla will have level five [self-driving] next year, extremely confident, 100%." Yes, this definitely isn't the first time Musk has claimed full self-driving is just around the corner, but my slightly contrarian take (from a few months ago) is that I actually do think Tesla will get to a useful level of self-driving, deployed at scale in consumer cars, first.
Their big bet years ago that vision (without LIDAR) is enough for autonomy has enabled them to be years ahead of the competition with their dataset.
They've harnessed their fleet of Teslas on real roads for very clever sampling, feedback loops (ghost mode), and regression testing; Andrej Karpathy (Tesla's head of AI) had a really great talk on all this in April last year.
Another episode in the saga of deepfakes, videos that make real people look like they're saying or doing things they never said or did.
In the fall of 2019, Facebook, Microsoft, and Google created datasets and challenges to automatically detect deepfakes (see DT #23); in October 2020, Microsoft then launched their Video Authenticator deepfake detection app (#48).
Now, just a few months later, Neekhara et al. (2020) present an adversarial deepfake model that handily beats those detectors: "We perform our evaluations on the winning entries of the DeepFake Detection Challenge (DFDC) and demonstrate that they can be easily bypassed in a practical attack scenario." And the carousel goes 'round.
Google's Recorder app for Android, which uses on-device AI to transcribe recordings (see DT #25, #31), now has a new ML-powered feature: Smart Scrolling.
The feature "automatically marks important sections in the transcript, chooses the most representative keywords from each section, and then surfaces those keywords on the vertical scrollbar, like chapter headings." This all happens on-device.
How long until it also writes concise summaries of your hour-long recordings?
Runway ML, the "app store" of easy-to-use machine learning models for creators (see DT #18), added a new Green Screen feature, which it says is "[the] first real-time web tool for cutting objects out of videos. Using machine learning, it makes rotoscoping (a.k.a. masking) a lot faster and a lot less painful." It looks very cool, but take their claim of being first with a grain of salt: Kaleido, the folks behind DT-favorite remove.bg, also launched an ML-powered automatic video background removal tool called unscreen earlier this year (#35).
However, for Runway ML, Green Screen represents yet another well-integrated feature for their already extensive AI creativity product, which is not something unscreen can match as a single-use tool.
Along with Photoshop's new AI features (#51), this is yet another example of how quickly deep learning vision models are becoming easy to use for everyone.
Apple has forked TensorFlow 2 to optimize it for their new crazy-fast M1 Macs!
This came as a pretty big surprise, and it makes the new M1 Macs even more attractive to ML developers: for the first time, this'll enable using the internal GPU to train TensorFlow models on Mac laptops, leading to ~5x speedups (!) compared to the previous generation.
I'll probably hold out for the next generation, by which time Apple's optimizations should also be upstreamed to the main TensorFlow branch instead of only being available on their own fork, but it's clear that these laptops are already huge game changers.
Nathan Benaich and Ian Hogarth's 2020 State of AI report came out.
It covers research, talent, industry, and politics, and is once again full of great in-depth data and analysis.
It touches on many of the trends I've covered in DT this year, including gargantuan (Transformer) language models, productized NLP, reproducibility, accelerator chips, and self-driving progress.
A few topics that are a bit outside DT's scope but that are very interesting nonetheless include their assertion that biology is experiencing its "AI moment", their analysis of talent education and flow, and their summary of geopolitical trends surrounding AI hardware and software companies.
(Google Slides link; an executive summary is on slide 7.)
Duplex, Google's "AI technology that uses natural conversation to get things done," was first launched at the company's 2018 I/O conference as a way to automatically make phone calls for reservations at restaurants or rental car services (see DT #13).
It's now being used in a lot more places, from calling businesses listed on Google Maps to automatically confirm their opening times, to screening potential spam phone calls.
Personally I'd feel a little rude having Duplex make a reservation for me, but I think the use case of double-checking opening times is very useful (especially now, during the pandemic), since that single automated call can prevent a lot of people from showing up to closed doors if opening times are wrong on Google Maps.
Lobe, a web app to visually edit high-level compute graphs of machine learning models and train them, has (re)launched as a Microsoft product.
"Just show it examples of what you want it to learn, and it automatically trains a custom machine learning model that can be shipped in your app." The site's UI looks super slick, and it can export models to TensorFlow (1.15, not 2.x), TFLite, ONNX, CoreML and more.
I'd be very interested to find out what kind of optimizations it applies for the mobile and edge deployment targets; anything on top of the standard TFLite conversion, for example?
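For reference, this is roughly what that "standard TFLite conversion" looks like in TensorFlow 2; whatever Lobe layers on top of this (quantization settings, operator fusion, pruning, ...) is the open question. The model path is a placeholder.

```python
# Standard TFLite conversion of a Keras model, with the default
# post-training optimizations enabled.
import tensorflow as tf

model = tf.keras.models.load_model("my_model.h5")  # placeholder path

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # e.g. post-training quantization
tflite_model = converter.convert()

with open("my_model.tflite", "wb") as f:
    f.write(tflite_model)
```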
Cyril Diagne's AR cut & paste demo (#39) is now an app: ClipDrop lets you take photos of objects on your phone, uses a background removal model to cut them out, and then lets you paste them onto your laptop screen in augmented reality.
I've tried it on a few objects I had lying around my apartment, and capturing objects (the "clip" bit) works super reliably; sending the photo to my laptop (the "drop" bit) was a bit less robust.
Descript has launched their new video editor.
This is another DT-favorite: Descript originally built an app that lets you edit the transcribed text of an audio file and reflects those changes back into the audio (see DT #18), followed by a version of the product optimized for podcast editing (#24).
The newest release turns the app into a fully-fledged video editor, including support for Descript's core transcript-based editing feature: it can delete sections, auto-remove "uhm"s, and even generate new audio (in the speaker's voice!) for small corrections.
And it comes with a great launch video (by Sandwich, of course).
Long (technical) deep-dive from Google on their lessons learned in a decade of software engineering for machine learning systems: Towards ML Engineering: A Brief History Of TensorFlow Extended (TFX).
A recurring theme for those of you that have been reading DT for a while: "We also recommend that before focusing on cutting-edge ML modeling techniques, product leaders should invest more time in adopting interoperable ML platforms for their organizations."
Amsterdam (where I live!) and Helsinki (where I don't live) have launched their "AI algorithm registries." These are actually a pretty cool idea: whenever a municipality "utilizes algorithmic systems as part of [their] city services," these systems must be cataloged in the city's algorithm registry.
Amsterdam's registry currently has three entries: (1) license plate-recognizing automated parking control cars, (2) a pilot for algorithm-assisted fraud surveillance for holiday home rentals, and (3) a natural language processing system for categorizing reports of trash in public space.
These registries may become a good source of productized AI links for me, but more importantly, this is a great step for building transparency, trust and accountability into these systems.
Your weekly reminder that anyone who tries to sell you a facial recognition system without any age, gender, racial, or accessory biases probably does not actually have such a system to sell to you.
From the 1,800 submissions to the FairFace Challenge at ECCV 2020, Sixta et al. (2020) found that "[the] top-10 teams [show] higher false positive rates (and lower false negative rates) for females with dark skin tone as well as the potential of eyeglasses and young age to increase the false positive rates too."
I really hope that everyone deploying these systems widely is aware of this and the potential consequences.
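For anyone evaluating such a system themselves, here is a small sketch of the kind of per-group error analysis behind findings like this: compute false positive and false negative rates separately for each demographic group. The data below is made up.

```python
# Per-group false positive rate (FPR) and false negative rate (FNR) for a
# face verification model, on a tiny made-up example.
import pandas as pd

df = pd.DataFrame({
    "group":       ["A", "A", "A", "B", "B", "B"],
    "same_person": [1, 0, 0, 1, 0, 0],   # ground truth: is it a match?
    "predicted":   [1, 1, 0, 1, 0, 0],   # model's verification decision
})

def rates(g):
    fp = ((g.predicted == 1) & (g.same_person == 0)).sum()
    fn = ((g.predicted == 0) & (g.same_person == 1)).sum()
    return pd.Series({
        "FPR": fp / (g.same_person == 0).sum(),
        "FNR": fn / (g.same_person == 1).sum(),
    })

print(df.groupby("group").apply(rates))
```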
Facebook is increasingly talking publicly about the work it does to keep its platform safe, probably at least partially in response to the constant stream of news about its failures in this area (from Myanmar to Plandemic).
This does mean we get to learn a lot about the systems that Facebook AI Research (FAIR) is building to stop viral hoaxes before they spread too widely.
Examples include the recent inside look into their AI Red Team (DT #47); their Web-Enabled Simulations (WES, #38) and Temporal Interaction Embeddings (TIES, #34) for detecting bots on Facebook; and their DeepFake detection dataset (#23).
Now, Halevy et al. (2020) have published an extensive survey on their work preserving integrity in online social networks, in which they "highlight the techniques that have been proven useful in practice and that deserve additional attention from the academic community." It covers many of the aforementioned topics, plus a lot more.
New must-read essay if you're at an AI startup, by Martin Casado and Matt Bornstein at Andreessen Horowitz: Taming the Tail: Adventures in Improving AI Economics.
"We share some of the lessons, best practices, and earned secrets we learned through formal and informal conversations with dozens of leading machine learning teams. For the most part, these are their words, not ours." I couldn't write a more convincing pitch for the post than that, so I didn't try.
Not quite productized yet, but an example of work I'm seeing more and more of in new arXiv uploads lately: Deep Atrous Guided Filter for Image Restoration in Under Display Cameras, i.e. using AI to make photos taken through phone screens look decent.
The recent activity was probably due to the RLQ ECCV challenge in August, but it's making me wonder if in-display selfie cameras will go mainstream in the next year or two.
(Xiaomi is already hyping it up.)
Related: Caldwell et al. wrote a paper on AI-enabled future crime for Crime Science, a journal associated with University College London. They think the highest-risk possibilities are: audio/video impersonation (e.g. deepfakes, again see DT #23), driverless vehicles as weapons, tailored phishing, disrupting AI-controlled systems (like the Facebook stuff above), large-scale blackmail, and AI-authored fake news. Burglar bots rank as low-risk and killer robots rank as medium-risk; personally I'd rank killer drones (bad title, good 7-minute sci-fi) above those two.
There has always been a cat-and-mouse game between ever-updating automated content filters and users who think of clever new ways to circumvent them: from email spam filters decades ago to blockers for explicit, violent, or fake viral content on social media today.
A new filter evasion trick falls through the cracks every once in a while, becomes popular and widely used, and is then eventually added to the automated filters.
Depending on the severity of the bypass, this process sometimes has to be completed in mere hours or days.
In light of, well, the state of the world, the stakes here are obviously very high; I don't envy the pressure these ML teams must be under.
Tom Simonite at Wired wrote a feature on Facebook's internal AI Red Team, which is the company's response to this problem.
The team tries to hack the company's own AI-powered filtering systems before users do, to always stay one step ahead of them.
It's a good read that covers the company's "risk-a-thons", their deepfakes detection challenge (DT #23), automated testing, and much more.
Voyage has put up a detailed blog post announcing the G3, its next-generation robotaxi aimed at senior citizens.
Although the company is not quite as far along as Waymo, which has had customers riding their driverless taxis for over a year now, Voyage's service should be live in San Jose, California, next year.
I've been following this company for a while now and I thought I had featured them on DT at least once before, but my archive appears to disagree with me there.
To rectify that, here are some more high-quality technical blog posts from Voyage that I've read but never got around to covering: one on their automatic emergency braking system, one on their active learning data curation, and one on their Telessist remote operations solution.
Samuel Axon wrote an in-depth feature on machine learning at Apple for Ars Technica, with input from two executives at the company: John Giannandrea (SVP for ML and AI Strategy) and Bob Borchers (VP of Product Marketing).
From handwriting recognition to battery charging optimization, AI has (Software 2.0-style) steadily been eating its way into more and more of the iOS software stack, far beyond just powering the obvious things like Siri speech recognition and camera roll semantic search.
Of course, Giannandrea and Borchers also talk a lot about Apple's focus on on-device ML and their "neural engine" accelerator chips.
It's a long article, but a must-read if you're into productized AI.
From the folks behind DT-favorite remove.bg (DT #3, #5, #12, #16), automatic video background removal tool Unscreen (#35) now has a Pro version that supports full HD, MP4 export, and unlimited-length videos.
It's web-only for now, but an API is in the works.
I've really enjoyed following this team's progress over the past almost two years, and it's great to see they're continuing to execute so successfully.
Interesting case study from Amnesty International on using automated satellite image classification for human rights research in war zones: tens of thousands of volunteers labeled 2.6 million image tiles of the Darfur region in western Sudan as empty, human presence, or destroyed, which Amnesty and Element AI researchers used to train a model that could predict these labels at 85% precision and 81% recall on the test set.
This model then allowed them to visualize and analyze different waves of destruction in the zone over time.
The full case study is well worth a read: it includes detailed notes on the ethical tradeoffs they considered before starting the project, a contrast with the ethics sections in many recent ML papers that read like checkbox afterthoughts.
Related: Fawkes is a new algorithm by Shan et al. that makes imperceptible changes to portrait photos to fool facial recognition: "cloaked images will teach the model a highly distorted version of what makes you look like you." The University of Chicago researchers wrapped Fawkes into a Windows and macOS app, and they claim that it's 100% effective against the state-of-the-art models powering commercially available facial recognition APIs.
As my friends who study computer security tell me, though, this is always a cat-and-mouse game: at some point, someone will figure out how to make a facial recognition model that's robust against Fawkes; and then someone else will make a Fawkes 2.0 that's robust against that; and then...
But, at least for a while, running your photos through Fawkes should make them unrecognizable to most facial recognition models out there.
Probably.
Google open-sourced Seq2act, a new model that translates natural-language instructions ("How do I turn on Dutch subtitles on YouTube?") into mobile UI action sequences (tap the video; tap the settings button; tap closed captions; select Dutch from the list).
This isn't quite productized yet, but who wants to bet that the next major version of Android will allow you to say "OK Google, turn on Dutch subtitles" in the YouTube app (as well as millions of other commands in other apps) and that the phone will just tap the right buttons in the background and do it for you?
This is the stuff that makes me jealous as an iPhone user.
Update on facial recognition in the United States, which big tech recently pulled out of (see DT #42), and which startups then doubled down on (#43): a group of senators has now proposed legislation to block the use of facial recognition technology by law enforcement.
Good!
As we feared following the news that IBM, Microsoft, and Amazon are no longer selling facial recognition technology to police departments in the United States (see DT #42), companies that aren't tied to large consumer-facing brands, and that aren't under the level of scrutiny that comes with being a household name, are now doubling down on the space.
The only real solution to this problem is regulation.
In related news, a Michigan man was arrested because a facial recognition algorithm misidentified him.
This is the first time a facial-recognition-induced wrongful arrest has been reported, which actually slightly surprises me because the technology has been rolled out much more widely in China (although cases like this may not make the news there).
What's less surprising is that this first case happened to a Black man, given that commercial facial recognition algorithms have been shown to make more mistakes on people with darker skin (see DT #41).
Android 11 includes much-improved voice access, where instead of having to say the number next to the part of the screen you want to click, you can just say what youâre trying to do, and the phone is pretty good at understanding your intention.
Check out Dieter Bohn's demo video on Twitter.
Background sound removal in Google Meet has improved significantly: G Suite director of product management Serge Lachapelle made a demo video showing it successfully muting all sorts of annoying meeting noises while preserving his talking at the same time.
Reminds me of Krisp.ai (DT #16).
Isaac Caswell and Bowen Liang summarized recent advances in Google Translate, linking out to papers and other posts describing each change in depth.
These changes over the past year have together resulted in an average improvement of +5 BLEU across the 100+ languages now supported by Translate (see this fun gif), with low-resource languages improving by an additional +2 BLEU on average.
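For context on what "+5 BLEU" means: BLEU scores a system's translations against reference translations on a 0 to 100 scale, so a +5 average is a sizeable jump. Here is a quick sketch using the sacrebleu package with toy sentences (this is just the metric, not Google's evaluation setup).

```python
# Computing corpus-level BLEU with sacrebleu on toy data.
import sacrebleu

hypotheses = ["the cat sat on the mat"]          # system outputs
references = [["the cat sat on the mat"]]        # one list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # 100.0 for an exact match; "+5 BLEU" means e.g. 30 -> 35
```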
remove.bg, the service that automatically removes backgrounds from images, has released an update that significantly improves the quality of their cutouts.
It includes better hair handling, edge color correction, and multi-object scenes.
This is Software 2.0 in action: the same APIs are now powered by better models, providing better results for users who don't have to change their workflow.
BenchSci helps life science companies reduce failed experiments by curating reagent catalogs and experiments from the literature, decoding them using ML models, and wrapping the resulting data in an easy-to-use interface for researchers.
This is the classic productized AI model of (1) automating graduate-student-level work, (2) applying it across the corpus of literature in some niche, and then (3) selling access to the extracted info as a service.
I'm personally a big fan of this model and think it has the potential to make many industries more efficient; VCs seem to agree, since BenchSci recently raised a $22 million round of funding.
DeepQuest's DeepStack AI Servers offer a different twist on machine learning APIs: instead of just being available as endpoints in the cloud (like Google's, Microsoft's and Amazon's ML APIs), DeepStack's servers and pretrained models can be installed as Docker containers.
This way it combines the ease-of-use of cloud APIs with the data privacy of self-hosting, a cool idea I hadn't heard of before.
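To illustrate the appeal: client code for a self-hosted container looks just like calling a cloud vision API, except the endpoint lives on your own machine, so images never leave it. The endpoint path and response fields below are hypothetical placeholders for this sketch, not DeepStack's documented API.

```python
# Calling a locally hosted vision model over HTTP, exactly as you would a
# cloud API. Endpoint path and response shape are hypothetical.
import requests

with open("photo.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:5000/v1/vision/detection",  # hypothetical local endpoint
        files={"image": f},
    )

for obj in response.json().get("predictions", []):  # hypothetical response field
    print(obj)
```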
Andrea Lewis Åkerman interviewed Tiffany Deng, Tulsee Doshi and Timnit Gebru on their work at Google to make the company's AI products more inclusive.
"Why is it that some products and services work better for some than others, and why isn't everyone represented around the table when a decision is being made?" They emphasize the importance of tooling and resources, the difficulty of even defining fairness, and the necessity of diversity in both data and teams.
I found their journeys toward their positions at Google (each noticing inequalities in tech and wanting to help fix them) especially eye-opening.
Related: Facebook's 3D photos feature now simulates depth for any image, using techniques very similar to what the iPhone SE is doing.
Ben Sandofsky of iOS camera app Halide wrote a deep dive on how the new iPhone SE, which has only one rear-facing camera, uses single-image monocular depth estimation to do fake background blur in portrait mode photos.
I have some experience with this exact computer vision task, and the results achieved here by Apple (on-device!) look very impressive to me.
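As a simplified sketch of how an estimated depth map can drive fake background blur (nothing like Apple's actual pipeline, which also does matting around hair, variable blur radii, and so on), here is the core blend-by-depth idea; file paths and the depth convention are assumptions.

```python
# Fake background blur driven by a depth map: blur the whole frame, then
# blend between sharp and blurred pixels based on estimated depth.
import cv2
import numpy as np

img = cv2.imread("photo.jpg").astype(np.float32)        # placeholder path
depth = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE)   # assumed: 0 = near, 255 = far
depth = depth.astype(np.float32) / 255.0

blurred = cv2.GaussianBlur(img, (31, 31), 0)

# Near pixels stay sharp, far pixels come from the blurred frame.
alpha = depth[..., None]                                 # HxWx1 blend weights
portrait = (1 - alpha) * img + alpha * blurred

cv2.imwrite("portrait.jpg", portrait.astype(np.uint8))
```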
Otter.ai auto-generates "rich notes for meetings, interviews, lectures, and other important conversations." This looks like a fun product, and apparently it's being integrated into Zoom.
(Unrelated: the otter emoji may be the purest emoji in existence.)
Google Lens now lets you copy text from handwritten notes by pointing your phone at them.
Emma Beede conducted a user study on how nurses in Thailand are using Google's AI screening tool to help diagnose diabetic retinopathy.
"[The] study found that the AI system could empower nurses to confidently and immediately identify a positive screening, resulting in quicker referrals to an ophthalmologist." Beede emphasizes, though, that it's important to engage with clinicians and patients before widely deploying such systems, to ensure they don't inadvertently hinder diagnosis.
Writing for Ars Technica, Timothy B. Lee shared his experience of getting a burger delivered by a robot.
Part-self-driving and part-piloted, these box-on-wheels sidewalk robots by startups like Starship and Kiwibot are getting pretty clever.
"If, like, a group of people surrounded the robot and blocked it," said Starship executive Ryan Touhy, "the robot would identify the situation and say 'Hello, I'm a Starship delivery robot. Can you please let me pass.'" The whole story is a fun read, as is this comment.
Also check out Joan Lääne's post about their mapping and navigation tech for Starship's blog.
What's the best way to mitigate the damage malicious bots can do on your social media platform?
Facebook's answer: building your own set of reinforcement-learning-based bots and setting them loose on a simulated version of your network.
The company is deploying these Web-Enabled Simulations (WESs) to catch bad actors, search for bad content, and figure out how real-world bots could scrape data off the platform and break privacy rules.
Michael Schoenberg and Adarsh Kowdle wrote a deep dive on uDepth, the set of neural networks on Google's Pixel 4 phones that enable some cool computational photography features and a real-time depth sensing API at 30 Hz.
Fun bonus: their architecture diagram also highlights whether each step runs on the phone's CPU, GPU, or Neural Core.
I recently came across Martin Zinkevich's 24-page Rules of Machine Learning (PDF), "intended to help those with a basic knowledge of machine learning get the benefit of best practices in machine learning from around Google." Lots of good stuff in here.
Acquisitions in the augmented reality space are heating up: in the past two weeks alone, Ikea bought Geomagical Labs (which works on placing virtual furniture in rooms; an obvious fit) and Pokémon Go developer Niantic acquired 6D.ai (which works on indoor mapping; another obvious fit).
Cool new paper from Liu et al. (2020): Learning to See Through Obstructions proposes a learning-based approach that can remove things like chain fences and window reflections from photos (see the paper PDF for examples).
This isn't yet productized, but one of the collaborators works at Google, so: how long until this shows up in the camera app for Pixel phones?
Related to Software 2.0: Martin Casado and Matt Bornstein at venture capital firm Andreessen Horowitz wrote about the new business of AI-powered software and how it's different from traditional software-as-a-service companies: margins are lower, there's a bigger services component, and it's harder to create a defensible moat.
Luckily they end with a set of tips.
Jaime Lien and Nicholas Gillion wrote an interesting story for the Google AI blog about how their Soli radar-based perception features went from a chunky prototype in the company's Advanced Technology and Projects (ATAP) lab to a tiny chip shipping in Pixel 4 phones.
It involved a combination of creating and shipping a novel sensor, as well as designing machine learning models to power Motion Sense, the feature that recognizes hand gestures from radar data.
Natt Garun at The Verge: Tempo is a smart home gym that uses computer vision to track your form in real time.
Self-driving car company Waymo has raised a big new round of funding.