They’ve now overhauled the system
: instead of attempting to detect whether a query is gender-neutral and then generating two gender-specific translations, it now generates a default translation and, if this translation is indeed gendered, also rewrites it to an opposite-gendered alternative. This rewriter uses a custom dataset to “reliably produce the requested masculine or feminine rewrites 99% of the time.” As before, the UI shows both alternatives to the user.
Another interesting aspect of this update is how they evaluate the overall system:
We also devised a new method of evaluation, named bias reduction, which measures the relative reduction of bias between the new translation system and the existing system. Here “bias” is defined as making a gender choice in the translation that is unspecified in the source. For example, if the current system is biased 90% of the time and the new system is biased 45% of the time, this results in a 50% relative bias reduction. Using this metric, the new approach results in a bias reduction of ≥90% for translations from Hungarian, Finnish and Persian-to-English. The bias reduction of the existing Turkish-to-English system improved from 60% to 95% with the new approach. Our system triggers gender-specific translations with an average precision of 97% (i.e., when we decide to show gender-specific translations we’re right 97% of the time).
The standard academic metrics (recall and average precision) did not answer the most important question about the two different approaches, so the developers came up with a new metric specifically to evaluate relative bias reduction. Beyond machine translation, this is a nice takeaway for productized AI in general: building the infrastructure and metrics to measure how your ML system behaves in its production environment is at least as important as designing the model itself.
In the December 2018 post announcing gender-specific translations
, the authors mention that one next step is also addressing non-binary gender in translations; this update does not mention that, but I hope it’s still on the roadmap. Either way, it’s commendable that Google has continued pushing on this even after the story has been out of the media for a while now.