Oracle Issues in Machine Learning and Where to Find Them
In a paper to be published at ICSE 2020, Liem and Panichella (2020) introduce two heuristics for semi-automatically uncovering high-level issues in data labels and representations. In ImageNet, for example, they find that the synonymous “laptop” and “notebook” labels consistently confuse models, and they argue that such oracle issues warrant closer collaboration between the machine learning and software testing communities. The paper, Oracle Issues in Machine Learning and Where to Find Them, also comes with an amazing video in which the authors, animated as talking portrait paintings from the wizarding world, describe their “potion for better Defense Against the Dark ML Arts.” It may be the most perfect thing I’ve ever shared in this section.