Tiferet Gazit wrote about how GitHub made its AI-powered good first issue feature. GitHub is an online platform for collaborating on open- (and closed-) source codebases, where users can write issues to notify repository maintainers of bugs or desired features in their software projects. Maintainers can label low-hanging fruit issues—like documentation updates, error reporting fixes, or output formatting bugs—as a good first issue that new contributors can tackle to familiarize themselves with the project’s codebase.
GitHub recently launched a feature that automatically detects such good first issues in open-source codebases. Initially it worked by just looking for (synonyms of) the good-first-issue tags set manually by repository maintainers, but that did not yield enough issues for GitHub’s social news feed:
“Relying on these labels, however, means that only about 40 percent of the repositories we recommend have easy issues we can surface. Moreover, it leaves maintainers with the burden of triaging and labeling issues. Instead of relying on maintainers to manually label their issues, we wanted to use machine learning to broaden the set of issues we could surface.”
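To make the label-based starting point concrete, here is a minimal sketch in Python. The synonym list and the function names are hypothetical, chosen for illustration; the labels GitHub actually matched are not listed here.

```python
import re

# Hypothetical synonym list (an assumption for illustration only).
EASY_LABEL_SYNONYMS = {
    "good first issue",
    "beginner friendly",
    "easy",
    "low hanging fruit",
    "starter",
}


def normalize_label(label: str) -> str:
    """Lowercase a label and turn runs of punctuation into single spaces,
    so 'Good-First-Issue' and 'good first issue' compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def is_labeled_good_first_issue(labels: list[str]) -> bool:
    """Return True if any maintainer-set label matches a known synonym."""
    return any(normalize_label(label) in EASY_LABEL_SYNONYMS for label in labels)
```

An approach like this can only surface issues that a maintainer has already labeled, which is the coverage limit the quote describes.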
That’s where deep learning comes in: GitHub used this initial sample of good first issues, detected from explicitly set labels, as training data for a natural language classifier that predicts whether an issue qualifies as a good first issue based on its title and body text. With this new model, which runs on all new open-source issues once a day, GitHub is now “able to surface issues in about 70 percent of repositories we recommend to users”, a big improvement!
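The idea of training a text classifier on label-derived examples can be sketched in a few lines of Python. This is not GitHub's model: it swaps the deep learning classifier for a tiny bag-of-words naive Bayes classifier so the example stays self-contained, and the training issues, class name, and threshold parameter are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesIssueClassifier:
    """Toy stand-in for the real model: multinomial naive Bayes over title + body."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, issues):
        # Each issue is (title, body, label); the label comes from maintainer tags.
        for title, body, label in issues:
            tokens = tokenize(title + " " + body)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def _log_score(self, tokens, label):
        # Log prior plus Laplace-smoothed log likelihood of each known token.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for token in tokens:
            if token in self.vocab:
                score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict_proba(self, title, body):
        """Estimated probability that the issue is a good first issue."""
        tokens = tokenize(title + " " + body)
        pos = self._log_score(tokens, True)
        neg = self._log_score(tokens, False)
        top = max(pos, neg)  # subtract the max before exponentiating for stability
        exp_pos, exp_neg = math.exp(pos - top), math.exp(neg - top)
        return exp_pos / (exp_pos + exp_neg)


# Hypothetical training data: True means a maintainer labeled it a good first issue.
train = [
    ("Fix typo in README", "The installation docs misspell the package name.", True),
    ("Improve error message for missing config",
     "The error message should mention the config file path.", True),
    ("Race condition in scheduler",
     "Deadlock under concurrent load when workers restart.", False),
    ("Memory leak in cache layer", "Heap grows unbounded under sustained load.", False),
]

clf = NaiveBayesIssueClassifier()
clf.fit(train)


def should_surface(title, body, threshold):
    """Surface an issue only if the model is confident enough (threshold is an assumption)."""
    return clf.predict_proba(title, body) >= threshold
```

Raising the threshold surfaces fewer issues but with higher confidence, while lowering it covers more repositories at the cost of more mistakes.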
Gazit’s blog post on the feature goes in depth on many technical details, including data pre-processing and denoising, the deployment setup, and the coverage vs. accuracy tradeoff. These parts typically take much more time on a productized AI project than creating and training the model itself, so it’s great to see them highlighted. Read the full post on The GitHub Blog: How we built the good first issues feature