Cloudflare's ML-powered bot blocking
Cloudflare’s overview of good and bad bots.
Web infrastructure company Cloudflare is using machine learning to block “bad bots” from visiting its customers’ websites. Across the internet, malicious bots are used for content scraping, spam posting, credit card stuffing, inventory hoarding, and much more. Bad bots account for an astounding 37% of internet traffic visible to Cloudflare (humans are responsible for 60%).
Machine learning accounts for 83% of the system's bot detections. Because support for categorical features and inference speed were key requirements, Cloudflare went with gradient-boosted decision trees as their model of choice (implemented using CatBoost). The models run at about 50 microseconds per inference, which is fast enough to enable some cool extras. For example, multiple models can run in shadow mode (logging their results but not influencing blocking decisions), so that Cloudflare engineers can evaluate their performance on real-world data before deploying them into the Bot Management System.
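To make the shadow mode idea concrete, here is a minimal Python sketch. It is not Cloudflare's implementation: the ShadowRouter class, the 0.5 threshold, the feature names, and the two toy scoring functions are all hypothetical stand-ins for real trained models. It only illustrates the pattern described above, where one active model makes the blocking decision while candidate models score the same request and have their results logged.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a "model" is any callable that maps request features
# to a bot score between 0 and 1. Real models would be trained classifiers.
Model = Callable[[Dict[str, str]], float]


@dataclass
class ShadowRouter:
    """Hypothetical router: one active model decides, shadow models only log."""
    active: Model
    shadows: Dict[str, Model]
    threshold: float = 0.5  # illustrative cutoff, not a real production value
    shadow_log: List[dict] = field(default_factory=list)

    def should_block(self, features: Dict[str, str]) -> bool:
        # Only the active model's score influences the decision.
        decision = self.active(features) >= self.threshold

        for name, model in self.shadows.items():
            try:
                score = model(features)
            except Exception:
                # A failing shadow model must never affect live traffic.
                continue
            # Record what the shadow model would have done, for offline review.
            self.shadow_log.append({
                "model": name,
                "score": score,
                "would_block": score >= self.threshold,
                "active_blocked": decision,
            })
        return decision


# Toy stand-ins for trained models (hypothetical features and scores).
def active_model(features: Dict[str, str]) -> float:
    return 0.9 if features.get("user_agent") == "curl" else 0.1


def candidate_model(features: Dict[str, str]) -> float:
    return 0.8 if features.get("asn_type") == "hosting" else 0.2


if __name__ == "__main__":
    router = ShadowRouter(active=active_model,
                          shadows={"candidate_v2": candidate_model})
    request = {"user_agent": "browser", "asn_type": "hosting"}
    print("blocked:", router.should_block(request))
    print("shadow log:", router.shadow_log)
```

Comparing the would_block and active_blocked fields across many logged requests is one way engineers could judge whether a candidate model is ready to be promoted, without it ever having blocked a real visitor.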
Alex Bocharov wrote about the development of this system for the Cloudflare blog. It’s a great read on adding an AI-powered feature to a larger product offering, with good coverage of all the tradeoffs involved in that process.