
Cloudflare disables web crawlers with one click

  • July 4, 2024


Cloudflare introduces a new tool to keep web crawlers away from your website. The method is more robust than the robots.txt trick.

In the age of generative AI, it’s more important than ever to protect your content. Today, the internet is teeming with web crawlers looking for data to train models. Cloudflare announces a new technology that makes it easier for web administrators to block web crawlers. The tool is based on a “digital fingerprint” system.

Web crawlers have become an integral part of internet traffic. Cloudflare estimates that about forty percent of the one million most-visited properties it manages have already been visited by a web crawler, and that this rises to eighty percent among the top ten. These are digital “spiders” that crawl around websites unnoticed, collecting data to train AI models.

According to Cloudflare, the most active web crawler is Bytespider, owned by ByteDance, the parent company of TikTok. This web crawler has already been spotted on forty percent of websites. OpenAI's GPTBot is also widespread, at 35 percent.

The most common web crawlers. Source: Cloudflare

Robots.txt

There is already a basic trick to make life more difficult for web crawlers. Putting a robots.txt file in your website’s root directory acts as a stop sign for web crawlers. OpenAI and Google also promote this trick for web administrators who do not want a visit from their web crawlers.
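
As a rough illustration of how such a stop sign is read, the sketch below feeds a hypothetical robots.txt to Python's standard urllib.robotparser module and asks which crawlers may fetch a page. The rules, the example.com URL and the helper name allowed are assumptions made for illustration; GPTBot and Bytespider are the crawler names mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: turn away two AI crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if the rules above permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Bytespider", "SomeOtherBot"):
        print(agent, "allowed" if allowed(agent) else "disallowed")
```

The file only states a request. Nothing in it enforces the rule, so it works only for crawlers that choose to read and respect it.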

However, robots.txt is far from foolproof, says Cloudflare. Web administrators don’t always implement it, or only implement it against a limited number of web crawlers. Moreover, web crawler developers don’t always play fair either. The file can easily be circumvented by “masquerading” a web crawler as a legitimate website visitor.
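
The weakness can be sketched in a few lines, assuming a site whose robots.txt names only Bytespider. Rules are matched against the name a client declares for itself, so the same client presenting a browser-like name is no longer covered. The rule set, the browser string and the helper name robots_permits are hypothetical, and this says nothing about how any real crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set that targets a single crawler by name.
RULES = [
    "User-agent: Bytespider",
    "Disallow: /",
]

# A browser-like identity a client could claim instead (illustrative value).
BROWSER_LIKE_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def robots_permits(declared_agent: str) -> bool:
    """Check the rules against whatever name the client declares."""
    parser = RobotFileParser()
    parser.parse(RULES)
    return parser.can_fetch(declared_agent, "https://example.com/article")


if __name__ == "__main__":
    print("declared as Bytespider:", robots_permits("Bytespider"))
    print("declared as a browser: ", robots_permits(BROWSER_LIKE_AGENT))
```

Because the check depends entirely on a self-reported name, it cannot tell an honest visitor from a disguised crawler.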

Digital fingerprint

Cloudflare has developed a new anti-crawler net that leaves fewer loopholes for web crawlers. The tool checks the “fingerprint” of the client that sends a request to your website. Ironically, Cloudflare uses machine learning to determine whether the fingerprint belongs to a web crawler or not.

The tool is available to all Cloudflare customers and can be activated with a single click in the management dashboard. A new button labeled “Block AI scrapers and crawlers” appears in the security menu.

Source: IT Daily
