
Anthropic is accused of “aggressive” data scraping

  • July 29, 2024


Several companies have denounced the “aggressive” behavior of Anthropic’s web crawler, which visits websites up to millions of times a day to collect data.

To make AI models intelligent, large amounts of data are required. It is now an open secret that the data comes from the internet. AI companies like OpenAI and Anthropic have “web crawlers” that search the internet and collect publicly available information. In theory, this practice is not illegal, although Anthropic seems to go quite far in this regard.

Kyle Wiens, CEO of iFixit, reprimanded Anthropic in a post on X. Anthropic’s web crawler reportedly visited the site a million times in 24 hours. Others fared even worse: the website Freelancer.com recorded 3.5 million visits from Anthropic in just four hours.
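
To put those figures in perspective, they can be converted into an average request rate. The sketch below is only a back-of-the-envelope calculation: it assumes the visits were spread evenly over the reported window, which the source does not state, and the function name is purely illustrative.

```python
def requests_per_second(total_requests: int, hours: float) -> float:
    """Average request rate, assuming visits are spread evenly over the window."""
    return total_requests / (hours * 3600)


# Figures as reported in the article; the even spread is an assumption.
ifixit_rate = requests_per_second(1_000_000, 24)
freelancer_rate = requests_per_second(3_500_000, 4)

print(f"iFixit: about {ifixit_rate:.0f} requests per second")
print(f"Freelancer.com: about {freelancer_rate:.0f} requests per second")
```

Under that assumption, iFixit saw roughly 12 requests per second on average, and Freelancer.com roughly 243. Real crawler traffic tends to arrive in bursts, so peak load may well have been higher.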

Rules of the Internet

Both iFixit and Freelancer.com condemn Anthropic’s “aggressive” way of crawling the Internet. Besides the fact that Anthropic is running off with their content, excessive activity from web crawlers can overload servers.

On Freelancer.com, things got so bad that the web administrators even had to blacklist Anthropic. “They’re breaking the rules of the internet,” CEO Matt Barrie told the Financial Times. Anthropic responded that it is investigating the complaints and has no intention of being intrusive.

Makers of large AI models have been under fire for some time for the way they handle public data on the internet. Industry players argue that what is public can be used to train models, although this reasoning does not entirely hold up: copyright applies on the internet too.

Licensing agreements have now been signed between AI companies and news media or large internet platforms like Reddit, which manage and own a lot of content. In this way, AI companies hope to avoid future lawsuits. Anthropic has not yet entered into such agreements.

Robots.txt

As a web administrator, you can deny web crawlers access to your website. Placing a robots.txt file in your website’s root directory puts up a stop sign for web crawlers. However, the system is far from watertight. In fact, it’s quite easy for web crawlers to bypass it by “masquerading” as legitimate website visitors.
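
As a rough illustration of how such a stop sign works, and why it is easy to get around, the sketch below checks a sample robots.txt with Python's standard `urllib.robotparser` module. The crawler name `ExampleBot`, the `example.com` URL, and the helper `is_allowed` are hypothetical placeholders, not names taken from the article or from any real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/guides/page"

# A crawler that identifies itself honestly is told to stay out.
print(is_allowed(ROBOTS_TXT, "ExampleBot", url))

# The same crawler, presenting a browser-like user agent, is let through.
print(is_allowed(ROBOTS_TXT, "Mozilla/5.0", url))
```

The check relies entirely on the name the crawler chooses to present, and robots.txt is a request rather than an enforcement mechanism: a crawler has to read the file and decide to honor it. That is why sites such as Freelancer.com fall back on blocking at the server level.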

Source: IT Daily
