Regulations can’t stop artificial intelligence companies: they continue to collect data from the internet
June 22, 2024
With the rise of artificial intelligence, companies entering this field need vast amounts of data to develop their own tools. The first place that comes to mind to find this data is, of course, the internet. However, not all data on the internet can be used to train artificial intelligence. Websites indicate whether data may be collected from them through a file called robots.txt.
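To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser` module. The file contents, the crawler name `ExampleAIBot`, and the `example.com` URLs are hypothetical and chosen only for illustration; they are not taken from any real site or company.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is blocked from the whole site,
# while every other crawler is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The named AI crawler is asked to stay away from everything.
blocked = may_fetch(parser, "ExampleAIBot", "https://example.com/articles/1")

# Any other crawler may read articles but not the private area.
allowed = may_fetch(parser, "OtherBot", "https://example.com/articles/1")
private = may_fetch(parser, "OtherBot", "https://example.com/private/report")

print(blocked, allowed, private)
```

Note that the check runs entirely on the crawler's side. Nothing in the file technically prevents a crawler from skipping the check and downloading the page anyway, so the file only works when crawlers choose to honor it.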
According to Reuters, many artificial intelligence developers choose to bypass the instructions in this file and collect data from these sites anyway. Perplexity, which bills itself as a “free artificial intelligence search engine,” is one of the companies drawing the most backlash over this, but it is not alone in the practice.
OpenAI, Anthropic…
According to reports, many artificial intelligence developers bypass robots.txt files and continue to take content from sites. Although the report itself gave no names, it has emerged that OpenAI and Anthropic are among these companies. It also turned out that a server used by Perplexity was not following these guidelines. Perplexity CEO Aravind Srinivas had previously said the company is “not in a position to first circumvent the protocol and then lie about it.”
The robots.txt protocol, for its part, has been in use since the 1990s and is not legally binding. Perhaps creating a new, stricter and more detailed protocol in this area would help solve the problem.
Alice Smith is a seasoned journalist and writer for Div Bracket. She has a keen sense of what’s important and is always on top of the latest trends. Alice provides in-depth coverage of the most talked-about news stories, delivering insightful and thought-provoking articles that keep her readers informed and engaged.