OpenAI already has its own web crawler

  • August 8, 2023

OpenAI technology is behind some of the most important artificial intelligence services available today. Whether on its own, with ChatGPT and GPT-4, or in partnership with Microsoft (one of the company's main investors), with Bing, it rose to prominence with the launch of its chatbot last year and has not slowed down since. The company keeps drawing more and more attention, both for what it has already done and for its plans for the future.

The most notable of those plans is, of course, GPT-5, a trademark the company registered only a few days ago and the name that will identify the next generation of its generative AI model, which serves as the basis for part of its services. Rumors began to circulate a few months ago that OpenAI would launch it before the end of the year, but the company seems to be holding the launch back for later. It is unclear whether that is because the model needs more time to be polished or because OpenAI is slowing development in response to growing calls to wait until appropriate regulatory frameworks are in place.

If there is one key stage in the process of creating an AI model, it is undoubtedly training, since the model's later ability to respond depends directly on the amount and quality of the data used. That is why OpenAI and other AI companies are constantly working on finding and preparing the data their models will later consume. It is also what has put these companies in the spotlight for the unauthorized use of copyrighted content.

As we can read on its website, the company seems to have found a way to kill two birds with one stone: OpenAI has launched its own web crawler, a tool that automatically analyzes and indexes the content of web pages. As you probably know, this is the same technology search engines use, only in this case its job is to feed the company's AI models with data.

As with search engine bots, website administrators can block the OpenAI crawler, or specify that only certain pages of their site may be analyzed. In addition, OpenAI states that content that sits behind a paywall, contains personal information, or violates company policy will not be indexed.
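
To make the blocking mechanism concrete, here is a minimal sketch of how such rules could look and how they can be checked. It assumes the crawler honors a standard robots.txt file and identifies itself with the user agent GPTBot, which is the name given in OpenAI's documentation rather than in this article. The robots.txt contents, the example.com URLs, and the helper crawler_may_fetch are hypothetical and only serve as illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: the OpenAI crawler
# (user agent "GPTBot", per OpenAI's documentation) may read /blog/
# and nothing else, while every other bot may read the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /blog/
Disallow: /

User-agent: *
Allow: /
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules let user_agent fetch url.

    Illustrative helper: it parses the rules from a string instead of
    downloading them, so it runs without network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/blog/post"),
        ("GPTBot", "https://example.com/private/page"),
        ("Googlebot", "https://example.com/private/page"),
    ]:
        print(agent, url, crawler_may_fetch(ROBOTS_TXT, agent, url))
```

Note that Python's parser applies the first matching rule within a group, which is why the Allow line is placed before the blanket Disallow. Real crawlers may resolve conflicting rules differently, so treat this as an approximation of how any particular bot behaves.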

I say this is a very smart move that kills two birds with one stone because it improves the search for and indexing of training data for OpenAI's models and, by allowing site owners to block the crawler, it also gives the company something to point to if it is accused of using content without permission. A very, very smart move.

Source: Muy Computer
