Microsoft introduces AI model that understands images and solves visual puzzles

  • March 3, 2023

This week researchers from Microsoft presented the Kosmos-1 model. This multimodal model is capable of analyzing images, solving visual puzzles, recognizing visual text, and passing visual IQ tests.

Kosmos-1 not only works with text input, but also with input in the form of images, audio and video. According to Microsoft researchers, the model is an important step in building an artificial general intelligence (AGI) capable of performing human-level tasks.

“As a basic part of intelligence, multimodal perception is a necessity to achieve artificial general intelligence, in terms of knowledge acquisition and grounding to the real world,” the researchers write in their paper.

Illustrative examples from the paper show how the model analyzes images, answers questions about them, reads text from an image, writes captions, and takes a visual IQ test, on which it scores an accuracy of 22 to 26 percent.

Microsoft Kosmos-1
Source: Microsoft

In search of artificial general intelligence

While everyone is talking about large language models (LLMs), some AI experts point to multimodal AI as a potential route to artificial general intelligence, a hypothetical technology that could replace humans in any intellectual task.

OpenAI, a key Microsoft partner in the AI space, has previously announced that it is aiming for AGI, reports Ars Technica. In the case of Kosmos-1, however, it appears to be a pure Microsoft project with no involvement from OpenAI.

The researchers call their creation a “multimodal large language model” because the basis of the model lies in the processing of language (like ChatGPT does). In order to understand image-based input, the model must first translate it into text for comprehension, as shown in the image below.

Source: Microsoft

Microsoft makes Kosmos-1 available to developers through its GitHub page.

Source: IT Daily
