Microsoft introduces AI model that understands images and solves visual puzzles
March 3, 2023
This week researchers from Microsoft presented the Kosmos-1 model. This multimodal model is capable of analyzing images, solving visual puzzles, recognizing visual text, and passing visual IQ tests.
Kosmos-1 accepts not only text but also images as input. According to Microsoft researchers, the model is an important step toward building an artificial general intelligence (AGI) capable of performing human-level tasks.
“As a fundamental part of intelligence, multimodal cognition is necessary to achieve artificial general intelligence for knowledge acquisition and as a basis for the real world,” the researchers write in their paper.
Illustrative examples from the paper show how the model analyzes images, answers questions about them, reads text from an image, writes captions and performs a visual IQ test with an accuracy of 22 to 26 percent.
Source: Microsoft
In search of artificial general intelligence
While everyone is talking about large language models (LLMs), some AI experts point to multimodal AI as a potential route to artificial general intelligence, a hypothetical technology that could replace humans in any intellectual task.
OpenAI, a key Microsoft partner in the AI space, has previously announced that it is aiming for AGI, reports Ars Technica. In the case of Kosmos-1, however, it appears to be a pure Microsoft project with no involvement from OpenAI.
The researchers call their creation a “multimodal large language model” because the basis of the model lies in the processing of language (like ChatGPT does). In order to understand image-based input, the model must first translate it into text for comprehension, as shown in the image below.
Source: Microsoft
Microsoft makes Kosmos-1 available to developers through its GitHub page.
As an experienced journalist and author, Mary has been reporting on the latest news and trends for over 5 years. With a passion for uncovering the stories behind the headlines, Mary has earned a reputation as a trusted voice in the world of journalism. Her writing style is insightful, engaging and thought-provoking, as she takes a deep dive into the most pressing issues of our time.