Microsoft introduced an artificial intelligence model capable of solving visual puzzles
March 3, 2023
Microsoft researchers have unveiled Kosmos-1, an artificial intelligence model that can analyze images for content, solve visual puzzles, visually recognize text, take visual IQ tests and understand instructions in natural language.
According to Microsoft researchers, building artificial general intelligence (AGI) requires multimodal AI that integrates various input modes such as text, audio, images, and video. They consider this approach a fundamental step towards an AGI capable of performing tasks at a human level.
In the academic paper “Language Is Not All You Need: Aligning Perception with Language Models”, Microsoft researchers emphasize that multimodal perception is a fundamental part of intelligence, especially for acquiring knowledge and understanding the real world.
“As a basic part of intelligence, multimodal perception is necessary to achieve artificial general intelligence in terms of real-world knowledge acquisition and reasoning,” the researchers write.
Kosmos-1 outperformed latest-generation models
According to the paper, Kosmos-1 is capable of analyzing and answering questions about images, reading text from an image, writing image captions, and taking visual IQ tests with an accuracy of 22 to 26 percent. The examples in the paper illustrate the model's versatility.
Some AI experts see multimodal AI as a possible path to AGI. According to these experts, this hypothetical technology would be able to replace a person in any intellectual task.
The researchers describe Kosmos-1 as a “multimodal large language model” (MLLM). It has its roots in natural language processing, as a text-based LLM very similar to ChatGPT. To accept an image as input, the researchers convert the image into a series of special tokens, which are essentially text that the LLM understands.
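The idea of feeding an image to a language model as a run of special tokens can be illustrated with a small toy sketch. This is not the Kosmos-1 implementation: the function names `image_to_tokens` and `build_prompt`, the `<image>`, `</image>` and `<patch_N>` token strings, and the brightness-based bucketing are all assumptions made for illustration. A real model would use a learned image encoder rather than average brightness.

```python
def image_to_tokens(image, patch=2, levels=4):
    """Turn a grayscale image (list of rows, values 0-255) into toy tokens.

    Hypothetical scheme: each patch becomes one token chosen by its mean
    brightness. A real multimodal model would use a learned encoder instead.
    """
    height, width = len(image), len(image[0])
    tokens = ["<image>"]  # assumed marker for the start of the image span
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Collect the pixels of one patch, clipping at the image border.
            block = [
                image[r][c]
                for r in range(top, min(top + patch, height))
                for c in range(left, min(left + patch, width))
            ]
            mean = sum(block) / len(block)
            # Quantize the mean brightness into one of `levels` buckets.
            bucket = min(int(mean * levels / 256), levels - 1)
            tokens.append(f"<patch_{bucket}>")
    tokens.append("</image>")  # assumed marker for the end of the image span
    return tokens


def build_prompt(text_before, image, text_after):
    """Interleave whitespace-split text tokens with the image tokens."""
    return text_before.split() + image_to_tokens(image) + text_after.split()


if __name__ == "__main__":
    toy_image = [
        [0, 0, 255, 255],
        [0, 0, 255, 255],
        [128, 128, 64, 64],
        [128, 128, 64, 64],
    ]
    print(build_prompt("Describe this picture:", toy_image, "Answer:"))
```

The point of the sketch is only that, once the image has been turned into tokens, the model sees a single sequence in which text and image pieces sit side by side.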
The researchers found that Kosmos-1 performed well on several benchmarks, including language comprehension and generation, OCR-free text classification (reading text directly from images), image captioning, visual question answering, question answering about web pages, and image classification. In several of these tests, Kosmos-1 outperformed the latest generation of models.
Donald Salinas is an experienced automobile journalist and writer for Div Bracket. He brings his readers the latest news and developments from the world of automobiles, offering a unique and knowledgeable perspective on the latest trends and innovations in the automotive industry.