Google gives the Gemini 1.5 Pro ears
- April 10, 2024
Gemini 1.5 Pro can interpret sound. Google only makes the features available to users who have access to Vertex AI and AI Studio.
Google is taking a meaningful step in the further development of its AI model Gemini. Gemini 1.5 Pro can now interpret audio. The model no longer requires a written transcript of a conversation: you can upload the audio clip directly. Gemini 1.5 Pro can also process the audio track of videos.
The ability to listen to audio directly is an important addition to the capabilities of Google’s AI model. The company made a false start early in the AI hype with the rather painful launch of Gemini’s predecessor, Bard. Google now seems well on its way to matching the quality of the LLMs of its main competitor, OpenAI. The integration of audio is, in any case, a useful addition.
Users will soon be able to start using the new features, but only within Vertex AI and AI Studio. For now, the powerful Gemini 1.5 Pro model is not as freely available as the Gemini chatbot or other LLMs, but it seems inevitable that the general public will gain access to similar features in the future.
Source: IT Daily
As an experienced journalist and author, Mary has been reporting on the latest news and trends for over 5 years. With a passion for uncovering the stories behind the headlines, Mary has earned a reputation as a trusted voice in the world of journalism. Her writing style is insightful, engaging and thought-provoking, as she takes a deep dive into the most pressing issues of our time.