Meta adds important improvements to SeamlessM4T

It’s been a little over three months since Meta introduced SeamlessM4T, a multimodal model dedicated to translation which, from what we could see at the time and from some of the testimonies that have since surfaced, appears to be excellent work on the company’s part. Right now, I can’t remember how long Yann LeCun has been working for Meta, but since he talked about ChatGPT earlier this year, he seems to have hit the gas and allowed Meta to climb the ranks in the booming AI ecosystem.

During this year, we have seen the company announce artificial intelligence models with the most diverse functions: from its LLM, Llama, to the model that separates the different elements of an image and, of course, the chatbots with which it wants to attract a younger audience to its services. It’s all there, and while I think some ideas are far more interesting than others, the technical level achieved seems remarkable in all cases.

SeamlessM4T, as we told you back then, is a multimodal AI model able to translate between nearly 100 languages. As you already know, when we talk about a multimodal model, we mean that it accepts input and produces output in different formats, in this case both text and audio. Let’s recall what its capabilities were:

Automatic speech recognition for nearly 100 languages
Speech-to-text translation for nearly 100 input and output languages
Speech-to-speech translation, supporting nearly 100 input languages and 35 (+English) output languages
Text-to-text translation for nearly 100 languages
Text-to-speech translation, supporting nearly 100 input languages and 35 (+English) output languages

Well, if what it offered was already interesting, SeamlessM4T now adds, as we can read on Engadget, two features that make it much more practical. They are the following:

SeamlessExpressive: As you may have already deduced from the name, this function adds expressive features to SeamlessM4T’s voice output. In addition to translating the message, the model now makes the output speech reflect, among other things, the volume of the voice, the emotional tone, the speed of the source speech and its pauses. This should significantly reduce the impression of a robotic voice that this type of output, generally somewhat impersonal, tends to give.
SeamlessStreaming: in the style of simultaneous interpretation, the model will not wait for the input speech to finish before starting the output speech, but will translate “on the fly”, with a small delay of only two or three seconds. This pattern is familiar from documentaries with first-person testimony, in which we hear the original voice and, over it and with a delay of about a second, the already translated speech.
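
To make the difference between waiting for the whole speech and translating “on the fly” more concrete, here is a minimal Python sketch. It does not use Meta’s code or any SeamlessM4T API: toy_translate, translate_offline and translate_streaming are hypothetical names invented for this illustration, and the “translation” merely tags the text.

```python
from typing import Callable, Iterable, Iterator, List


def toy_translate(text: str) -> str:
    # Hypothetical stand-in for a real translation model: it only tags the text.
    return f"[en] {text}"


def translate_offline(chunks: Iterable[str],
                      translate: Callable[[str], str]) -> List[str]:
    """Wait for the whole input, then translate it in one go."""
    full_input = " ".join(chunks)  # consumes every chunk before translating
    return [translate(full_input)]


def translate_streaming(chunks: Iterable[str],
                        translate: Callable[[str], str]) -> Iterator[str]:
    """Emit a translated piece as soon as each input chunk arrives."""
    for chunk in chunks:
        yield translate(chunk)


if __name__ == "__main__":
    speech = ["hola", "mundo"]
    print(translate_offline(speech, toy_translate))
    for piece in translate_streaming(speech, toy_translate):
        print(piece)
```

In the streaming version, the first translated piece is available after the first chunk alone, which is the idea behind the short delay described above. A real system would also have to decide when it has heard enough context to commit to a translation, which this sketch leaves out.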

It is not yet clear when these features will reach SeamlessM4T users, but this is undoubtedly a very promising technology that can greatly facilitate communication and which therefore brings us one step closer to the long-awaited Babel fish from The Hitchhiker’s Guide to the Galaxy.

Source: Muy Computer

David

Donald Salinas is an experienced automobile journalist and writer for Div Bracket. He brings his readers the latest news and developments from the world of automobiles, offering a unique and knowledgeable perspective on the latest trends and innovations in the automotive industry.
