Voicebox, Meta brings AI to text-to-speech

Meta has not stopped boasting about artificial intelligence over the past few months, and Voicebox is just the latest in a growing list of its demonstrations. With the metaverse hype deflated (hype that was internal to the company, because expectations from the outside were never very high), the company seems to have decided to focus its efforts on other areas of greater interest and growth potential, which its accountants, customers, and investors will undoubtedly appreciate.

As I said, Meta has shown a strong interest in artificial intelligence for quite some time, but only with the rise of this technology, especially thanks to generative models, has it decided to start publishing papers and interesting sample projects. In some cases it even allows the models to be downloaded. I can’t help but relate this to Yann LeCun’s statement at the end of January that ChatGPT is not that innovative, a statement that naturally made us wonder what Meta was working on.

Since then we have seen the presentation and release of LLaMA (Large Language Model Meta AI) and the image segmentation tool SAM, among others, in addition to the most common applications of artificial intelligence today, such as the chatbot that will soon arrive on Instagram. So it would be unfair not to acknowledge that Meta knows how to position itself as a technology company to be reckoned with when we talk about artificial intelligence.

The most recent example of this is Voicebox, an artificial intelligence model that converts text to speech. Tools of this kind have been around for a long time, but until now most of them have relied on a huge number of recorded samples that are pieced together for each text-to-speech conversion. This gives reasonable results, but it is common to hear strange intonations and similar effects.

Voicebox has been trained on over 50,000 hours of unfiltered audio. As we can read on its website, Meta used recorded speech and transcriptions from a large set of public domain audiobooks read in English, French, Spanish, German, Polish and Portuguese. With this training, the model is able to generate truly realistic narration, as well as take an existing recording with background noise and return a clean version of it.

Speech synthesis is a very active field in the world of artificial intelligence. Voicebox is just the latest example, but we have also recently heard about VALL-E, a model created by Microsoft that is capable of imitating voices, with the possibilities and risks this presents, and about Apple’s plans to generate audiobooks from written originals.

Source: Muy Computer

David

Donald Salinas is an experienced automobile journalist and writer for Div Bracket. He brings his readers the latest news and developments from the world of automobiles, offering a unique and knowledgeable perspective on the latest trends and innovations in the automotive industry.
