Meta has announced Voicebox, its latest generative AI model after ImageBind. It is designed to help creators with speech-generation tasks such as audio editing, sampling, and stylization, including tasks it was not specifically trained for, which it handles through in-context learning.
Meta claims the new model will benefit many people around the world. Its examples include letting visually impaired people hear text messages from friends read in the friends’ own voices, and letting people speak a foreign language in their own voice.
The model can both generate high-quality audio and edit pre-recorded audio, removing unwanted noise such as car horns while preserving the content and style of the recording. It is also multilingual and can generate speech in six languages. Future uses of the model include giving natural-sounding voices to virtual assistants or to non-player characters in games in the metaverse.
Meta also compared Voicebox with other AI voice models, naming VALL-E and YourTTS as competitors, and reports that Voicebox outperforms both on word error rate and style similarity.
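For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn a transcript of the generated speech into the reference text, divided by the number of words in the reference. The sketch below illustrates that standard definition only. The function name and the lowercase, whitespace-based tokenization are assumptions made for illustration, and this is not Meta’s evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch: tokenization (lowercase, split on whitespace)
    is an assumption, not the procedure used in Meta's evaluation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A lower value means the generated speech is transcribed closer to the intended text, which is why a lower word error rate counts in a model’s favor.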
Voicebox is built on Flow Matching, Meta’s latest non-autoregressive generative modeling method, which can learn highly non-deterministic mappings between text and speech. This lets Voicebox learn from varied speech data without careful labeling, so its training data can be more diverse and larger in scale.
Voicebox has so far been trained on more than 50,000 hours of recorded speech and transcripts from public audiobooks in English, French, Spanish, German, Polish, and Portuguese. It can also predict a segment of speech based on the surrounding speech and the text of that section.
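The idea of predicting a segment from its surroundings can be pictured as a masking step: a span of audio frames is hidden, and a model is asked to fill it in from the remaining frames plus the text. The sketch below shows only that masking step on a toy list of numbers. The function name, the frame representation, and the use of zeros as the hidden value are all hypothetical choices for illustration, not details taken from Meta’s paper.

```python
from typing import List, Tuple


def mask_segment(
    frames: List[float], start: int, end: int
) -> Tuple[List[float], List[bool]]:
    """Hide frames[start:end] and return (context, mask).

    Hypothetical helper: a real system would operate on audio features,
    not a flat list of floats, and may mark hidden frames differently.
    """
    if not 0 <= start < end <= len(frames):
        raise ValueError("start and end must describe a non-empty span inside frames")

    # True marks a frame the model would have to predict.
    mask = [start <= i < end for i in range(len(frames))]
    # Hidden frames are replaced with 0.0; the surrounding frames stay as context.
    context = [0.0 if hidden else value for value, hidden in zip(frames, mask)]
    return context, mask


if __name__ == "__main__":
    frames = [0.1, 0.2, 0.3, 0.4, 0.5]
    context, mask = mask_segment(frames, 1, 3)
    print(context)  # frames 1 and 2 are hidden
    print(mask)
```

In this picture, the same mechanism covers editing: the part of a recording to be fixed is the hidden span, and everything around it serves as context.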
Finally, Meta says that while this technology could usher in a new era of generative AI for speech, it also carries the potential for misuse and unintended harm.
Meta’s research paper on Voicebox details how it built a highly effective classifier that can distinguish real speech from speech generated by Voicebox. Meta will not make the AI model itself public and will not release its source code.