May 8, 2025
Microsoft artificial intelligence imitates someone’s voice in just 3 seconds of audio

  • January 10, 2023

Microsoft researchers last week announced VALL-E, an artificial intelligence model that converts text to speech while imitating a specific person’s voice. The model needs only a 3-second audio sample of the speaker. After analyzing that sample, VALL-E can synthesize speech that reproduces the speaker’s voice reading arbitrary text.

Notably, VALL-E does not depend on the speaker having read predetermined texts: it imitates intonation and reproduces emotion when reading sentences completely different from the one the speaker recorded. According to Microsoft researchers, VALL-E could be used for high-quality text-to-speech, for speech editing, where a recording of a person is changed by editing its text transcript (making them say something they did not originally say), and for audio content creation in combination with other generative AI models.

How VALL-E works

VALL-E is a “neural codec language model” built on EnCodec, the neural audio codec Meta announced in October 2022. To synthesize a voice, the model generates discrete audio codec codes from text and the speaker’s audio prompt. Rather than simply synthesizing a sound wave, VALL-E is able to “analyze how a person speaks” and break that information into small tokens.
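To make the idea of "breaking audio into tokens" concrete, here is a minimal toy sketch in Python. It is not EnCodec or VALL-E: it assumes a random, untrained codebook and a plain nearest-neighbor lookup, whereas a real neural codec learns its encoder and codebooks from data. The frame size, codebook size, and function names are all illustrative assumptions.

```python
import math
import random

SAMPLE_RATE = 16_000   # samples per second (illustrative assumption)
FRAME_SIZE = 320       # 20 ms of audio per token (illustrative assumption)
CODEBOOK_SIZE = 32     # toy value; real codecs use larger, learned codebooks

def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to this frame."""
    def sq_dist(code):
        return sum((a - b) ** 2 for a, b in zip(frame, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))

def tokenize(waveform, codebook):
    """Cut the waveform into fixed-size frames and map each one to a token id."""
    n_frames = len(waveform) // FRAME_SIZE
    return [
        nearest_code(waveform[i * FRAME_SIZE:(i + 1) * FRAME_SIZE], codebook)
        for i in range(n_frames)
    ]

def detokenize(tokens, codebook):
    """Rebuild an approximate waveform by looking each token up in the codebook."""
    return [sample for tok in tokens for sample in codebook[tok]]

# Random codebook standing in for a trained one (assumption for this sketch).
rng = random.Random(0)
codebook = [[rng.uniform(-1, 1) for _ in range(FRAME_SIZE)]
            for _ in range(CODEBOOK_SIZE)]

# A 3-second, 220 Hz sine wave standing in for the speaker prompt.
waveform = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
            for n in range(3 * SAMPLE_RATE)]

tokens = tokenize(waveform, codebook)
print(len(waveform), "samples ->", len(tokens), "tokens")
```

With these toy settings, three seconds of audio (48,000 samples) collapse into 150 integer tokens. A short sequence of this kind is what a language model can learn to predict, instead of predicting raw samples.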

To read a new text in the speaker’s voice, VALL-E generates acoustic tokens for that text conditioned on the acoustic tokens of the 3-second recording. The generated acoustic tokens are then used to synthesize the final waveform with the corresponding neural codec decoder, Microsoft explains on the official VALL-E page.
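The data flow described above can be sketched as three stages. Every function below is a hypothetical stand-in written for illustration, not Microsoft's code: the "language model" is a random sampler, the "phonemes" are just letters, and the "decoder" emits flat frames. Only the shape of the pipeline (text plus prompt tokens in, new tokens out, then decoding to a waveform) reflects the description.

```python
import random

def text_to_phonemes(text):
    """Hypothetical stand-in: treat each letter as one 'phoneme'."""
    return [ch for ch in text.lower() if ch.isalpha()]

def generate_acoustic_tokens(phonemes, prompt_tokens, tokens_per_phoneme=4, seed=0):
    """Hypothetical stand-in for the language model.

    The text decides how many tokens are produced (the content), and the
    prompt decides which token ids may be used (the voice).
    """
    rng = random.Random(seed)
    voice_palette = sorted(set(prompt_tokens))
    n_tokens = len(phonemes) * tokens_per_phoneme
    return [rng.choice(voice_palette) for _ in range(n_tokens)]

def decode_to_waveform(tokens, samples_per_token=320):
    """Hypothetical stand-in for the codec decoder: one flat frame per token."""
    return [tok / 1024 for tok in tokens for _ in range(samples_per_token)]

# Made-up token ids standing in for the encoded 3-second speaker prompt.
prompt_tokens = [17, 402, 17, 88, 911, 402]

# Any target sentence works; the speaker never has to have said it.
phonemes = text_to_phonemes("We must reduce the amount of plastic bags")

new_tokens = generate_acoustic_tokens(phonemes, prompt_tokens)
waveform = decode_to_waveform(new_tokens)
print(len(phonemes), "phonemes ->", len(new_tokens), "tokens ->", len(waveform), "samples")
```

The point of the sketch is the separation of roles: the text controls what is said, while the prompt tokens control how it sounds. In the real system a trained neural network plays the role of the middle stage.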

VALL-E simulates voice tone and emotion

On the official VALL-E website, Microsoft presents several examples of the model at work, each with several audio clips. The “Speaker Prompt” is the three-second recording provided to VALL-E. The “Ground Truth” is a recording of the same speaker saying the target phrase, included as a reference for comparison. The VALL-E sample is that same phrase as synthesized by the model.

It is important to emphasize that VALL-E generates its output from the voice information in the “Speaker Prompt”, while the “Ground Truth” serves only as a comparison. In other words, the model can reproduce a voice and read a text from a recording whose content is completely different from what it reads.


Most interesting of all, the model can also reproduce emotion. The examples on the VALL-E website show it expressing various emotions, such as anger, when saying “We must reduce the amount of plastic bags.” The emotion is taken from a recording of a different phrase: when the speaker angrily says, “Her face was pressed against his chest,” the AI mimics the speaker’s voice, equally angry, as it reads the sentence about plastic bags.

The end result is impressive, and the ease with which the model replicates the tone and emotion of a speaker’s voice is simply surreal. Although not every result closely matches the original voice, VALL-E shows that it will soon be very easy to make someone’s voice say any sentence, which can be a scary prospect.

Source: Ars Technica, VALL-E

Source: Mundo Conectado
