Artificial intelligence technologies are developing at an incredible rate. Following AI models that can create images from text prompts and hold conversations, Microsoft has developed VALL-E, an AI model that can imitate a voice after hearing just three seconds of it. Unlike many AI tools, VALL-E can reproduce a speaker’s emotion and tone, even producing recordings of words the original speaker never said. Here are the details.
VALL-E: an artificial intelligence tool that can reproduce any voice
Microsoft recently unveiled an artificial intelligence tool known as VALL-E that can reproduce human voices. The tool needs only a 3-second recording of a specific voice as a prompt to generate new speech, and it was trained on 60,000 hours of English speech data. The AI model can even reproduce the speaker’s emotion and tone, producing recordings of words the original speaker never said.
This is a significant advance in the field of AI-generated speech, as previous models could reproduce a voice but not the speaker’s emotion or tone. A research paper hosted on Cornell University’s arXiv describes using VALL-E to synthesize multiple voices, and some examples of the work are available on GitHub. The quality of the audio samples provided by Microsoft varies: some sound natural, while others are clearly machine-generated and robotic. But as AI technology continues to evolve, the generated recordings are likely to become more convincing.
However, there are concerns about the ethical implications of this technology. As AI becomes more powerful, the voices produced by VALL-E and similar technologies will become more believable, which could lead to realistic scam calls that mimic the voices of real people the potential victim knows. Politicians and other public figures could also be impersonated, which could lead to the spread of misinformation on social networks.
There are security issues as well. Some banks use voice recognition technology to authenticate callers, but as AI-generated voices become more convincing, it may become harder to detect whether a caller is using a voice generated by VALL-E. The technology could also affect voice actors, as their services may no longer be needed if AI-generated voices become more realistic.
VALL-E is an impressive artificial intelligence tool that could revolutionize the field of voice synthesis. However, it also raises ethical and security concerns. It will be important for companies like Microsoft to develop safeguards governing the use of VALL-E to ensure that it is used for good rather than malicious purposes.