
https://www.xataka.com/aplicaciones/whisper-v3-ha-pasado-desapercibido-herramienta-util-accesible-que-recien-ha-currentado-openai

  • November 12, 2023

Sam Altman spent almost no time on this topic during OpenAI DevDay. All the attention was focused on GPT-4 Turbo and GPTs. But for those of us who don’t pay for AI and haven’t yet gotten the hang of writing prompts, there is a much simpler and more effective tool.

We’re talking about Whisper, which reached its third generation this week. It is a voice recognition model that not only understands and translates dozens of languages, but can also transcribe entire conversations with astonishing accuracy.

Unlike ChatGPT or DALL·E, Whisper V3 is open source. Its code has already been published on GitHub, and the model can be used for free via Hugging Face or Replicate. Using Whisper is as simple as uploading an audio file and clicking a button.
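For readers who prefer running it locally rather than through a web demo, here is a minimal sketch of loading the published Whisper V3 checkpoint with the Hugging Face transformers library. The library choice, the file name "interview.mp3", and the function name are assumptions for illustration; the article itself only names the platforms.

```python
# Minimal sketch (assumption): transcribing audio locally with the
# open Whisper V3 checkpoint via the transformers library.
# Requires: pip install transformers torch
def transcribe(path: str) -> str:
    from transformers import pipeline
    # Downloads the openai/whisper-large-v3 checkpoint on first use
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
    return asr(path)["text"]

if __name__ == "__main__":
    # "interview.mp3" is a placeholder for your own audio file
    print(transcribe("interview.mp3"))
```

The first run downloads the model weights, so expect a wait and several gigabytes of disk space before the transcription itself starts.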

Whisper V3 uses commas correctly

Whisper V3 was trained on over one million hours of weakly labeled audio and more than four million hours of pseudo-labeled audio. Compared to the previous model, Whisper now makes 10 to 20% fewer errors. For Spanish, the error rate is below 5%, making it one of the languages this model understands best.

In my case, I have been using Whisper V2 for months to help me transcribe interviews in both English and Spanish. I quickly tested Whisper V3 and the result is even better. The output is almost identical, since Whisper V2 already understood the audio very well, but the difference with Whisper V3 is that it stays accurate even during speech pauses: it places commas and periods much more precisely.

Comparison of Whisper V2 and V3

Whisper can be used directly as a translator or as a transcriber. It is also capable of automatically recognizing when a speaker switches from one language to another within the same speech. OpenAI intends for other companies and developers to use the model in their own voice assistants.
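The translator mode mentioned above is exposed as a task switch in the same transformers pipeline. The exact parameter plumbing shown here is an assumption based on the library's Whisper integration, not something the article specifies.

```python
# Sketch (assumption): asking Whisper V3 to translate speech into English
# instead of transcribing it in the original language.
# Requires: pip install transformers torch
def translate_to_english(path: str) -> str:
    from transformers import pipeline
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
    # task="translate" makes Whisper emit English text regardless of
    # the language spoken in the audio file
    return asr(path, generate_kwargs={"task": "translate"})["text"]
```

Omitting the task argument leaves Whisper in its default transcription mode, where it also auto-detects the spoken language.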

As with previous generations, Whisper is available in a variety of sizes to suit different applications: from a small version with 39 million parameters that requires less than 1 GB of VRAM, to a large model with 1.55 billion parameters that needs approximately 10 GB of VRAM. This larger model is the one available directly through Hugging Face or Replicate.
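That size-versus-VRAM trade-off can be sketched as a simple selection helper. The two endpoints (39M parameters under 1 GB, 1.55B parameters around 10 GB) come from the article; the intermediate tiers are assumptions based on Whisper's published checkpoint family.

```python
# Illustrative helper: choose a Whisper checkpoint size by available VRAM.
# Endpoint figures are from the article; middle tiers are assumptions.
def pick_whisper_model(vram_gb: float) -> str:
    if vram_gb >= 10:
        return "large-v3"  # 1.55B parameters, ~10 GB VRAM
    if vram_gb >= 5:
        return "medium"    # assumed mid-tier checkpoint
    if vram_gb >= 2:
        return "small"     # assumed mid-tier checkpoint
    return "tiny"          # 39M parameters, <1 GB VRAM

print(pick_whisper_model(12))   # large-v3
print(pick_whisper_model(0.5))  # tiny
```

On a laptop without a dedicated GPU, the smaller checkpoints are usually the practical choice, at the cost of some accuracy.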

Until now, converting audio to text has always been a disaster. Most free tools produced too many errors: misplaced words, wrong verb forms, or missing expressions. In the end you had to go back over the entire audio carefully, so you didn’t save much time.

Whisper V2 was the first free tool whose results genuinely convinced me. With Whisper V3, I get the feeling this model is here to stay. It has everything we ask of technology: it is easy to use, fast, effective, and free. We want more models like this.

Image | Zac Wolff

In Xataka | Spotify dubs its podcasts with artificial intelligence using its creators’ own voices. The result is amazing

Source: Xataka
