Google reports progress on universal speech model
- March 7, 2023
In November, Google announced an initiative to develop a machine learning model that can recognize and translate the world's 1,000 most spoken languages. The company has been working toward this goal for the past few months, and members of the project team have now published a blog post on their progress. The team also posted a paper describing the implementation of the Universal Speech Model (USM) on the preprint server arXiv.
The updates are part of a larger goal: a translator built on automatic speech recognition (ASR) that can handle any language in the world on demand. For now, the team has limited the number of languages it is trying to support to about 100, because less widely spoken languages have few speakers and therefore lack training datasets.
As part of the announcement, Google outlined its first steps toward that goal: the USM, a family of speech models trained on 12 million hours of recorded speech and 28 billion sentences of text, spanning more than 300 languages. The company says USM is already used for subtitles on YouTube. The team also describes a general model design shared across the family.
Google explains that the models are built using training "pipelines" that draw on three types of datasets: unpaired audio, unpaired text, and paired ASR data. The team also says it uses Conformer models to handle the 2 billion parameters the project requires, and that training proceeds in three main steps: unsupervised pre-training, multi-objective supervised pre-training, and supervised ASR training. The end result is two types of models: pre-trained models and ASR models.
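The three training steps and three dataset types mentioned above can be laid out as a small Python sketch. This is only an illustration of how the stages fit together, not Google's code: the `Stage` class, the identifier names, and in particular the mapping of which dataset feeds which stage are assumptions made for this example.

```python
from dataclasses import dataclass

# Dataset types named in the article; the identifiers are hypothetical.
UNPAIRED_AUDIO = "unpaired_audio"
UNPAIRED_TEXT = "unpaired_text"
PAIRED_ASR = "paired_asr"


@dataclass(frozen=True)
class Stage:
    name: str        # training step as described in the article
    inputs: tuple    # dataset types consumed (assumed mapping)
    produces: str    # kind of model that comes out of the step


# Assumed ordering and inputs; the article only lists the three steps.
PIPELINE = (
    Stage("unsupervised pre-training", (UNPAIRED_AUDIO,), "pre-trained model"),
    Stage(
        "multi-objective supervised pre-training",
        (UNPAIRED_AUDIO, UNPAIRED_TEXT, PAIRED_ASR),
        "pre-trained model",
    ),
    Stage("supervised ASR training", (PAIRED_ASR,), "ASR model"),
)


def describe(pipeline):
    """Return one human-readable line per stage, numbered from 1."""
    return [
        f"{i}. {s.name}: {', '.join(s.inputs)} -> {s.produces}"
        for i, s in enumerate(pipeline, 1)
    ]


def datasets_used(pipeline):
    """Return every dataset type the pipeline touches, in first-use order."""
    seen = []
    for stage in pipeline:
        for dataset in stage.inputs:
            if dataset not in seen:
                seen.append(dataset)
    return seen


if __name__ == "__main__":
    for line in describe(PIPELINE):
        print(line)
```

Running the script prints one line per stage. The two kinds of output, pre-trained models and ASR models, correspond to the `produces` field of the earlier and final stages.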
Google also claims that, in its current state, USM performs comparably to or better than Whisper, a general-purpose speech recognition model created by OpenAI and released as open source on GitHub. Beyond YouTube, Google is expected to combine the model with other AI applications, including augmented reality devices.
Source: Port Altele
As an experienced journalist and author, Mary has been reporting on the latest news and trends for over 5 years. With a passion for uncovering the stories behind the headlines, Mary has earned a reputation as a trusted voice in the world of journalism. Her writing style is insightful, engaging and thought-provoking, as she takes a deep dive into the most pressing issues of our time.