Xiaomi has made significant progress in audio tagging, reaching a notable milestone with its proprietary voice recognition algorithm. Trained on audio from the publicly available AudioSet-2M dataset, Xiaomi’s audio tagging model scored above 50 mAP (mean average precision) for the first time. This result puts Xiaomi’s voice recognition algorithm among the leaders worldwide.
Xiaomi’s breakthrough in voice recognition technology
To provide context, Google split the AudioSet dataset into three parts; the first two subsets, collectively known as “AudioSet-2M”, are used for training. Trained on this data, Xiaomi’s voice recognition model exceeded the 50 mAP (mean average precision) threshold, setting a new standard in audio tagging technology.
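For readers unfamiliar with the metric, audio tagging results on AudioSet are commonly reported as mean average precision (mAP): an average precision is computed for each sound class from the model's ranked predictions, and those values are averaged across classes. A score of 50 corresponds to 0.50 on a 0 to 1 scale. The sketch below only illustrates that calculation on made-up data; the function names, class names, and scores are hypothetical, and it is not Xiaomi's evaluation code.

```python
def average_precision(labels, scores):
    """Average precision for one class.

    labels: 1 if the clip really contains the sound, else 0.
    scores: the model's confidence for each clip.
    """
    # Rank clips from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision among the top `rank` clips, recorded at each true positive.
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(labels_by_class, scores_by_class):
    """Mean of the per-class average precision values."""
    aps = [
        average_precision(labels, scores)
        for labels, scores in zip(labels_by_class, scores_by_class)
    ]
    return sum(aps) / len(aps)


# Toy example: two hypothetical classes, four audio clips each.
toy_labels = [
    [1, 0, 1, 0],  # "child crying"
    [0, 1, 0, 1],  # "car engine"
]
toy_scores = [
    [0.9, 0.8, 0.7, 0.1],
    [0.2, 0.9, 0.3, 0.8],
]

toy_map = mean_average_precision(toy_labels, toy_scores)
print(f"mAP = {toy_map:.3f} (i.e. {toy_map * 100:.1f} on a 0-100 scale)")
```

In this toy case the second class is ranked perfectly while the first has one false alarm ranked above a true positive, which pulls the mean below 1.0.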
Xiaomi also introduced a mini version of this model designed for resource-constrained scenarios. Despite its small size, the Mini model outperforms similar models from other organizations.
This technological advancement has practical value, as it can be widely applied to Xiaomi smart devices to improve the overall user experience. The algorithm excels at recognizing a variety of environmental sounds, such as children crying, animal sounds, and car engines, and can represent these sounds in various forms, such as text.
Xiaomi robots also benefit greatly from this algorithm. The CyberOne humanoid robot can recognize 85 types of environmental sounds and perceive a wide range of human emotions through hearing. The second-generation biomimetic quadruped robot CyberDog 2 can further enhance its dynamic response capabilities by identifying 38 types of environmental sounds.