Developers, hobbyists and makers looking for an open-source, multilingual voice dataset for training speech-enabled applications may be interested to know that NVIDIA and Mozilla have released the latest Common Voice Dataset. It surpasses 13,000 hours of crowd-sourced speech data and adds another 16 languages to the corpus. Common Voice is the world’s largest open voice dataset. It is designed to democratize voice technology and is already used by developers, researchers and academics worldwide.
“NVIDIA has released multilingual speech recognition models in NGC for free as part of the partnership’s mission to democratize voice technology. NeMo is an open-source toolkit for researchers developing state-of-the-art conversational AI models. Researchers can further fine-tune these models on multilingual datasets. See an example in this notebook that fine-tunes an English speech recognition model on the Mozilla Common Voice (MCV) Japanese dataset.
Contributors mobilize their own communities to donate speech data to the MCV public database, which anyone can then use to train voice-enabled technology. As part of NVIDIA’s collaboration with Mozilla Common Voice, the models trained on this and other public datasets are made available for free via an open-source toolkit called NVIDIA NeMo.”
The latest Common Voice Dataset now contains 13,905 hours, an increase of 4,622 hours over the previous release. It also introduces 16 new languages: Basaa, Slovak, Northern Kurdish, Bulgarian, Kazakh, Bashkir, Galician, Uyghur, Armenian, Belarusian, Urdu, Guarani, Serbian, Uzbek, Azerbaijani and Hausa. The top five languages in the Common Voice Dataset by total hours are English (2,630 hours), Kinyarwanda (2,260), German (1,040), Catalan (920) and Esperanto (840).
“Languages that have increased the most by percentage are Thai (almost 20x growth, from 12 hours to 250 hours), Luganda (9x growth, from 8 hours to 80 hours), Esperanto (more than 7x growth, from 100 hours to 840 hours), and Tamil (more than 8x growth, from 24 hours to 220 hours).”
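The multipliers in the quote above appear to measure the increase relative to the starting size, (after − before) / before, rather than the plain ratio after / before. For example, Luganda’s rise from 8 to 80 hours is a 10x ratio but a 9x (900%) increase. The short Python sketch below recomputes both figures from the rounded hour counts quoted in the release and derives the previous release’s total from the numbers given earlier. The dictionary and function names are illustrative only, and the unrounded hour counts behind the official figures are not stated here, so small differences from the quoted multipliers are to be expected.

```python
# Rounded hour counts (before, after) as quoted in the release announcement.
HOURS = {
    "Thai": (12, 250),
    "Luganda": (8, 80),
    "Esperanto": (100, 840),
    "Tamil": (24, 220),
}


def growth_ratio(before: float, after: float) -> float:
    """How many times larger the new size is than the old one (after / before)."""
    return after / before


def growth_increase(before: float, after: float) -> float:
    """Increase relative to the starting size ((after - before) / before)."""
    return (after - before) / before


for lang, (before, after) in HOURS.items():
    ratio = growth_ratio(before, after)
    increase = growth_increase(before, after)
    print(f"{lang}: {ratio:.1f}x ratio, {increase:.1f}x increase ({increase * 100:.0f}%)")

# Previous release total = current total minus the reported increase.
previous_total = 13_905 - 4_622
print(f"Previous release: {previous_total:,} hours")
```

Running it prints the ratio and the relative increase side by side for each language (for Luganda, 10.0x versus 9.0x), followed by 9,283 hours for the previous release. The "increase" column lines up with the multipliers in the quote.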