MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation providing 1200 hours of audio-visual speech in 9 languages. It is fully transcribed and covers 6 English-to-X translation directions as well as 6 X-to-English translation directions. To the best of our knowledge, this is the first open benchmark for audio-visual speech-to-text translation and the largest open benchmark for multilingual audio-visual speech recognition. Our baseline results show that MuAViC is effective for building noise-robust speech recognition and translation models. We make the corpus available at https://github.com/facebookresearch/muavic.