Data, Technologies and Benchmarks for the Spoken Languages of the World

We are pleased to announce that the meeting on “Data, Technologies and Benchmarks for the Spoken Languages of the World” will be co-located with the 2022 IEEE Spoken Language Technology Workshop (IEEE SLT 2022). The half-day hybrid meeting will focus on building and growing speech and language communities around the world. The initial set of language families we plan to discuss includes South African, Arabic and Indic languages. We are reaching out to expand this set to include more low-resource languages.

Venue

Multipurpose Room
Qatar Computing Research Institute
Researchery (HBKU Research Complex) - B1
Education City
Doha, Qatar

Time: January 13, 2023, 9:00 AM to 12:00 PM (Qatar Time)


🚧 QCRI is about a 20-minute car ride from the conference hotel, Marsa Malaz Kempinski, The Pearl. If you have registered to attend in person, please come to building B1 in the research complex. If you are staying at the Marsa Malaz Kempinski or nearby, please meet us in the Kempinski lobby by 8:30 AM.

📢 Online Meeting Details
Topic: Data and Technologies for All Spoken Languages of the World
Time: Jan 13, 2023, 01:30 AM Eastern Time (US and Canada)
Please register for this webinar here. Once you register, you will immediately receive a passcode and a direct URL for joining the meeting.

Talks

09:00 - 09:10 AM

Building Responsible AI Standards for Speech Data

Daniela Braga

Affiliation: Defined.ai

daniela@defined.ai

Alessandro Giannetti

Affiliation: Defined.ai

alessandro.giannetti@definedcrowd.com

09:10 - 09:20 AM

Current projects on language resource collection

Khalid Choukri

Affiliation: ELRA/ELDA (presenting remotely)

choukri@elda.org

09:20 - 09:30 AM

Linguistic Data Consortium/University of Pennsylvania: Overview

Denise DiPersio

Affiliation: Linguistic Data Consortium

dipersio@ldc.upenn.edu

09:30 - 09:40 AM

TransPerfect: Overview

Dorota Iskra

Affiliation: TransPerfect

diskra@transperfect.com

09:40 - 09:50 AM

Localising the Mozilla Common Voice platform for South Africa's official languages

Febe de Wet

Affiliation: Stellenbosch University, South Africa

fdw@sun.ac.za

09:50 - 10:00 AM

Under-resourced code-switched speech recognition in South African languages

Joshua Jansen van Vüren

Affiliation: Stellenbosch University

jjvanvueren@sun.ac.za

10:00 - 10:10 AM

ASR Data Collection and Systems in Indian Languages

Srinivasan Umesh

Affiliation: IIT Madras, India

umeshs@ee.iitm.ac.in

10:10 - 10:20 AM

Benchmarking for Accented Speech Recognition

Preethi Jyothi

Affiliation: IIT Bombay

pjyothi@cse.iitb.ac.in

10:20 - 10:30 AM

My voice in ArabicSpeech

Ahmed Ali

Affiliation: HBKU

amali@hbku.edu.qa

10:30 - 10:40 AM

Purépecha language

Karina Figueroa

Affiliation: UMSNH

karina.figueroa@umich.mx

10:40 - 10:50 AM

Recent progress in the evaluation of Mexican Spanish

Ivan Vladimir Meza Ruiz

Affiliation: IIMAS, UNAM

ivanvladimir@turing.iimas.unam.mx

10:50 - 11:00 AM

Challenges in Developing ASR Systems for Indian Languages

Vasista Sai Lodagala

Affiliation: Indian Institute of Technology, Madras

vasista.lodagala@gmail.com

11:00 - 11:10 AM

SIG for under-resourced languages

Sakriani Sakti

Affiliation: NAIST

ssakti@is.naist.jp

The goal of this meeting is to align our communities around data, tools, models and benchmarks, and to create a Special Interest Group (SIG) around this effort (a few examples here). There have been several efforts around low-resource languages, such as BABEL, the MGB Challenge and BULB. We would like to bring them together under an umbrella such as SUPERB (or a similarly focused effort) and encourage joint research in this field.

Please share this invitation with colleagues who may be interested in participating in this working group.

Please register your interest here so we may share logistical information regarding the meeting.