
Program at a Glance

This year, we are experimenting with a new approach to sessions. Every session includes papers from all areas, which maximizes authors' chances of seeing other posters in their own area. We hope this will lead to more interaction between participants, and we will collect feedback on the new format.

The poster ID has the form [Day]-[Session]-[Poster Number]-[Topic],
where:
[Day] is the day of the conference (1, 2, 3, or 4),
[Session] is 1 for morning and 2 for afternoon sessions,
[Poster Number] is the number of the poster within the session, and
[Topic] is the technical area of the work, as follows:

Topic ID Technical Area
ASR 01. Automatic speech recognition
SLP 02. Spoken language processing
SES 03. Speech enhancement and separation
ANA 04. Speech analysis
SLR 05. Speaker and language recognition
DIA 06. Speaker diarization
TLP 07. Text-only language processing
MMP 08. Multimodal speech processing
MLP 09. Multilingual processing
EMR 10. Emotion recognition and paralinguistics
TTS 11. Speech synthesis and spoken language generation
RES 12. Resources (new corpora, toolkits, evaluation metrics, etc.)
MLS 13. Machine learning for speech applications
SUP 14. SUPERB challenge
DEMO Demonstrations (DEMO)
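
For readers who want to sort or filter the program by script, the poster ID format above can be split mechanically. The following is a minimal sketch, not an official tool: the names `parse_poster_id`, `describe`, `PosterID`, `TOPICS`, and `SESSIONS` are hypothetical, and the topic codes are copied from the table above. Only sessions 1 and 2 are defined in the text, so any other session number is reported as-is rather than guessed at.

```python
import re
from typing import NamedTuple

# Topic codes copied from the Topic ID table above.
TOPICS = {
    "ASR": "Automatic speech recognition",
    "SLP": "Spoken language processing",
    "SES": "Speech enhancement and separation",
    "ANA": "Speech analysis",
    "SLR": "Speaker and language recognition",
    "DIA": "Speaker diarization",
    "TLP": "Text-only language processing",
    "MMP": "Multimodal speech processing",
    "MLP": "Multilingual processing",
    "EMR": "Emotion recognition and paralinguistics",
    "TTS": "Speech synthesis and spoken language generation",
    "RES": "Resources (new corpora, toolkits, evaluation metrics, etc.)",
    "MLS": "Machine learning for speech applications",
    "SUP": "SUPERB challenge",
    "DEMO": "Demonstrations",
}

# Only these two session numbers are defined in the text (assumption:
# anything else is passed through unchanged).
SESSIONS = {1: "morning", 2: "afternoon"}


class PosterID(NamedTuple):
    day: int
    session: int
    number: int
    topic: str


# [Day]-[Session]-[Poster Number]-[Topic], with Day restricted to 1..4.
_POSTER_ID = re.compile(r"^([1-4])-(\d+)-(\d+)-([A-Z]+)$")


def parse_poster_id(text: str) -> PosterID:
    """Split a poster ID string into its four fields."""
    match = _POSTER_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a poster ID: {text!r}")
    day, session, number, topic = match.groups()
    if topic not in TOPICS:
        raise ValueError(f"unknown topic code: {topic!r}")
    return PosterID(int(day), int(session), int(number), topic)


def describe(poster: PosterID) -> str:
    """Render a parsed poster ID as a short human-readable label."""
    session = SESSIONS.get(poster.session, f"session {poster.session}")
    return f"day {poster.day}, {session}, poster {poster.number}: {TOPICS[poster.topic]}"


print(describe(parse_poster_id("1-1-12-SES")))
```

Parsing into a named tuple keeps the numeric fields as integers, so a list of IDs sorts in program order (poster 2 before poster 10) rather than in string order.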

Technical Program Per Day

  1. Poster ID Paper Title Paper ID
    1-1-1-MLP Exploration of Language-Specific Self-Attention Parameters for Multilingual End-to-End Speech Recognition 13
    1-1-2-ASR ASBERT: ASR-SPECIFIC SELF-SUPERVISED LEARNING WITH SELF-TRAINING 71
    1-1-3-ASR SUB-8-BIT QUANTIZATION FOR ON-DEVICE SPEECH RECOGNITION: A REGULARIZATION-FREE APPROACH 134
    1-1-4-ASR G-AUGMENT: SEARCHING FOR THE META-STRUCTURE OF DATA AUGMENTATION POLICIES FOR ASR 240
    1-1-5-MLP How Do Phonological Properties Affect Bilingual Automatic Speech Recognition? 329
    1-1-6-MLP Scaling Up Deliberation for Multilingual ASR 109
    1-1-7-ASR Context-aware Neural Confidence Estimation for Rare Word Speech Recognition 278
    1-1-8-ASR Flickering reduction with partial hypothesis reranking for streaming ASR 86
    1-1-9-ASR InterDecoder: Using Attention Decoders as Intermediate Regularization for CTC-based Speech Recognition 187
    1-1-10-SLP Automatic Rating of Spontaneous Speech for Low-Resource Languages 199
    1-1-11-SLP Mixture of Domain Experts for Language Understanding: An Analysis of Modularity, Task Performance, and Memory Tradeoffs 25
    1-1-12-SES MULTI-STAGE PROGRESSIVE AUDIO BANDWIDTH EXTENSION 69
    1-1-13-SES JOINT OPTIMIZATION OF DIFFUSION PROBABILISTIC-BASED MULTICHANNEL SPEECH ENHANCEMENT WITH FAR-FIELD SPEAKER VERIFICATION 243
    1-1-14-ANA Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation 175
    1-1-15-MLS Speed-Robust Keyword Spotting via Soft Self-Attention on Multi-Scale Features 79
    1-1-16-ASR CCC-WAV2VEC 2.0: CLUSTERING AIDED CROSS CONTRASTIVE SELF-SUPERVISED LEARNING OF SPEECH REPRESENTATIONS 363
    1-1-17-TLP Fine Grained Spoken Document Summarization Through Text Segmentation 7
    1-1-18-MMP Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection 42
    1-1-19-MMP Towards visually prompted keyword localisation for zero-resource spoken languages 178
    1-1-20-EMR SPEECH EMOTION RECOGNITION WITH COMPLEMENTARY ACOUSTIC REPRESENTATIONS 315
    1-1-21-TTS WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration 141
    1-1-22-TTS On granularity of prosodic representations in expressive text-to-speech 258
    1-1-23-TTS Can we use Common Voice to train a Multi-Speaker TTS system? 81
    1-1-24-MLS Distilling Sequence-to-Sequence Voice Conversion Models For Streaming Conversion Applications 180
    1-1-25-MLS AUTOMATIC PREDICTION OF INTELLIGIBILITY OF WORDS AND PHONEMES PRODUCED ORALLY BY JAPANESE LEARNERS OF ENGLISH 355
    1-1-26-SUP On the Utility of Self-supervised Models for Prosody-related Tasks 313

  2. Poster ID Paper Title Paper ID
    1-2-1-ASR JOIST: A Joint Speech and Text Streaming Model For ASR 23
    1-2-2-MLP Code-switched language modelling using a code predictive LSTM in under-resourced South African languages 76
    1-2-3-ASR A CONTEXT-AWARE KNOWLEDGE TRANSFERRING STRATEGY FOR CTC-BASED ASR 147
    1-2-4-ASR Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR 252
    1-2-5-MLP IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS 342
    1-2-6-ASR Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR 121
    1-2-7-ASR E-Branchformer: Branchformer with Enhanced merging for speech recognition 310
    1-2-8-ASR CONFORMER-BASED ON-DEVICE STREAMING SPEECH RECOGNITION WITH KD COMPRESSION AND TWO-PASS ARCHITECTURE 169
    1-2-9-ASR Accelerator-Aware Training for Transducer-based Speech Recognition 220
    1-2-10-SLP A DATA-DRIVEN INVESTIGATION OF NOISE-ADAPTIVE UTTERANCE GENERATION WITH LINGUISTIC MODIFICATION 85
    1-2-11-SLP On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding 111
    1-2-12-SES Spatial-DCCRN: DCCRN Equipped with Frame-level Angle Feature and Hybrid Filtering for Multi-channel Speech Enhancement 70
    1-2-13-SES IMPROVED NORMALIZING FLOW-BASED SPEECH ENHANCEMENT USING AN ALL-POLE GAMMATONE FILTERBANK FOR CONDITIONAL INPUT REPRESENTATION 245
    1-2-14-ANA VSAMETER: EVALUATION OF A NEW OPEN-SOURCE TOOL TO MEASURE VOWEL SPACE AREA AND RELATED METRICS 237
    1-2-15-SLR FREQUENCY AND MULTI-SCALE SELECTIVE KERNEL ATTENTION FOR SPEAKER VERIFICATION 136
    1-2-16-DIA Joint speaker diarisation and tracking in switching state-space model 10
    1-2-17-TLP AN ANALYSIS OF THE EFFECTS OF DECODING ALGORITHMS ON FAIRNESS IN OPEN-ENDED LANGUAGE GENERATION 63
    1-2-18-MMP Exploiting information from native data for non-native automatic pronunciation assessment 123
    1-2-19-MLP Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition 270
    1-2-20-EMR A ZERO-SHOT APPROACH TO IDENTIFYING CHILDREN’S SPEECH IN AUTOMATIC GENDER CLASSIFICATION 322
    1-2-21-TTS GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models 167
    1-2-22-TTS Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy 273
    1-2-23-RES STOP: A DATASET FOR SPOKEN TASK ORIENTED SEMANTIC PARSING 120
    1-2-24-MLS SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning 204
    1-2-25-MLS PEPPANET: EFFECTIVE MISPRONUNCIATION DETECTION AND DIAGNOSIS LEVERAGING PHONETIC, PHONOLOGICAL, AND ACOUSTIC CUES 368

  3. Time Sponsor Title
    17:00 - 17:20 QNRF Dr. Ali Alaboudy, “Introducing QNRF Funding Programs: Digital Technology Track”
    17:20 - 17:40 Google Fadi Biadsy, “Speech Model Personalization: From Research to Production”
    17:40 - 18:00 Amazon Björn Hofmeister, “All-Neural ASR - The Next Challenges”

  4. Poster ID Paper Title Paper ID
    2-1-1-ASR Untied Positional Encodings for Efficient Transformer-based Speech Recognition 29
    2-1-2-ASR Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio 92
    2-1-3-ASR Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition 150
    2-1-4-ASR Damage Control during Domain Adaptation for Transducer Based Automatic Speech Recognition 254
    2-1-5-ASR PADA: PRUNING ASSISTED DOMAIN ADAPTATION FOR SELF-SUPERVISED SPEECH REPRESENTATIONS 361
    2-1-6-ASR MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario 137
    2-1-7-ASR Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition 157
    2-1-8-MMP TRANSFORMER-BASED LIP-READING WITH REGULARIZED DROPOUT AND RELAXED ATTENTION 84
    2-1-9-ASR Residual Adapters for Targeted Updates in RNN-Transducer Based Speech Recognition System 227
    2-1-10-SLP Response Timing Estimation for Spoken Dialog Systems based on Syntactic Completeness Prediction 309
    2-1-11-SLP Weak-Supervised Dysarthria-invariant Features for Spoken Language Understanding using an FHVAE and Adversarial Training 194
    2-1-12-SES Exploring WavLM on Speech Enhancement 149
    2-1-13-SES Adaptive-FSN: Integrating full-band extraction and adaptive sub-band encoding for monaural speech enhancement 247
    2-1-14-ANA INVESTIGATING THE IMPORTANT TEMPORAL MODULATIONS FOR DEEP-LEARNING-BASED SPEECH ACTIVITY DETECTION 276
    2-1-15-SLR AN ATTENTION-BASED BACKEND ALLOWING EFFICIENT FINE-TUNING OF TRANSFORMER MODELS FOR SPEAKER VERIFICATION 179
    2-1-16-DIA Diarisation using location tracking with agglomerative clustering 11
    2-1-17-TLP N-BEST HYPOTHESES RERANKING FOR TEXT-TO-SQL SYSTEMS 236
    2-1-18-MMP SpeechCLIP: Integrating Speech with Pre-trained Vision and Language Model 146
    2-1-19-EMR Distribution-based Emotion Recognition in Conversation 22
    2-1-20-TTS StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models 43
    2-1-21-TTS Learning accent representation with multi-level VAE towards controllable speech synthesis 185
    2-1-22-TTS vTTS: visual-text to speech 314
    2-1-23-MLP FLEURS: FEW-SHOT LEARNING EVALUATION OF UNIVERSAL REPRESENTATIONS OF SPEECH 133
    2-1-24-MLS Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection 216
    2-1-25-SUP Improving generalizability of distilled self-supervised speech processing models under distorted settings 53
    2-1-26-SES AVSE CHALLENGE: AUDIO-VISUAL SPEECH ENHANCEMENT CHALLENGE 374

  5. Poster ID Paper Title Paper ID
    2-2-1-ASR IMPROVED NOISY ITERATIVE PSEUDO-LABELING FOR SEMI-SUPERVISED SPEECH RECOGNITION 65
    2-2-2-ASR GUIDED CONTRASTIVE SELF-SUPERVISED PRE-TRAINING FOR AUTOMATIC SPEECH RECOGNITION 94
    2-2-3-ASR Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition 166
    2-2-4-ASR NAM+: TOWARDS SCALABLE END-TO-END CONTEXTUAL BIASING FOR ADAPTIVE ASR 279
    2-2-5-DIA Continual Self-supervised Domain Adaptation for End-to-end Speaker Diarization 4
    2-2-6-ASR Modular Hybrid Autoregressive Transducer 143
    2-2-7-ASR How Does Pre-trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications 164
    2-2-8-MLP Improving Semi-supervised E2E ASR using CycleGAN and Inter-domain Losses 115
    2-2-9-ASR Internal Language Model Personalization of E2E Automatic Speech Recognition Using Random Encoder Features 256
    2-2-10-SLP Building Markovian Generative Architectures over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems 330
    2-2-11-SLP NON-AUTOREGRESSIVE END-TO-END APPROACHES FOR JOINT AUTOMATIC SPEECH RECOGNITION AND SPOKEN LANGUAGE UNDERSTANDING 226
    2-2-12-SES TEA-PSE 2.0: SUB-BAND NETWORK FOR REAL-TIME PERSONALIZED SPEECH ENHANCEMENT 172
    2-2-13-SES EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers 316
    2-2-14-SLR Flow-ER: a Flow-based Embedding Regularization Strategy for Robust Speech Representation Learning 3
    2-2-15-SLR UNSUPERVISED DOMAIN ADAPTATION OF NEURAL PLDA USING SEGMENT PAIRS FOR SPEAKER VERIFICATION 277
    2-2-16-DIA Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization 93
    2-2-17-MLS TDOA ESTIMATION OF SPEECH SOURCE IN NOISY REVERBERANT ENVIRONMENTS 312
    2-2-18-MMP YFACC: A Yorùbá Speech-Image Dataset for Cross-lingual Keyword Localisation through Visual Grounding 153
    2-2-19-MLP MULTILINGUAL SPEECH EMOTION RECOGNITION WITH MULTI-GATING MECHANISM AND NEURAL ARCHITECTURE SEARCH 113
    2-2-20-TTS Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech 50
    2-2-21-TTS Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion 217
    2-2-22-MLP Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using $\beta$-VAE 335
    2-2-23-RES Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition 274
    2-2-24-MLS Phoneme Segmentation Using Self-Supervised Speech Models 268
    2-2-25-SUP Exploring Efficient-tuning Methods in Self-supervised Speech Models 106

  6. Poster ID Paper Title Paper ID
    2-3-1-DEMO ISPEAK: INTERACTIVE SPOKEN LANGUAGE UNDERSTANDING SYSTEM FOR CHILDREN WITH SPEECH AND LANGUAGE DISORDERS DEMO
    2-3-2-DEMO LUX-ASR: BUILDING AN ASR SYSTEM FOR THE LUXEMBOURGISH LANGUAGE DEMO
    2-3-3-DEMO ON-DEVICE STREAMING TARGET-SPEAKER ASR WITH NEURAL TRANSDUCER DEMO
    2-3-4-DEMO VOICE-ENABLED AUDIOVISUAL AGENT FOR QUESTION ANSWERING IN ENGLISH AND ARABIC DEMO

  7. Time Sponsor Title
    17:30 - 17:50 AppTek Mohammad Zeineldeen, “Fully Automatic Video Dubbing at AppTek”
    17:50 - 18:00 3M Dr. Jing Su, “From pre-trained language models to practical medical scribing solutions”
    18:00 - 18:10 LXT Martha Hakvoort, “Powering AI innovation with High-quality data”
    18:10 - 18:20 DataForce Dr. Dorota Iskra, “Data-Centric Approach to AI”
    18:20 - 18:40 SCAI Dr. Areeb Alowisheq, “SCAI: unlocking value with AI”

  8. Poster ID Paper Title Paper ID
    3-1-1-ASR Towards End-to-end Unsupervised Speech Recognition 66
    3-1-2-MLP Exploring a unified ASR for multiple south Indian languages leveraging multilingual acoustic and language models 97
    3-1-3-ASR Monotonic segmental attention for automatic speech recognition 197
    3-1-4-ASR STREAMING, FAST AND ACCURATE ON-DEVICE INVERSE TEXT NORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION 323
    3-1-5-ASR DUAL LEARNING FOR LARGE VOCABULARY ON-DEVICE ASR 27
    3-1-6-ASR STREAMING BILINGUAL END TO END ASR MODEL USING ATTENTION OVER MULTIPLE SOFTMAX 190
    3-1-7-ASR End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation 222
    3-1-8-ASR Fully Unsupervised Training of Few-Shot Keyword Spotting 127
    3-1-9-ASR Learning a Dual-Mode Speech Recognition Model via Self-Pruning 287
    3-1-10-SLP Improving Noise Robustness for Spoken Content Retrieval using semi-supervised ASR and N-best transcripts for BERT-based ranking models 170
    3-1-11-SLP A STUDY ON THE INTEGRATION OF PRE-TRAINED SSL, ASR, LM AND SLU MODELS FOR SPOKEN LANGUAGE UNDERSTANDING 264
    3-1-12-SES LIMUSE: LIGHTWEIGHT MULTI-MODAL SPEAKER EXTRACTION 177
    3-1-13-ANA A MULTI-MODAL ARRAY OF INTERPRETABLE FEATURES TO EVALUATE LANGUAGE AND SPEECH PATTERNS IN DIFFERENT NEUROLOGICAL DISORDERS 107
    3-1-14-SLR THE CLEVER HANS EFFECT IN VOICE SPOOFING DETECTION 20
    3-1-15-SLR INVESTIGATING ACTIVE-LEARNING-BASED TRAINING DATA SELECTION FOR SPEECH SPOOFING COUNTERMEASURE 284
    3-1-16-DIA BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications 162
    3-1-17-TLP Efficient Text Analysis with Pre-trained Neural Network Models 300
    3-1-18-MMP ON THE USE OF MODALITY-SPECIFIC LARGE-SCALE PRE-TRAINED ENCODERS FOR MULTIMODAL SENTIMENT ANALYSIS 154
    3-1-19-EMR Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora 188
    3-1-20-TTS SIMD-SIZE AWARE WEIGHT REGULARIZATION FOR FAST NEURAL VOCODING ON CPU 64
    3-1-21-TTS Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech 219
    3-1-22-TTS Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation 352
    3-1-23-MLS An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition 17
    3-1-24-TLP Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition 275
    3-1-25-SUP On Compressing Sequences for Self-Supervised Speech Models 238

  9. Poster ID Paper Title Paper ID
    JSALT 2022 Report: Eighth Frederick Jelinek Memorial Summer Workshop
    JSALT 2022 Report: Speech Translation for Under-Resourced Languages
    JSALT 2022 Report: Multilingual and Code-Switching Speech Recognition
    JSALT 2022 Report: Leveraging Pre-Training Models for Speech Processing
    4-2-1-SUP SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning 373

  10. Poster ID Paper Title Paper ID
    4-1-1-ASR Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition 67
    4-1-2-ASR HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch 100
    4-1-3-ASR Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models 235
    4-1-4-ASR Personalization of CTC Speech Recognition Models 328
    4-1-5-MLP A Truly Multilingual First Pass and Monolingual Second Pass Streaming On-Device ASR System 108
    4-1-6-ASR UNIFIED END-TO-END SPEECH RECOGNITION AND ENDPOINTING FOR FAST AND EFFICIENT SPEECH SYSTEMS 269
    4-1-7-ASR Learning mask scalars for improved robust automatic speech recognition 293
    4-1-8-ASR An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition 163
    4-1-9-ASR Macro-block dropout for improved regularization in training end-to-end speech recognition models 348
    4-1-10-SLP On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting 214
    4-1-11-SES End-to-End Multi-speaker ASR with Independent Vector Analysis 15
    4-1-12-SES A Hybrid Acoustic Echo Reduction Approach Using Kalman Filtering and Informed Source Extraction With Improved Training 184
    4-1-13-ANA Efficient dynamic filter for robust and low computational feature extraction 148
    4-1-14-SLR HOW TO BOOST ANTI-SPOOFING WITH X-VECTORS 78
    4-1-15-SLR A COMPREHENSIVE STUDY ON SELF-SUPERVISED DISTILLATION FOR SPEAKER REPRESENTATION LEARNING 311
    4-1-16-DIA Low-Latency Speech Separation Guided Diarization for Telephone Conversations 241
    4-1-17-TLP Empirical Analysis of Training Strategies of Transformer-based Japanese Chit-chat Systems 308
    4-1-18-MMP An Analysis of Semantically-Aligned Speech-Text Embeddings 174
    4-1-19-EMR Combining Contrastive and Non-Contrastive Losses for Fine-Tuning Pretrained Models in Speech Analysis 218
    4-1-20-TTS Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss 77
    4-1-21-TTS Remap, warp and attend: Non-parallel many-to-many accent conversion with Normalizing Flows 234
    4-1-22-RES MASC: Massive Arabic Speech Corpus 39
    4-1-23-MLS PHONE-LEVEL PRONUNCIATION SCORING FOR L1 USING WEIGHTED-DYNAMIC TIME WARPING 35
    4-1-24-MLS PROFICIENCY ASSESSMENT OF L2 SPOKEN ENGLISH USING WAV2VEC 2.0 340
    4-1-25-SUP Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations 250

Technical Program Per Area

Poster ID Paper Title Authors Session
1-1-2-ASR ASBERT: ASR-SPECIFIC SELF-SUPERVISED LEARNING WITH SELF-TRAINING Hyung Yong Kim (42dot); Byeong-Yeol Kim (42dot); Seung Woo Yu (42dot); Youshin Lim (42dot); Yunkyu Lim (42dot); Hanbin Lee (42dot) Mon 9 Jan - Morning session (10:30-12:30)
1-1-3-ASR SUB-8-BIT QUANTIZATION FOR ON-DEVICE SPEECH RECOGNITION: A REGULARIZATION-FREE APPROACH Kai Zhen (Amazon); Martin Radfar (Amazon); Hieu D Nguyen (Amazon); Grant Strimel (Amazon); Athanasios Mouchtaris (Amazon); Nathan Susanj (Amazon) Mon 9 Jan - Morning session (10:30-12:30)
1-1-4-ASR G-AUGMENT: SEARCHING FOR THE META-STRUCTURE OF DATA AUGMENTATION POLICIES FOR ASR Yuan Wang (Google); Ekin D Cubuk (Google Brain); Andrew Rosenberg (Google LLC); Shuyang Cheng (Waymo LLC); Ron J Weiss (Google, Inc.); Bhuvana Ramabhadran (Google); Pedro Moreno (Google); Quoc Le (Google Brain); Daniel S Park (Google Brain) Mon 9 Jan - Morning session (10:30-12:30)
1-1-7-ASR Context-aware Neural Confidence Estimation for Rare Word Speech Recognition David Qiu (Google); Tsendsuren Munkhdalai (Google LLC); Yanzhang He (Google); Khe C Sim (Google Inc.) Mon 9 Jan - Morning session (10:30-12:30)
1-1-8-ASR Flickering reduction with partial hypothesis reranking for streaming ASR Antoine Bruguier (Google); David Qiu (Google); Trevor Strohman (Google); Yanzhang He (Google) Mon 9 Jan - Morning session (10:30-12:30)
1-1-9-ASR InterDecoder: Using Attention Decoders as Intermediate Regularization for CTC-based Speech Recognition Tatsuya Komatsu (LINE Corporation); Yusuke Fujita (LINE Corporation) Mon 9 Jan - Morning session (10:30-12:30)
1-1-16-ASR CCC-WAV2VEC 2.0: CLUSTERING AIDED CROSS CONTRASTIVE SELF-SUPERVISED LEARNING OF SPEECH REPRESENTATIONS Vasista Sai Lodagala (Indian Institute of Technology, Madras); Sreyan Ghosh (University of Maryland, College Park); S Umesh (IIT Chennai) Mon 9 Jan - Morning session (10:30-12:30)
1-2-1-ASR JOIST: A Joint Speech and Text Streaming Model For ASR Tara Sainath (Google); Rohit Prabhavalkar (Google); Ankur Bapna (Google Research); Yu Zhang (Google); Zhouyuan Huo (Google); Zhehuai Chen (Google); Bo Li (Google); Weiran Wang (Google); Trevor Strohman (Google, Inc.) Mon 9 Jan - Afternoon session (15:00-17:00)
1-2-3-ASR A CONTEXT-AWARE KNOWLEDGE TRANSFERRING STRATEGY FOR CTC-BASED ASR Ke-Han Lu (National Taiwan University of Science and Technology); Kuan-Yu CHEN (National Taiwan University of Science and Technology) Mon 9 Jan - Afternoon session (15:00-17:00)
1-2-4-ASR Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR Zhehuai Chen (Google); Ankur Bapna (Google Research); Andrew Rosenberg (Google LLC); Yu Zhang (Google); Bhuvana Ramabhadran (Google); Pedro Moreno (Google); Nanxin Chen (Google) Mon 9 Jan - Afternoon session (15:00-17:00)
1-2-6-ASR Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR Yusuke Fujita (LINE Corporation); Tatsuya Komatsu (LINE Corporation); Yusuke Kida (LINE Corp) Mon 9 Jan - Afternoon session (15:00-17:00)
1-2-7-ASR E-Branchformer: Branchformer with Enhanced merging for speech recognition Kwangyoun Kim (ASAPP); Felix Wu (ASAPP); Yifan Peng (Carnegie Mellon University); Jing Pan (ASAPP); Prashant Sridhar (ASAPP); Kyu Jeong Han (ASAPP); Shinji Watanabe (Carnegie Mellon University) Mon 9 Jan - Afternoon session (15:00-17:00)
1-2-8-ASR CONFORMER-BASED ON-DEVICE STREAMING SPEECH RECOGNITION WITH KD COMPRESSION AND TWO-PASS ARCHITECTURE Jinhwan Park (Samsung Research); Sichen Jin (Samsung); Junmo Park (Samsung Research); Sungsoo Kim (Samsung Electronics); Dhairya Sandhyana (Samsung Research); Changheon Lee (Samsung Electronics); Myoungji Han (Samsung Electronics); Jungin Lee (Samsung Electronics); Seokyeong Jung (Samsung Electronics); Chang Woo Han (Samsung Research); Chanwoo Kim (Samsung Electronics) Mon 9 Jan - Afternoon session (15:00-17:00)
1-2-9-ASR Accelerator-Aware Training for Transducer-based Speech Recognition Rupak Vignesh Swaminathan (Amazon.com); Suhaila Mumtaj Shakiah (Amazon); Hieu D Nguyen (Amazon); Raviteja Chinta (Amazon.com); Tariq Afzal (Amazon.com); Nathan Susanj (Amazon.com); Athanasios Mouchtaris (Amazon.com); Grant Strimel (Amazon.com); Ariya Rastrow (Amazon Alexa) Mon 9 Jan - Afternoon session (15:00-17:00)
2-1-1-ASR Untied Positional Encodings for Efficient Transformer-based Speech Recognition Lahiru T Samarakoon (Fano Labs, Hong Kong); Ivan Fung (Fano Labs, Hong Kong) Tue 10 Jan - Morning session (10:30-12:30)
2-1-2-ASR Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio Yan Gao (University of Cambridge); Javier Fernandez-Marques (Samsung AI, Cambridge); Titouan Parcollet (); Pedro Gusmao (University of Cambridge); Nicholas Lane (University of Cambridge and Samsung AI) Tue 10 Jan - Morning session (10:30-12:30)
2-1-3-ASR Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition Peng Shen (NICT); Xugang Lu (NICT); Hisashi Kawai (NICT) Tue 10 Jan - Morning session (10:30-12:30)
2-1-4-ASR Damage Control during Domain Adaptation for Transducer Based Automatic Speech Recognition Somshubra Majumdar (NVIDIA); Shantanu Acharya (NVIDIA); Vitaly Lavrukhin (NVIDIA); Boris Ginsburg (NVIDIA) Tue 10 Jan - Morning session (10:30-12:30)
2-1-5-ASR PADA: PRUNING ASSISTED DOMAIN ADAPTATION FOR SELF-SUPERVISED SPEECH REPRESENTATIONS Vasista Sai Lodagala (Indian Institute of Technology, Madras); Sreyan Ghosh (University of Maryland, College Park); S Umesh (IIT Chennai) Tue 10 Jan - Morning session (10:30-12:30)
2-1-6-ASR MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario Fan Yu (Northwestern Polytechnical University); Shiliang Zhang (Alibaba Group); Pengcheng Guo (Northwestern Polytechnical University); Yuhao Liang (Northwestern Polytechnical University); Zhihao Du (Speech Lab, Alibaba Group); Yuxiao Lin (Zhejiang University); Lei Xie (Northwestern Polytechnical University) Tue 10 Jan - Morning session (10:30-12:30)
2-1-7-ASR Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition Aleksandr Laptev (NVIDIA, ITMO University); Boris Ginsburg (NVIDIA) Tue 10 Jan - Morning session (10:30-12:30)
2-1-9-ASR Residual Adapters for Targeted Updates in RNN-Transducer Based Speech Recognition System Sungjun Han (University of Stuttgart); Deepak Baby (Amazon Alexa); Valentin Mendelev (Amazon Alexa) Tue 10 Jan - Morning session (10:30-12:30)
2-2-1-ASR IMPROVED NOISY ITERATIVE PSEUDO-LABELING FOR SEMI-SUPERVISED SPEECH RECOGNITION Tian Li (Shumei AI Research Institute); Qingliang Meng (Shumei AI Research Institute); Yujian Sun (Shumei AI Research Institute) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-2-ASR GUIDED CONTRASTIVE SELF-SUPERVISED PRE-TRAINING FOR AUTOMATIC SPEECH RECOGNITION Aparna Khare (Amazon); Minhua Wu (Amazon Inc.); Saurabhchand Bhati (Johns Hopkins University); Jasha Droppo (Amazon Inc.); Roland Maas (Amazon Inc.) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-3-ASR Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition Jakob Poncelet (KU Leuven); Hugo Van hamme (KU Leuven) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-4-ASR NAM+: TOWARDS SCALABLE END-TO-END CONTEXTUAL BIASING FOR ADAPTIVE ASR Zelin Wu (Google LLC); Tsendsuren Munkhdalai (Google LLC); Golan Pundak (Google); Khe C Sim (Google Inc.); David Li (Google LLC); Pat Rondon (Google LLC); Tara Sainath (Google) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-6-ASR Modular Hybrid Autoregressive Transducer Zhong Meng (Google LLC); Tongzhou Chen (Google); Rohit Prabhavalkar (Google); Yu Zhang (Google); Yuan Wang (Google); Kartik Audhkhasi (Google); Jesse Emond (Google LLC); Trevor Strohman (Google LLC); Bhuvana Ramabhadran (Google); W. Ronny Huang (Google); Ehsan Variani (Google); Yinghui Huang (Google); Pedro Moreno (Google) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-7-ASR How Does Pre-trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications Juan Pablo Zuluaga Gomez (Idiap Research Institute); Amrutha Prasad (Idiap Research Institute); Iuliia Nigmatulina (Idiap Research Institute); Seyyed Saeed Sarfjoo (Idiap Research Institute); Petr Motlicek (Idiap); Matthias Kleinert (DLR); Hartmut Helmke (DLR); Oliver Ohneiser (DLR); Qingran Zhan (Beijing Institute of Technology) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-9-ASR Internal Language Model Personalization of E2E Automatic Speech Recognition Using Random Encoder Features Adam Stooke (Google); Khe C Sim (Google Inc.); Mason Chua (Google); Tsendsuren Munkhdalai (Google LLC); Trevor Strohman (Google) Tue 10 Jan - Afternoon session (15:30-17:30)
3-1-1-ASR Towards End-to-end Unsupervised Speech Recognition Alexander H Liu (MIT); Wei-Ning Hsu (Massachusetts Institute of Technology); Michael Auli (Facebook); Alexei Baevski (Facebook AI Research) Wed 11 Jan - Morning session (10:30-12:30)
3-1-3-ASR Monotonic segmental attention for automatic speech recognition Albert Zeyer (RWTH Aachen University); Robin Schmitt (RWTH Aachen University); Wei Zhou (RWTH Aachen University); Ralf Schlüter (RWTH Aachen University); Hermann Ney (RWTH Aachen University) Wed 11 Jan - Morning session (10:30-12:30)
3-1-4-ASR STREAMING, FAST AND ACCURATE ON-DEVICE INVERSE TEXT NORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION Yashesh Gaur (Microsoft); Nick Kibre (Microsoft); JIAN XUE (Microsoft Corporation); Kangyuan Shu (Microsoft); Yuhui Wang (Microsoft); Issac Alphonso (Microsoft); Jinyu Li (Microsoft); Yifan Gong (Microsoft) Wed 11 Jan - Morning session (10:30-12:30)
3-1-5-ASR DUAL LEARNING FOR LARGE VOCABULARY ON-DEVICE ASR Charles C Peyser (Google Inc.); W. Ronny Huang (Google); Tara Sainath (Google); Rohit Prabhavalkar (Google); Michael Picheny (NYU); Kyunghyun Cho (New York University) Wed 11 Jan - Morning session (10:30-12:30)
3-1-6-ASR STREAMING BILINGUAL END TO END ASR MODEL USING ATTENTION OVER MULTIPLE SOFTMAX Aditya R Patil (Microsoft); Vikas V Joshi (Microsoft); Purvi Agrawal (Microsoft); Rupesh Mehta (Microsoft) Wed 11 Jan - Morning session (10:30-12:30)
3-1-7-ASR End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation Yoshiki Masuyama (Tokyo Metropolitan University); Xuankai Chang (Carnegie Mellon University); Samuele Cornell (Università Politecnica delle Marche); Shinji Watanabe (Carnegie Mellon University); Nobutaka Ono (Tokyo Metropolitan University) Wed 11 Jan - Morning session (10:30-12:30)
3-1-8-ASR Fully Unsupervised Training of Few-Shot Keyword Spotting Minchan Kim (Seoul National University); Dongjune Lee (Seoul National University); Sung Hwan Mun (Seoul National University); Min Hyun Han (Seoul National University); Nam Soo Kim (Seoul National University) Wed 11 Jan - Morning session (10:30-12:30)
3-1-9-ASR Learning a Dual-Mode Speech Recognition Model via Self-Pruning Chunxi Liu (Meta AI); Yuan Shangguan (Meta AI); Haichuan Yang (Meta); Yangyang Shi (Facebook); Raghuraman Krishnamoorthi (Facebook); Ozlem Kalinli (Meta AI) Wed 11 Jan - Morning session (10:30-12:30)
4-1-1-ASR Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition Ji Won Yoon (Seoul National University); Beom Jun Woo (Seoul National University); Sunghwan Ahn (Seoul National University); Hyeonseung Lee (Seoul National University); Nam Soo Kim (Seoul National University) Thu 12 Jan - Morning session (10:30-12:30)
4-1-2-ASR HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch Tina Raissi (RWTH Aachen University); Wei Zhou (RWTH Aachen University); Simon Berger (RWTH Aachen University); Ralf Schlüter (RWTH Aachen University); Hermann Ney (RWTH Aachen University) Thu 12 Jan - Morning session (10:30-12:30)
4-1-3-ASR Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models Vrunda N Sukhadia (Indian Institute Of Technology Madras); S Umesh (IIT Chennai) Thu 12 Jan - Morning session (10:30-12:30)
4-1-4-ASR Personalization of CTC Speech Recognition Models Saket Dingliwal (Amazon); Monica Sunkara (Amazon); Sravan Babu Bodapati (Amazon); Srikanth Ronanki (Amazon); Jeff Farris (Amazon); Katrin Kirchhoff (Amazon) Thu 12 Jan - Morning session (10:30-12:30)
4-1-6-ASR UNIFIED END-TO-END SPEECH RECOGNITION AND ENDPOINTING FOR FAST AND EFFICIENT SPEECH SYSTEMS Shaan Bijwadia (Google); Shuo-yiin Chang (Google); Tara Sainath (Google); Bo Li (Google); Chao Zhang (Google); Yanzhang He (Google) Thu 12 Jan - Morning session (10:30-12:30)
4-1-7-ASR Learning mask scalars for improved robust automatic speech recognition Arun Narayanan (Google Inc.); James Walker (Google Llc.); SANKARAN PANCHAPAGESAN (Google, LLC); Nathan Howard (Google Llc.); Yuma Koizumi (Google) Thu 12 Jan - Morning session (10:30-12:30)
4-1-8-ASR An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition Niko Moritz (Meta); Frank Seide (Meta); Duc Le (Meta); Jay Mahadeokar (Meta AI); Christian Fuegen (Facebook) Thu 12 Jan - Morning session (10:30-12:30)
4-1-9-ASR Macro-block dropout for improved regularization in training end-to-end speech recognition models Chanwoo Kim (Samsung Electronics); Sathish Indurti (Samsung Research); Jinhwan Park (Samsung Research); Wonyong Sung (Seoul National University) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
1-1-10-SLP Automatic Rating of Spontaneous Speech for Low-Resource Languages Yaroslav Getman (Aalto University); Ragheb Al-Ghezi (Aalto University); Ekaterina Voskoboinik (Aalto University); Mittul Singh (Silo AI); Mikko Kurimo (Aalto University) Mon 9 Jan - Morning session (10:30-12:30)
1-1-11-SLP Mixture of Domain Experts for Language Understanding: An Analysis of Modularity, Task Performance, and Memory Tradeoffs Benjamin Kleiner (AWS AI Labs); Jack FitzGerald (Amazon Alexa Artificial Intelligence); Haidar Khan (Amazon Alexa AI); Gokhan Tur (Amazon Alexa AI) Mon 9 Jan - Morning session (10:30-12:30)
1-2-10-SLP A DATA-DRIVEN INVESTIGATION OF NOISE-ADAPTIVE UTTERANCE GENERATION WITH LINGUISTIC MODIFICATION Anupama Chingacham (Saarland University); Vera Demberg (Dept. of Mathematics and Computer Science, Saarland University); Dietrich Klakow (Saarland University) Mon 9 Jan - Afternoon session (15:00-17:00)
1-2-11-SLP On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding Gaëlle Laperrière (LIA - Avignon University); Valentin Pelloin (LIUM, Le Mans Université); Mickael Rouvier (LIA - Avignon University); Themos Stafylakis (Omilia - Conversational Intelligence); Yannick Estève (LIA - Avignon University) Mon 9 Jan - Afternoon session (15:00-17:00)
2-1-10-SLP Response Timing Estimation for Spoken Dialog Systems based on Syntactic Completeness Prediction Jin Sakuma (Waseda University); Shinya Fujie (Chiba Institute of Technology); Tetsunori Kobayashi (Waseda University) Tue 10 Jan - Morning session (10:30-12:30)
2-1-11-SLP Weak-Supervised Dysarthria-invariant Features for Spoken Language Understanding using an FHVAE and Adversarial Training Jinzi Qi (KULeuven); Hugo Van hamme (KU LEUVEN) Tue 10 Jan - Morning session (10:30-12:30)
2-2-10-SLP Building Markovian Generative Architectures over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems Hong Liu (Tsinghua University); Yucheng Cai (Tsinghua University); Zhijian Ou (Tsinghua University); Yi Huang (China Mobile Research); Junlan Feng (China Mobile Research) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-11-SLP NON-AUTOREGRESSIVE END-TO-END APPROACHES FOR JOINT AUTOMATIC SPEECH RECOGNITION AND SPOKEN LANGUAGE UNDERSTANDING Mohan LI (Toshiba Europe Ltd); Rama S Doddipatla (Toshiba Europe LTD) Tue 10 Jan - Afternoon session (15:30-17:30)
3-1-10-SLP Improving Noise Robustness for Spoken Content Retrieval using semi-supervised ASR and N-best transcripts for BERT-based ranking models Yasufumi Moriya (Dublin City University); Gareth Jones (Dublin City University) Wed 11 Jan - Morning session (10:30-12:30)
3-1-11-SLP A STUDY ON THE INTEGRATION OF PRE-TRAINED SSL, ASR, LM AND SLU MODELS FOR SPOKEN LANGUAGE UNDERSTANDING Yifan Peng (Carnegie Mellon University); Siddhant Arora (Carnegie Mellon University); Yosuke Higuchi (Waseda University); Yushi Ueda (Carnegie Mellon University); Sujay Kumar (Carnegie Mellon University); Karthik Ganesan (Carnegie Mellon University); Siddharth Dalmia (Carnegie Mellon University); Xuankai Chang (Carnegie Mellon University); Shinji Watanabe (Carnegie Mellon University) Wed 11 Jan - Morning session (10:30-12:30)
4-1-10-SLP On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting Yuan-Kuei Wu (National Taiwan University); Wei-Tsung Kao (National Taiwan University); Hung-yi Lee (National Taiwan University); Chia-Ping Chen (intelliGo Technology inc.); Zhi-Sheng Chen (intelliGo Technology inc.); Yu-Pao Tsai (intelliGo Technology inc.) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
1-1-12-SES MULTI-STAGE PROGRESSIVE AUDIO BANDWIDTH EXTENSION Liang Wen (Samsung Electronics); Lizhong Wang (Samsung); Ying Zhang (Samsung Electronics); Kwang Pyo Choi (Samsung Electronics) Mon 9 Jan - Morning session (10:30-12:30)
1-1-13-SES JOINT OPTIMIZATION OF DIFFUSION PROBABILISTIC-BASED MULTICHANNEL SPEECH ENHANCEMENT WITH FAR-FIELD SPEAKER VERIFICATION Sandipana Dowerah (Inria); Romain Serizel (Université de Lorraine); Denis Jouvet (LORIA); Mohammad Mohammadamini (Laboratoire Informatique d’Avignon, University of Avignon); Driss Matrouf (Laboratoire Informatique d’Avignon, University of Avignon) Mon 9 Jan - Morning session (10:30-12:30)
1-2-12-SES Spatial-DCCRN: DCCRN Equipped with Frame-level Angle Feature and Hybrid Filtering for Multi-channel Speech Enhancement Shubo Lv (Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University); Yihui Fu (Northwestern Polytechnical University); Yukai Ju (Northwestern Polytechnical University); Lei Xie (NWPU); Weixin Zhu (Tencent); Wei Rao (Tencent); Yannan Wang (Tencent) Mon 9 Jan - Afternoon session (15:00-17:00)
1-2-13-SES IMPROVED NORMALIZING FLOW-BASED SPEECH ENHANCEMENT USING AN ALL-POLE GAMMATONE FILTERBANK FOR CONDITIONAL INPUT REPRESENTATION Martin Strauss (International Audio Laboratories Erlangen); Matteo Torcoli (International Audio Laboratories Erlangen); Bernd Edler (International Audio Laboratories Erlangen) Mon 9 Jan - Afternoon session (15:00-17:00)
2-1-12-SES Exploring WavLM on Speech Enhancement Hyungchan Song (Gwangju Institute of Science and Technology); Sanyuan Chen (Harbin Institute of Technology); Zhuo Chen (Microsoft); Yu Wu (Microsoft Research Asia); Takuya Yoshioka (Microsoft); Min Tang (Microsoft); Jong Won Shin (Gwangju Institute of Science and Technology); Shujie Liu (Microsoft Research Asia) Tue 10 Jan - Morning session (10:30-12:30)
2-1-13-SES Adaptive-FSN: Integrating full-band extraction and adaptive sub-band encoding for monaural speech enhancement Yu-Sheng Tsao (National Taiwan Normal University); Kuan-Hsun Ho (NTNU); Jeih-weih Hung (National Chi Nan University); Berlin Chen (National Taiwan Normal University) Tue 10 Jan - Morning session (10:30-12:30)
2-1-26-SES AVSE CHALLENGE: AUDIO-VISUAL SPEECH ENHANCEMENT CHALLENGE Andrea L Aldana (Edinburgh University); Cassia Valentini (University of Edinburgh); Ondrej Klejch (University of Edinburgh); Mandar Gogate (Edinburgh Napier University); Kia K Dashtipour (Edinburgh Napier University); Amir Hussein (Edinburgh Napier University); Peter Bell (University of Edinburgh) Tue 10 Jan - Morning session (10:30-12:30)
2-2-12-SES TEA-PSE 2.0: SUB-BAND NETWORK FOR REAL-TIME PERSONALIZED SPEECH ENHANCEMENT Yukai Ju (Northwestern Polytechnical University); Shimin Zhang (Northwestern Polytechnical University); Wei Rao (Tencent); Yannan Wang (Tencent); Tao Yu (Tencent); Lei Xie (NWPU); Shi-dong Shang (Tencent) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-13-SES EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers Soumi Maiti (CMU); Yushi Ueda (CMU); Shinji Watanabe (CMU); Chunlei Zhang (Tencent AI Lab); Meng Yu (Tencent); Shixiong Zhang (Tencent); Yong Xu (Tencent) Tue 10 Jan - Afternoon session (15:30-17:30)
3-1-12-SES LIMUSE: LIGHTWEIGHT MULTI-MODAL SPEAKER EXTRACTION Qinghua Liu (Tianjin University); Yating Huang (Institute of Automation, Chinese Academy of Sciences (CASIA)); Yunzhe Hao (Institute of Automation, Chinese Academy of Sciences); Jiaming Xu (Institute of Automation Chinese Academy of Sciences); Bo Xu (Institute of Automation, Chinese Academy of Sciences) Wed 11 Jan - Morning session (10:30-12:30)
4-1-11-SES End-to-End Multi-speaker ASR with Independent Vector Analysis Robin Scheibler (LINE Corporation); Wangyou Zhang (Shanghai Jiao Tong University); Xuankai Chang (Carnegie Mellon University); Shinji Watanabe (Carnegie Mellon University); Yanmin Qian (Shanghai Jiao Tong University) Thu 12 Jan - Morning session (10:30-12:30)
4-1-12-SES A Hybrid Acoustic Echo Reduction Approach Using Kalman Filtering and Informed Source Extraction With Improved Training Wolfgang Mack (AudioLabs Erlangen); Emanuel Habets (AudioLabs Erlangen) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
1-1-14-ANA Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation Chendong Zhao (The Shenzhen International Graduate School, Tsinghua University, China); Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd); Xiaoyang Qu (Ping An Technology (Shenzhen) Co., Ltd); Haoqian Wang (Tsinghua Shenzhen International Graduate School, Tsinghua University); Jing Xiao (Ping An Insurance (Group) Company of China) Mon 9 Jan - Morning session (10:30-12:30)
1-2-14-ANA VSAMETER: EVALUATION OF A NEW OPEN-SOURCE TOOL TO MEASURE VOWEL SPACE AREA AND RELATED METRICS Tianyu Cao (Johns Hopkins University); Laureano Moro-Velazquez (Johns Hopkins University); Piotr Żelasko (Meaning); Jesús Villalba (Johns Hopkins University); Najim Dehak (Johns Hopkins University) Mon 9 Jan - Afternoon session (15:00-17:00)
2-1-14-ANA INVESTIGATING THE IMPORTANT TEMPORAL MODULATIONS FOR DEEP-LEARNING-BASED SPEECH ACTIVITY DETECTION Tyler Vuong (Carnegie Mellon University); Nikhil Madaan (Carnegie Mellon University); Rohan Panda (Carnegie Mellon University); Richard M Stern (Carnegie Mellon University) Tue 10 Jan - Morning session (10:30-12:30)
3-1-13-ANA A MULTI-MODAL ARRAY OF INTERPRETABLE FEATURES TO EVALUATE LANGUAGE AND SPEECH PATTERNS IN DIFFERENT NEUROLOGICAL DISORDERS Anna Favaro (Johns Hopkins University); Chelsie Motley (Johns Hopkins University); Tianyu Cao (Johns Hopkins University); Miguel Iglesias (Johns Hopkins University); Ankur Butala (Johns Hopkins University); Esther S. Oh (Johns Hopkins University); Robert Stevens (Johns Hopkins Hospital); Jesús Villalba (Johns Hopkins University); Najim Dehak (Johns Hopkins University); Laureano Moro-Velazquez (Johns Hopkins University) Wed 11 Jan - Morning session (10:30-12:30)
4-1-13-ANA Efficient dynamic filter for robust and low computational feature extraction Donghyeon Kim (Korea University); Jeong-gi Kwak (Korea University); Hanseok Ko (Korea University) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
1-2-15-SLR FREQUENCY AND MULTI-SCALE SELECTIVE KERNEL ATTENTION FOR SPEAKER VERIFICATION Sung Hwan Mun (Seoul National University); Jee-weon Jung (Naver Corporation); Min Hyun Han (Seoul National University); Nam Soo Kim (Seoul National University) Mon 9 Jan - Afternoon session (15:00-17:00)
2-1-15-SLR AN ATTENTION-BASED BACKEND ALLOWING EFFICIENT FINE-TUNING OF TRANSFORMER MODELS FOR SPEAKER VERIFICATION Junyi Peng (Brno University of Technology); Oldrich Plchot (Brno University of Technology); Themos Stafylakis (Omilia - Conversational Intelligence); Ladislav Mosner (Brno University of Technology); Lukas Burget (Brno University of Technology); Jan Cernocky (Brno University of Technology) Tue 10 Jan - Morning session (10:30-12:30)
2-2-14-SLR Flow-ER: a Flow-based Embedding Regularization Strategy for Robust Speech Representation Learning Woo Hyun Kang (Computer Research Institute of Montreal); Jahangir Alam (Computer Research Institute of Montreal (CRIM), Montreal (Quebec) Canada); Abderrahim Fathan (Computer Research Institute of Montreal (CRIM), Montreal, Quebec, Canada) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-15-SLR UNSUPERVISED DOMAIN ADAPTATION OF NEURAL PLDA USING SEGMENT PAIRS FOR SPEAKER VERIFICATION İsmail Rasim Ülgen (Sestek - Boğaziçi University); Mustafa Levent Arslan (Sestek - Boğaziçi Üniversitesi) Tue 10 Jan - Afternoon session (15:30-17:30)
3-1-14-SLR THE CLEVER HANS EFFECT IN VOICE SPOOFING DETECTION Bhusan Chettri (Borac Solutions) Wed 11 Jan - Morning session (10:30-12:30)
3-1-15-SLR INVESTIGATING ACTIVE-LEARNING-BASED TRAINING DATA SELECTION FOR SPEECH SPOOFING COUNTERMEASURE Xin Wang (National Institute of Informatics); Junichi Yamagishi (National Institute of Informatics) Wed 11 Jan - Morning session (10:30-12:30)
4-1-14-SLR HOW TO BOOST ANTI-SPOOFING WITH X-VECTORS Xinyue Ma (Tsinghua University); Shanshan Zhang (Tencent Research); Shen Huang (Tencent Research); Ji Gao (Tencent Research); Ying Hu (Xinjiang University); Liang HE (Tsinghua University) Thu 12 Jan - Morning session (10:30-12:30)
4-1-15-SLR A COMPREHENSIVE STUDY ON SELF-SUPERVISED DISTILLATION FOR SPEAKER REPRESENTATION LEARNING Zhengyang Chen (Shanghai Jiao Tong University); Yao Qian (Microsoft); Bing Han (Shanghai Jiao Tong University); Yanmin Qian (Shanghai Jiao Tong University); Michael Zeng (Microsoft) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
1-2-16-DIA Joint speaker diarisation and tracking in switching state-space model Jeremy H. M. Wong (Institute for Infocomm Research); Yifan Gong (Microsoft) Mon 9 Jan - Afternoon session (15:00-17:00)
2-1-16-DIA Diarisation using location tracking with agglomerative clustering Jeremy H. M. Wong (Institute for Infocomm Research); Igor Abramovski (Microsoft); Xiong Xiao (Microsoft); Yifan Gong (Microsoft) Tue 10 Jan - Morning session (10:30-12:30)
2-2-5-DIA Continual Self-supervised Domain Adaptation for End-to-end Speaker Diarization Juan Manuel Coria (Université Paris-Saclay CNRS, LISN); Hervé Bredin (CNRS); Sahar Ghannay (Université Paris-Saclay CNRS, LISN); Sophie Rosset (LISN) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-16-DIA Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization Shota Horiguchi (Hitachi, Ltd.); Yuki Takashima (Hitachi, Ltd.); Shinji Watanabe (Carnegie Mellon University); Paola Garcia (Johns Hopkins University) Tue 10 Jan - Afternoon session (15:30-17:30)
3-1-16-DIA BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications Juan Pablo Zuluaga Gomez (Idiap Research Institute); Seyyed Saeed Sarfjoo (Idiap Research Institute); Amrutha Prasad (Idiap Research Institute); Iuliia Nigmatulina (Idiap Research Institute); Petr Motlicek (Idiap); Karel Ondrej (BUT); Oliver Ohneiser (DLR); Hartmut Helmke (DLR) Wed 11 Jan - Morning session (10:30-12:30)
4-1-16-DIA Low-Latency Speech Separation Guided Diarization for Telephone Conversations Giovanni Morrone (Università Politecnica delle Marche); Samuele Cornell (Università Politecnica delle Marche); Desh Raj (Johns Hopkins University); Luca Serafini (Università Politecnica delle Marche); Enrico Zovato (PerVoice S.p.A.); Alessio Brutti (FBK); Stefano Squartini (Università Politecnica delle Marche) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
1-1-17-TLP Fine Grained Spoken Document Summarization Through Text Segmentation Samantha Kotey (Trinity College Dublin); Rozenn Dahyot (Maynooth University); Naomi Harte (Trinity College Dublin) Mon 9 Jan - Morning session (10:30-12:30)
1-2-17-TLP AN ANALYSIS OF THE EFFECTS OF DECODING ALGORITHMS ON FAIRNESS IN OPEN-ENDED LANGUAGE GENERATION Jwala Dhamala (Amazon Alexa AI); Varun Kumar (Amazon Alexa); Rahul Gupta (Amazon); Kai-Wei Chang (UCLA); Aram Galstyan (USC Information Sciences Institute) Mon 9 Jan - Afternoon session (15:00-17:00)
2-1-17-TLP N-BEST HYPOTHESES RERANKING FOR TEXT-TO-SQL SYSTEMS Lu Zeng (Amazon); Sree Hari Krishnan Parthasarathi (Amazon); Dilek Z Hakkani-Tur (Amazon Alexa AI) Tue 10 Jan - Morning session (10:30-12:30)
3-1-17-TLP Efficient Text Analysis with Pre-trained Neural Network Models Jia Cui (Tencent); Heng Lu (Tencent AI Lab); Wenjie Wang (Emory University); Shiyin Kang (Tencent); Liqiang He (Tencent); Guangzhi Li (Tencent); Dong Yu (Tencent AI Lab) Wed 11 Jan - Morning session (10:30-12:30)
2-1-24-TLP Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition Sharman W Tan (Microsoft); Piyush Behre (Microsoft); Nick Kibre (Microsoft); Issac Alphonso (Microsoft); Shawn Chang () Wed 11 Jan - Morning session (10:30-12:30)
4-1-17-TLP Empirical Analysis of Training Strategies of Transformer-based Japanese Chit-chat Systems Hiroaki Sugiyama (NTT); Masahiro Mizukami (NTT); Tsunehiro Arimoto (NTT); Hiromi Narimatsu (NTT); Yuya Chiba (NTT); Hideharu Nakajima (NTT); Toyomi Meguro (NTT) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
1-1-18-MMP Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection Xuanjun Chen (National Taiwan University); Haibin Wu (National Taiwan University); Hung-yi Lee (National Taiwan University); Helen Meng (The Chinese University of Hong Kong); Roger Jang () Mon 9 Jan - Morning session (10:30-12:30)
1-1-19-MMP Towards visually prompted keyword localisation for zero-resource spoken languages Leanne Nortje (Stellenbosch University); Herman Kamper (Stellenbosch University) Mon 9 Jan - Morning session (10:30-12:30)
1-2-18-MMP Exploiting information from native data for non-native automatic pronunciation assessment Binghuai Lin (MIG, Tencent Science and Technology Ltd.); Liyuan Wang (Tencent Technology Co., Ltd) Mon 9 Jan - Afternoon session (15:00-17:00)
2-1-8-MMP TRANSFORMER-BASED LIP-READING WITH REGULARIZED DROPOUT AND RELAXED ATTENTION Zhengyang Li (Technische Universität Carolo-Wilhelmina Braunschweig); Timo Lohrenz (Technische Universität Carolo-Wilhelmina Braunschweig); Matthias Dunkelberg (Technische Universität Carolo-Wilhelmina Braunschweig); Tim Fingscheidt (Technische Universität Carolo-Wilhelmina Braunschweig) Tue 10 Jan - Morning session (10:30-12:30)
2-1-18-MMP SpeechCLIP: Integrating Speech with Pre-trained Vision and Language Model Yi-Jen Shih (National Taiwan University); Hsuan-Fu Wang (Academia Sinica); Heng-Jui Chang (Massachusetts Institute of Technology); Layne Berry (University of Texas at Austin); Hung-yi Lee (National Taiwan University); David Harwath (The University of Texas at Austin) Tue 10 Jan - Morning session (10:30-12:30)
2-2-18-MMP YFACC: A Yorùbá Speech-Image Dataset for Cross-lingual Keyword Localisation through Visual Grounding Kayode K Olaleye (University of Stellenbosch); Dan Oneață (University Politehnica of Bucharest); Herman Kamper (Stellenbosch University) Tue 10 Jan - Afternoon session (15:30-17:30)
3-1-18-MMP ON THE USE OF MODALITY-SPECIFIC LARGE-SCALE PRE-TRAINED ENCODERS FOR MULTIMODAL SENTIMENT ANALYSIS Atsushi Ando (NTT Corporation); Ryo Masumura (NTT Corporation); Akihiko Takashima (NTT); Satoshi Suzuki (NTT Computer and Data Science Laboratories / The University of Electro-Communications); Naoki Makishima (NTT Corporation); Keita Suzuki (NTT Corporation); Takafumi Moriya (NTT Corporation); Takanori Ashihara (NTT Corporation); Hiroshi Sato (NTT Corporation) Wed 11 Jan - Morning session (10:30-12:30)
4-1-18-MMP An Analysis of Semantically-Aligned Speech-Text Embeddings Muhammad Huzaifah (Institute for Infocomm Research, ASTAR); Ivan Kukanov (Institute for Infocomm Research, ASTAR) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
1-1-1-MLP Exploration of Language-Specific Self-Attention Parameters for Multilingual End-to-End Speech Recognition Brady Houston (AWS AI Labs); Katrin Kirchhoff (Amazon) Mon 9 Jan - Morning session (10:30-12:30)
1-1-5-MLP How Do Phonological Properties Affect Bilingual Automatic Speech Recognition? Shelly Jain (International Institute of Information Technology, Hyderabad); Aditya Yadavalli (International Institute of Information Technology, Hyderabad); Sai Ganesh Mirishkar (IIIT Hyderabad); Anil Vuppala (International Institute of Information Technology Hyderabad) Mon 9 Jan - Morning session (10:30-12:30)
1-1-6-MLP Scaling Up Deliberation for Multilingual ASR Ke Hu (Google); Tara Sainath (Google); Bo Li (Google) Mon 9 Jan - Morning session (10:30-12:30)
1-2-2-MLP Code-switched language modelling using a code predictive LSTM in under-resourced South African languages Joshua Miles Jansen Van Vüren (Stellenbosch University); Thomas Niesler (Stellenbosch University) Mon 9 Jan - Afternoon session (15:00-17:00)
1-2-5-MLP IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS Le Minh Nguyen (University of Groningen); Shekhar Nayak (University of Groningen); Matt Coler (University of Groningen) Mon 9 Jan - Afternoon session (15:00-17:00)
1-2-19-MLP Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition Amir Hussein (Johns Hopkins University); Shammur Chowdhury (QCRI); Ahmed Abdelali (QCRI); Najim Dehak (Johns Hopkins University); Ahmed Ali (Qatar Computing Research Institute, HBKU); Sanjeev Khudanpur (Johns Hopkins University) Mon 9 Jan - Afternoon session (15:00-17:00)
2-1-23-MLP FLEURS: FEW-SHOT LEARNING EVALUATION OF UNIVERSAL REPRESENTATIONS OF SPEECH Alexis Conneau (FAIR); Min Ma (Google Research); Simran Khanuja (Google); Yu Zhang (Google); Vera Axelrod (Google, Inc); Siddharth Dalmia (Carnegie Mellon University); Jason Riesa (Google); Clara Rivera (Google); Ankur Bapna (Google Research) Tue 10 Jan - Morning session (10:30-12:30)
2-2-8-MLP Improving Semi-supervised E2E ASR using CycleGAN and Inter-domain Losses Chia-Yu Li (Institute for Natural Language Processing (IMS), University of Stuttgart); Ngoc Thang Vu (University of Stuttgart) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-19-MLP MULTILINGUAL SPEECH EMOTION RECOGNITION WITH MULTI-GATING MECHANISM AND NEURAL ARCHITECTURE SEARCH Zihan Wang (Columbia University); Qi Meng (Columbia University); Haifeng Lan (Columbia University); Xinrui Zhang (Columbia University); Kehao Guo (Columbia University); Akshat Gupta (JPMorgan) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-22-MLP Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using $\beta$-VAE Hui Lu (The Chinese University of Hong Kong); Disong Wang (The Chinese University of Hong Kong); Xixin Wu (The Chinese University of Hong Kong); Zhiyong Wu (Tsinghua University); Xunying Liu (The Chinese University of Hong Kong); Helen Meng (The Chinese University of Hong Kong) Tue 10 Jan - Afternoon session (15:30-17:30)
3-1-2-MLP Exploring a unified ASR for multiple south Indian languages leveraging multilingual acoustic and language models ANOOP C. S. (Indian Institute of Science, Bengaluru); Ramakrishnan A G (INDIAN INSTITUTE OF SCIENCE) Wed 11 Jan - Morning session (10:30-12:30)
4-1-5-MLP A Truly Multilingual First Pass and Monolingual Second Pass Streaming On-Device ASR System Sepand Mavandadi (Google); Bo Li (Google); Chao Zhang (Google); Brian Farris (Google); Tara Sainath (Google); Trevor Strohman (Google) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
1-1-20-EMR SPEECH EMOTION RECOGNITION WITH COMPLEMENTARY ACOUSTIC REPRESENTATIONS Xiaoming Zhang (Nanjing University of Technology); Fan Zhang (IBM Massachusetts Laboratory); Xiaodong Cui (IBM T. J. Watson Research Center); Wei Zhang (Wayfair) Mon 9 Jan - Morning session (10:30-12:30)
1-2-20-EMR A ZERO-SHOT APPROACH TO IDENTIFYING CHILDREN’S SPEECH IN AUTOMATIC GENDER CLASSIFICATION Amruta Saraf (Pindrop); Ganesh Sivaraman (Pindrop); Elie Khoury (Pindrop) Mon 9 Jan - Afternoon session (15:00-17:00)
2-1-19-EMR Distribution-based Emotion Recognition in Conversation Wen Wu (University of Cambridge); Chao Zhang (University of Cambridge); Phil Woodland (Machine Intelligence Laboratory, Cambridge University Department of Engineering) Tue 10 Jan - Morning session (10:30-12:30)
3-1-19-EMR Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora Yuanchao Li (University of Edinburgh); Yumnah Mohamied (University of Edinburgh); Peter Bell (University of Edinburgh); Catherine Lai (University of Edinburgh) Wed 11 Jan - Morning session (10:30-12:30)
4-1-19-EMR Combining Contrastive and Non-Contrastive Losses for Fine-Tuning Pretrained Models in Speech Analysis Florian Lux (University of Stuttgart); Ching-Yi Chen (University of Stuttgart); Ngoc Thang Vu (University of Stuttgart) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
1-1-21-TTS WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration Yuma Koizumi (Google); Kohei Yatabe (Tokyo University of Agriculture and Technology); Heiga Zen (Google); Michiel Bacchiani (Google) Mon 9 Jan - Morning session (10:30-12:30)
1-1-22-TTS On granularity of prosodic representations in expressive text-to-speech Mikolaj Babianski (Amazon); Kamil Pokora (Amazon); Raahil Shah (Amazon); Rafał Sienkiewicz (Amazon); Daniel Korzekwa (Amazon); Viacheslav Klimkov (Amazon) Mon 9 Jan - Morning session (10:30-12:30)
1-1-23-TTS Can we use Common Voice to train a Multi-Speaker TTS system? Sewade O Ogun (Inria); Vincent Colotte (LORIA); Emmanuel Vincent (Inria) Mon 9 Jan - Morning session (10:30-12:30)
1-2-21-TTS GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models Matthew Baas (Stellenbosch University); Herman Kamper (Stellenbosch University) Mon 9 Jan - Afternoon session (15:00-17:00)
1-2-22-TTS Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy Sarina Meyer (University of Stuttgart); Pascal Tilli (University of Stuttgart); Pavel Denisov (University of Stuttgart); Florian Lux (University of Stuttgart); Julia Koch (University of Stuttgart); Ngoc Thang Vu (University of Stuttgart) Mon 9 Jan - Afternoon session (15:00-17:00)
2-1-20-TTS StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models Yinghao A Li (Columbia University); Cong Han (Columbia University); Nima Mesgarani (Columbia University) Tue 10 Jan - Morning session (10:30-12:30)
2-1-21-TTS Learning accent representation with multi-level VAE towards controllable speech synthesis Jan Melechovsky (Singapore University of Technology and Design); Ambuj Mehrish (SUTD); Dorien Herremans (Singapore University of Technology and Design); Berrak Sisman (Singapore University of Technology and Design (SUTD)) Tue 10 Jan - Morning session (10:30-12:30)
2-1-22-TTS vTTS: visual-text to speech Yoshifumi Nakano (The University of Tokyo); Takaaki Saeki (The University of Tokyo); Shinnosuke Takamichi (The University of Tokyo); Katsuhito Sudoh (Nara Institute of Science and Technology); Hiroshi Saruwatari (The University of Tokyo) Tue 10 Jan - Morning session (10:30-12:30)
2-2-20-TTS Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech Dominik Wagner (Technische Hochschule Nuernberg Georg Simon Ohm); Sebastian P Bayerl (Technische Hochschule Nürnberg Georg Simon Ohm); Hector Cordourier (Intel); Tobias Bocklet (TH Nürnberg) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-21-TTS Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion Ding Ma (Nagoya University); Lester Phillip G Violeta (Nagoya University); Kazuhiro Kobayashi (Nagoya University); Tomoki Toda (Nagoya University) Tue 10 Jan - Afternoon session (15:30-17:30)
3-1-20-TTS SIMD-SIZE AWARE WEIGHT REGULARIZATION FOR FAST NEURAL VOCODING ON CPU Hiroki Kanagawa (NTT Corporation); Yusuke Ijima (NTT Corporation) Wed 11 Jan - Morning session (10:30-12:30)
3-1-21-TTS Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech Florian Lux (University of Stuttgart); Julia Koch (University of Stuttgart); Ngoc Thang Vu (University of Stuttgart) Wed 11 Jan - Morning session (10:30-12:30)
3-1-22-TTS Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation Rendi Chevi (Kata.ai); Radityo Eko Prasojo (Kata.ai); Alham Fikri Aji (Amazon); Andros Tjandra (Meta AI, US); Sakriani Sakti (Japan Advanced Institute of Science and Technology) Wed 11 Jan - Morning session (10:30-12:30)
4-1-20-TTS Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss Efthymios Georgiou (National Technical University of Athens); Kosmas Kritsis (Athena Research Center); Georgios Paraskevopoulos (National Technical University of Athens); Athanasios Katsamanis (ATHENA R.C., Behavioral Signal Technologies); Vassilis Katsouros (Athena Research Center); Alexandros Potamianos (National Technical University of Athens) Thu 12 Jan - Morning session (10:30-12:30)
4-1-21-TTS Remap, warp and attend: Non-parallel many-to-many accent conversion with Normalizing Flows Abdelhamid Ezzerg (Amazon); Thomas Merritt (Amazon); Kayoko Yanagisawa (Amazon); Piotr Bilinski (Amazon); Magdalena Proszewska (Jagiellonian University); Kamil Pokora (Amazon); Renard Korzeniowski (Amazon); Roberto Barra-Chicote (Amazon); Daniel Korzekwa (Amazon) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
1-2-23-RES STOP: A DATASET FOR SPOKEN TASK ORIENTED SEMANTIC PARSING Paden Tomasello (Meta); Akshat Shrivastava (Meta); Daniel A Lazar (Meta); Po-chun Hsu (Meta); Duc Le (Meta); Adithya Sagar (Facebook AI); Ali Elkahky (Meta); Jade Copet (Meta); Wei-Ning Hsu (Massachusetts Institute of Technology); Yossi Adi (Facebook AI Research); Robin Algayres (Meta); Tu Anh Nguyen (Meta); Emmanuel Dupoux (Facebook AI Research); Luke Zettlemoyer (Facebook); Abdel-rahman Mohamed (Facebook AI Research (FAIR)) Mon 9 Jan - Afternoon session (15:00-17:00)
2-2-23-RES Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition Injy Hamed (New York University Abu Dhabi; Stuttgart University); Amir Hussein (Johns Hopkins University); Oumnia Chellah (Stanford University); Shammur Chowdhury (QCRI); Hamdy Mubarak (Qatar Computing Research Institute, HBKU); Sunayana Sitaram (Microsoft Research); Nizar Habash (); Ahmed Ali (Qatar Computing Research Institute, HBKU) Tue 10 Jan - Afternoon session (15:30-17:30)
4-1-22-RES MASC: Massive Arabic Speech Corpus Mohammad Al-Fetyani (Appswave); Mohammad AlBarham (Appswave); Gheith A. Abandah (); Adham Alsharkawi (The University of Jordan); Maha Dawas (Planning and Statistics Authority) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
1-1-15-MLS Speed-Robust Keyword Spotting via Soft Self-Attention on Multi-Scale Features Chaoyue Ding (SenseTime Group Limited); Jiakui Li (SenseTime Group Limited); Martin Zong (SenseTime Group Limited); Baoxiang Li (SenseTime Group Limited) Mon 9 Jan - Morning session (10:30-12:30)
1-1-24-MLS Distilling Sequence-to-Sequence Voice Conversion Models For Streaming Conversion Applications Kou Tanaka (NTT Corporation); Hirokazu Kameoka (NTT Communication Science Laboratories, NTT Corporation); Takuhiro Kaneko (NTT Corporation); Shogo Seki (NTT Corporation) Mon 9 Jan - Morning session (10:30-12:30)
1-1-25-MLS AUTOMATIC PREDICTION OF INTELLIGIBILITY OF WORDS AND PHONEMES PRODUCED ORALLY BY JAPANESE LEARNERS OF ENGLISH Nobuaki Minematsu (The University of Tokyo); Chuanbo Zhu (The University of Tokyo); Takuya Kunihara (The University of Tokyo); Daisuke Saito (The University of Tokyo); Noriko Nakanishi (Kobe Gakuin University) Mon 9 Jan - Morning session (10:30-12:30)
1-2-24-MLS SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning Zuheng Kang (Ping An Technology (Shenzhen) Co., Ltd); Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd); Junqing Peng (Ping An Technology (Shenzhen) Co., Ltd); Jing Xiao (Ping An Insurance (Group) Company of China) Mon 9 Jan - Afternoon session (15:00-17:00)
1-2-25-MLS PEPPANET: EFFECTIVE MISPRONUNCIATION DETECTION AND DIAGNOSIS LEVERAGING PHONETIC, PHONOLOGICAL, AND ACOUSTIC CUES Bi-Cheng Yan (National Taiwan Normal University); Hsin-Wei Wang (National Taiwan Normal University); Berlin Chen (National Taiwan Normal University) Mon 9 Jan - Afternoon session (15:00-17:00)
2-1-24-MLS Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection Samuele Cornell (Università Politecnica delle Marche); Thomas Balestri (Amazon); Thibaud Senechal (Amazon) Tue 10 Jan - Morning session (10:30-12:30)
2-2-17-MLS TDOA ESTIMATION OF SPEECH SOURCE IN NOISY REVERBERANT ENVIRONMENTS Suliang Bu (University of Missouri); Tuo Zhao (University of Missouri); Yunxin Zhao (University of Missouri) Tue 10 Jan - Afternoon session (15:30-17:30)
2-2-24-MLS Phoneme Segmentation Using Self-Supervised Speech Models Luke Strgar (University of Texas, Austin); David Harwath (The University of Texas at Austin) Tue 10 Jan - Afternoon session (15:30-17:30)
3-1-23-MLS An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition Chao-Han Huck Yang (Georgia Institute of Technology); I-Fan Chen (Amazon Inc.); Andreas Stolcke (Amazon); Sabato M Siniscalchi (Kore University of Enna); Chin-hui Lee (Georgia Institute of Technology) Wed 11 Jan - Morning session (10:30-12:30)
4-1-23-MLS PHONE-LEVEL PRONUNCIATION SCORING FOR L1 USING WEIGHTED-DYNAMIC TIME WARPING Aghilas SINI (Univ Rennes, CNRS, IRISA); Antoine Perquin (Univ Rennes, CNRS, IRISA); Damien Lolive (Univ Rennes, CNRS, IRISA); Arnaud Delhay (IRISA) Thu 12 Jan - Morning session (10:30-12:30)
4-1-24-MLS PROFICIENCY ASSESSMENT OF L2 SPOKEN ENGLISH USING WAV2VEC 2.0 Stefano Bannò (University of Trento); Marco Matassoni (Fondazione Bruno Kessler) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
4-2-1-SUP SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning Tzu-hsun Feng (National Taiwan University); Annie Dong (Meta); Ching-Feng Yeh (Facebook); Shu-wen Yang (National Taiwan University); Tzu-Quan Lin (National Taiwan University); Jiatong Shi (Carnegie Mellon University); Kai-Wei Chang (National Taiwan University); Zili Huang (Johns Hopkins University); Haibin Wu (National Taiwan University); Xuankai Chang (Carnegie Mellon University); Shinji Watanabe (Carnegie Mellon University); Abdel-rahman Mohamed (Facebook AI Research (FAIR)); Shang-Wen Li (Meta); Hung-yi Lee (National Taiwan University) Thu 12 Jan - JSALT 2022 Reports and SUPERB Challenge overview (8:30-10:00)
1-1-26-SUP On the Utility of Self-supervised Models for Prosody-related Tasks Guan-Ting Lin (National Taiwan University); Chi Luen Feng (National Taiwan University); Wei-Ping Huang (National Taiwan University); Yuan Tseng (National Taiwan University); Chen An Li (National Taiwan University); Tzu-Han Lin (National Taiwan University); Hung-yi Lee (National Taiwan University); Nigel Ward (UTEP) Mon 9 Jan - Morning session (10:30-12:30)
2-1-25-SUP Improving generalizability of distilled self-supervised speech processing models under distorted settings Kuan-Po Huang (National Taiwan University); YU-KUAN FU (NTU); Tsu-Yuan Hsu (National Taiwan University); Fabian Alejandro Ritter Gutierrez (National University of Singapore); Fan-Lin Wang (Academia Sinica); Liang-Hsuan Tseng (National Taiwan University); Yu Zhang (Google); Hung-yi Lee (National Taiwan University) Tue 10 Jan - Morning session (10:30-12:30)
2-2-25-SUP Exploring Efficient-tuning Methods in Self-supervised Speech Models Zih-Ching Chen (National Taiwan University); Chin-Lun Fu (National Taiwan University); Chih Ying Liu (National Taiwan University); Shang-Wen Li (AWS AI); Hung-yi Lee (National Taiwan University) Tue 10 Jan - Afternoon session (15:30-17:30)
3-1-25-SUP On Compressing Sequences for Self-Supervised Speech Models Yen Meng (National Taiwan University); Hsuan-Jui Chen (National Taiwan University); Jiatong Shi (Carnegie Mellon University); Shinji Watanabe (Carnegie Mellon University); Paola Garcia (Johns Hopkins University); Hung-yi Lee (National Taiwan University); Hao Tang (The University of Edinburgh) Wed 11 Jan - Morning session (10:30-12:30)
4-1-25-SUP Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations Themos Stafylakis (Omilia - Conversational Intelligence); Ladislav Mošner (Brno University of Technology); Sofoklis Kakouros (University of Helsinki); Plchot Oldřich (Brno University of Technology); Lukas Burget (Brno University of Technology); Jan Honza Cernocky (Brno University of Technology) Thu 12 Jan - Morning session (10:30-12:30)

Poster ID Paper Title Authors Session
2-3-1-DEMO ISPEAK: INTERACTIVE SPOKEN LANGUAGE UNDERSTANDING SYSTEM FOR CHILDREN WITH SPEECH AND LANGUAGE DISORDERS Baihan Lin (Columbia University Irving Medical Center, New York, US), Xinxin Zhang (Elizabeth Seton Children’s Center, New York, US) Tue 10 Jan - Demo session (15:30-17:30)
2-3-2-DEMO LUX-ASR: BUILDING AN ASR SYSTEM FOR THE LUXEMBOURGISH LANGUAGE Peter Gilles (University of Luxembourg, Luxembourg), Nina Hosseini-Kivanani (University of Luxembourg, Luxembourg), Leopold Hillah (University of Luxembourg, Luxembourg) Tue 10 Jan - Demo session (15:30-17:30)
2-3-3-DEMO ON-DEVICE STREAMING TARGET-SPEAKER ASR WITH NEURAL TRANSDUCER Takafumi Moriya (NTT Corporation, Japan), Hiroshi Sato (NTT Corporation, Japan), Tsubasa Ochiai (NTT Corporation, Japan), Marc Delcroix (NTT Corporation, Japan), Taichi Asami (NTT Corporation, Japan) Tue 10 Jan - Demo session (15:30-17:30)
2-3-4-DEMO VOICE-ENABLED AUDIOVISUAL AGENT FOR QUESTION ANSWERING IN ENGLISH AND ARABIC Oscar Saz (Emotech Ltd, London, UK), Ahmed Abdellah (Emotech Ltd, London, UK), Luca McArthur (Emotech Ltd, London, UK), Daniel McKenna (Emotech Ltd, London, UK), Simon Shelley (Emotech Ltd, London, UK), Xinyue Zhang (Emotech Ltd, London, UK) Tue 10 Jan - Demo session (15:30-17:30)