Program at a Glance

This year, we are experimenting with a new approach to sessions. Every session includes papers from all technical areas, which gives authors a better chance of seeing other posters in their area. We hope that this will encourage more interaction between participants, and we will collect feedback on the format.

The poster ID is as follows: [Day]-[Session]-[Poster Number]-[Topic],
where:
[Day] is the day of the conference (1, 2, 3 or 4),
[Session] is 1 for morning and 2 for afternoon sessions,
[Poster Number] is the number of the poster within the session, and
[Topic] is the technical area of the work, as follows:

Topic ID | Technical Area
ASR | 01. Automatic speech recognition
SLP | 02. Spoken language processing
SES | 03. Speech enhancement and separation
ANA | 04. Speech analysis
SLR | 05. Speaker and language recognition
DIA | 06. Speaker diarization
TLP | 07. Text-only language processing
MMP | 08. Multimodal speech processing
MLP | 09. Multilingual processing
EMR | 10. Emotion recognition and paralinguistics
TTS | 11. Speech synthesis and spoken language generation
RES | 12. Resources (new corpora, toolkits, evaluation metrics, etc.)
MLS | 13. Machine learning for speech applications
SUP | 14. SUPERB challenge
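
For readers who want to sort or filter the program automatically, the poster ID scheme described above can be split into its four fields. The following is only a minimal sketch in Python: the names `TOPICS`, `SESSIONS`, `Poster` and `parse_poster_id` are hypothetical and not part of any conference tooling, and the topic codes are copied from the table above.

```python
import re
from dataclasses import dataclass

# Topic codes and technical areas, copied from the table above.
TOPICS = {
    "ASR": "Automatic speech recognition",
    "SLP": "Spoken language processing",
    "SES": "Speech enhancement and separation",
    "ANA": "Speech analysis",
    "SLR": "Speaker and language recognition",
    "DIA": "Speaker diarization",
    "TLP": "Text-only language processing",
    "MMP": "Multimodal speech processing",
    "MLP": "Multilingual processing",
    "EMR": "Emotion recognition and paralinguistics",
    "TTS": "Speech synthesis and spoken language generation",
    "RES": "Resources (new corpora, toolkits, evaluation metrics, etc.)",
    "MLS": "Machine learning for speech applications",
    "SUP": "SUPERB challenge",
}

# [Session] is 1 for morning and 2 for afternoon sessions.
SESSIONS = {1: "morning", 2: "afternoon"}

# [Day]-[Session]-[Poster Number]-[Topic], with days 1 to 4.
POSTER_ID = re.compile(r"([1-4])-([12])-(\d+)-([A-Z]{3})")


@dataclass(frozen=True)
class Poster:
    """Hypothetical container for the four fields of a poster ID."""
    day: int
    session: str
    number: int
    topic: str
    area: str


def parse_poster_id(poster_id: str) -> Poster:
    """Split a poster ID such as '2-1-18-MMP' into its fields."""
    match = POSTER_ID.fullmatch(poster_id.strip())
    if match is None or match.group(4) not in TOPICS:
        raise ValueError(f"not a valid poster ID: {poster_id!r}")
    day, session, number, topic = match.groups()
    return Poster(int(day), SESSIONS[int(session)], int(number), topic, TOPICS[topic])


if __name__ == "__main__":
    print(parse_poster_id("2-1-18-MMP"))
```

Under these assumptions, `parse_poster_id("2-1-18-MMP")` gives day 2, the morning session, poster number 18 and the area "Multimodal speech processing". An ID with an unknown topic code or a day outside 1 to 4 raises `ValueError`.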

Technical Program Per Day

  1. Poster ID | Paper Title | Paper ID
    1-1-1-MLP | Exploration of Language-Specific Self-Attention Parameters for Multilingual End-to-End Speech Recognition | 13
    1-1-2-ASR | ASBERT: ASR-SPECIFIC SELF-SUPERVISED LEARNING WITH SELF-TRAINING | 71
    1-1-3-ASR | SUB-8-BIT QUANTIZATION FOR ON-DEVICE SPEECH RECOGNITION: A REGULARIZATION-FREE APPROACH | 134
    1-1-4-ASR | G-AUGMENT: SEARCHING FOR THE META-STRUCTURE OF DATA AUGMENTATION POLICIES FOR ASR | 240
    1-1-5-MLP | How Do Phonological Properties Affect Bilingual Automatic Speech Recognition? | 329
    1-1-6-MLP | Scaling Up Deliberation for Multilingual ASR | 109
    1-1-7-ASR | Context-aware Neural Confidence Estimation for Rare Word Speech Recognition | 278
    1-1-8-ASR | Flickering reduction with partial hypothesis reranking for streaming ASR | 86
    1-1-9-ASR | InterDecoder: Using Attention Decoders as Intermediate Regularization for CTC-based Speech Recognition | 187
    1-1-10-SLP | Automatic Rating of Spontaneous Speech for Low-Resource Languages | 199
    1-1-11-SLP | Mixture of Domain Experts for Language Understanding: An Analysis of Modularity, Task Performance, and Memory Tradeoffs | 25
    1-1-12-SES | MULTI-STAGE PROGRESSIVE AUDIO BANDWIDTH EXTENSION | 69
    1-1-13-SES | JOINT OPTIMIZATION OF DIFFUSION PROBABILISTIC-BASED MULTICHANNEL SPEECH ENHANCEMENT WITH FAR-FIELD SPEAKER VERIFICATION | 243
    1-1-14-ANA | Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation | 175
    1-1-15-MLS | Speed-Robust Keyword Spotting via Soft Self-Attention on Multi-Scale Features | 79
    1-1-16-DIA | Continual Self-supervised Domain Adaptation for End-to-end Speaker Diarization | 4
    1-1-17-TLP | Fine Grained Spoken Document Summarization Through Text Segmentation | 7
    1-1-18-MMP | Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection | 42
    1-1-19-MMP | Towards visually prompted keyword localisation for zero-resource spoken languages | 178
    1-1-20-EMR | SPEECH EMOTION RECOGNITION WITH COMPLEMENTARY ACOUSTIC REPRESENTATIONS | 315
    1-1-21-TTS | WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration | 141
    1-1-22-TTS | On granularity of prosodic representations in expressive text-to-speech | 258
    1-1-23-TTS | Can we use Common Voice to train a Multi-Speaker TTS system? | 81
    1-1-24-MLS | Distilling Sequence-to-Sequence Voice Conversion Models For Streaming Conversion Applications | 180
    1-1-25-MLS | AUTOMATIC PREDICTION OF INTELLIGIBILITY OF WORDS AND PHONEMES PRODUCED ORALLY BY JAPANESE LEARNERS OF ENGLISH | 355
    1-1-26-SUP | On the Utility of Self-supervised Models for Prosody-related Tasks | 313

  2. Poster ID | Paper Title | Paper ID
    1-2-1-ASR | JOIST: A Joint Speech and Text Streaming Model For ASR | 23
    1-2-2-MLP | Code-switched language modelling using a code predictive LSTM in under-resourced South African languages | 76
    1-2-3-ASR | A CONTEXT-AWARE KNOWLEDGE TRANSFERRING STRATEGY FOR CTC-BASED ASR | 147
    1-2-4-ASR | Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR | 252
    1-2-5-MLP | IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS | 342
    1-2-6-ASR | Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR | 121
    1-2-7-ASR | E-Branchformer: Branchformer with Enhanced merging for speech recognition | 310
    1-2-8-ASR | CONFORMER-BASED ON-DEVICE STREAMING SPEECH RECOGNITION WITH KD COMPRESSION AND TWO-PASS ARCHITECTURE | 169
    1-2-9-ASR | Accelerator-Aware Training for Transducer-based Speech Recognition | 220
    1-2-10-SLP | A DATA-DRIVEN INVESTIGATION OF NOISE-ADAPTIVE UTTERANCE GENERATION WITH LINGUISTIC MODIFICATION | 85
    1-2-11-SLP | On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding | 111
    1-2-12-SES | Spatial-DCCRN: DCCRN Equipped with Frame-level Angle Feature and Hybrid Filtering for Multi-channel Speech Enhancement | 70
    1-2-13-SES | IMPROVED NORMALIZING FLOW-BASED SPEECH ENHANCEMENT USING AN ALL-POLE GAMMATONE FILTERBANK FOR CONDITIONAL INPUT REPRESENTATION | 245
    1-2-14-ANA | VSAMETER: EVALUATION OF A NEW OPEN-SOURCE TOOL TO MEASURE VOWEL SPACE AREA AND RELATED METRICS | 237
    1-2-15-SLR | FREQUENCY AND MULTI-SCALE SELECTIVE KERNEL ATTENTION FOR SPEAKER VERIFICATION | 136
    1-2-16-DIA | Joint speaker diarisation and tracking in switching state-space model | 10
    1-2-17-TLP | AN ANALYSIS OF THE EFFECTS OF DECODING ALGORITHMS ON FAIRNESS IN OPEN-ENDED LANGUAGE GENERATION | 63
    1-2-18-MMP | Exploiting information from native data for non-native automatic pronunciation assessment | 123
    1-2-19-MLP | Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition | 270
    1-2-20-EMR | A ZERO-SHOT APPROACH TO IDENTIFYING CHILDREN’S SPEECH IN AUTOMATIC GENDER CLASSIFICATION | 322
    1-2-21-TTS | GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models | 167
    1-2-22-TTS | Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy | 273
    1-2-23-RES | STOP: A DATASET FOR SPOKEN TASK ORIENTED SEMANTIC PARSING | 120
    1-2-24-MLS | SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning | 204
    1-2-25-MLS | PEPPANET: EFFECTIVE MISPRONUNCIATION DETECTION AND DIAGNOSIS LEVERAGING PHONETIC, PHONOLOGICAL, AND ACOUSTIC CUES | 368

  3. Poster ID | Paper Title | Paper ID
    2-1-1-ASR | Untied Positional Encodings for Efficient Transformer-based Speech Recognition | 29
    2-1-2-ASR | Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio | 92
    2-1-3-ASR | Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition | 150
    2-1-4-ASR | Damage Control during Domain Adaptation for Transducer Based Automatic Speech Recognition | 254
    2-1-5-ASR | PADA: PRUNING ASSISTED DOMAIN ADAPTATION FOR SELF-SUPERVISED SPEECH REPRESENTATIONS | 361
    2-1-6-ASR | MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario | 137
    2-1-7-ASR | Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition | 157
    2-1-8-MMP | TRANSFORMER-BASED LIP-READING WITH REGULARIZED DROPOUT AND RELAXED ATTENTION | 84
    2-1-9-ASR | Residual Adapters for Targeted Updates in RNN-Transducer Based Speech Recognition System | 227
    2-1-10-SLP | Response Timing Estimation for Spoken Dialog Systems based on Syntactic Completeness Prediction | 309
    2-1-11-SLP | Weak-Supervised Dysarthria-invariant Features for Spoken Language Understanding using an FHVAE and Adversarial Training | 194
    2-1-12-SES | Exploring WavLM on Speech Enhancement | 149
    2-1-13-SES | Adaptive-FSN: Integrating full-band extraction and adaptive sub-band encoding for monaural speech enhancement | 247
    2-1-14-ANA | INVESTIGATING THE IMPORTANT TEMPORAL MODULATIONS FOR DEEP-LEARNING-BASED SPEECH ACTIVITY DETECTION | 276
    2-1-15-SLR | AN ATTENTION-BASED BACKEND ALLOWING EFFICIENT FINE-TUNING OF TRANSFORMER MODELS FOR SPEAKER VERIFICATION | 179
    2-1-16-DIA | Diarisation using location tracking with agglomerative clustering | 11
    2-1-17-TLP | N-BEST HYPOTHESES RERANKING FOR TEXT-TO-SQL SYSTEMS | 236
    2-1-18-MMP | SpeechCLIP: Integrating Speech with Pre-trained Vision and Language Model | 146
    2-1-19-EMR | Distribution-based Emotion Recognition in Conversation | 22
    2-1-20-TTS | StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models | 43
    2-1-21-TTS | Learning accent representation with multi-level VAE towards controllable speech synthesis | 185
    2-1-22-TTS | vTTS: visual-text to speech | 314
    2-1-23-MLP | FLEURS: FEW-SHOT LEARNING EVALUATION OF UNIVERSAL REPRESENTATIONS OF SPEECH | 133
    2-1-24-MLS | Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection | 216
    2-1-25-SUP | Improving generalizability of distilled self-supervised speech processing models under distorted settings | 53
    2-1-26-SES | AVSE CHALLENGE: AUDIO-VISUAL SPEECH ENHANCEMENT CHALLENGE | 374

  4. Poster ID | Paper Title | Paper ID
    2-2-1-ASR | IMPROVED NOISY ITERATIVE PSEUDO-LABELING FOR SEMI-SUPERVISED SPEECH RECOGNITION | 65
    2-2-2-ASR | GUIDED CONTRASTIVE SELF-SUPERVISED PRE-TRAINING FOR AUTOMATIC SPEECH RECOGNITION | 94
    2-2-3-ASR | Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition | 166
    2-2-4-ASR | NAM+: TOWARDS SCALABLE END-TO-END CONTEXTUAL BIASING FOR ADAPTIVE ASR | 279
    2-2-5-ASR | CCC-WAV2VEC 2.0: CLUSTERING AIDED CROSS CONTRASTIVE SELF-SUPERVISED LEARNING OF SPEECH REPRESENTATIONS | 363
    2-2-6-ASR | Modular Hybrid Autoregressive Transducer | 143
    2-2-7-ASR | How Does Pre-trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications | 164
    2-2-8-MLP | Improving Semi-supervised E2E ASR using CycleGAN and Inter-domain Losses | 115
    2-2-9-ASR | Internal Language Model Personalization of E2E Automatic Speech Recognition Using Random Encoder Features | 256
    2-2-10-SLP | Building Markovian Generative Architectures over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems | 330
    2-2-11-SLP | NON-AUTOREGRESSIVE END-TO-END APPROACHES FOR JOINT AUTOMATIC SPEECH RECOGNITION AND SPOKEN LANGUAGE UNDERSTANDING | 226
    2-2-12-SES | TEA-PSE 2.0: SUB-BAND NETWORK FOR REAL-TIME PERSONALIZED SPEECH ENHANCEMENT | 172
    2-2-13-SES | EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers | 316
    2-2-14-SLR | Flow-ER: a Flow-based Embedding Regularization Strategy for Robust Speech Representation Learning | 3
    2-2-15-SLR | UNSUPERVISED DOMAIN ADAPTATION OF NEURAL PLDA USING SEGMENT PAIRS FOR SPEAKER VERIFICATION | 277
    2-2-16-DIA | Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization | 93
    2-2-17-TLP | Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition | 275
    2-2-18-MMP | YFACC: A Yorùbá Speech-Image Dataset for Cross-lingual Keyword Localisation through Visual Grounding | 153
    2-2-19-MLP | MULTILINGUAL SPEECH EMOTION RECOGNITION WITH MULTI-GATING MECHANISM AND NEURAL ARCHITECTURE SEARCH | 113
    2-2-20-TTS | Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech | 50
    2-2-21-TTS | Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion | 217
    2-2-22-MLP | Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE | 335
    2-2-23-RES | Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition | 274
    2-2-24-MLS | Phoneme Segmentation Using Self-Supervised Speech Models | 268
    2-2-25-SUP | Exploring Efficient-tuning Methods in Self-supervised Speech Models | 106

  5. Poster ID | Paper Title | Paper ID
    3-1-1-ASR | Towards End-to-end Unsupervised Speech Recognition | 66
    3-1-2-MLP | Exploring a unified ASR for multiple south Indian languages leveraging multilingual acoustic and language models | 97
    3-1-3-ASR | Monotonic segmental attention for automatic speech recognition | 197
    3-1-4-ASR | STREAMING, FAST AND ACCURATE ON-DEVICE INVERSE TEXT NORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION | 323
    3-1-5-ASR | DUAL LEARNING FOR LARGE VOCABULARY ON-DEVICE ASR | 27
    3-1-6-ASR | STREAMING BILINGUAL END TO END ASR MODEL USING ATTENTION OVER MULTIPLE SOFTMAX | 190
    3-1-7-ASR | End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation | 222
    3-1-8-ASR | Fully Unsupervised Training of Few-Shot Keyword Spotting | 127
    3-1-9-ASR | Learning a Dual-Mode Speech Recognition Model via Self-Pruning | 287
    3-1-10-SLP | Improving Noise Robustness for Spoken Content Retrieval using semi-supervised ASR and N-best transcripts for BERT-based ranking models | 170
    3-1-11-SLP | A STUDY ON THE INTEGRATION OF PRE-TRAINED SSL, ASR, LM AND SLU MODELS FOR SPOKEN LANGUAGE UNDERSTANDING | 264
    3-1-12-SES | LIMUSE: LIGHTWEIGHT MULTI-MODAL SPEAKER EXTRACTION | 177
    3-1-13-ANA | A MULTI-MODAL ARRAY OF INTERPRETABLE FEATURES TO EVALUATE LANGUAGE AND SPEECH PATTERNS IN DIFFERENT NEUROLOGICAL DISORDERS | 107
    3-1-14-SLR | THE CLEVER HANS EFFECT IN VOICE SPOOFING DETECTION | 20
    3-1-15-SLR | INVESTIGATING ACTIVE-LEARNING-BASED TRAINING DATA SELECTION FOR SPEECH SPOOFING COUNTERMEASURE | 284
    3-1-16-DIA | BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications | 162
    3-1-17-TLP | Efficient Text Analysis with Pre-trained Neural Network Models | 300
    3-1-18-MMP | ON THE USE OF MODALITY-SPECIFIC LARGE-SCALE PRE-TRAINED ENCODERS FOR MULTIMODAL SENTIMENT ANALYSIS | 154
    3-1-19-EMR | Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora | 188
    3-1-20-TTS | SIMD-SIZE AWARE WEIGHT REGULARIZATION FOR FAST NEURAL VOCODING ON CPU | 64
    3-1-21-TTS | Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech | 219
    3-1-22-TTS | Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation | 352
    3-1-23-MLS | An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition | 17
    3-1-24-MLS | TDOA ESTIMATION OF SPEECH SOURCE IN NOISY REVERBERANT ENVIRONMENTS | 312
    3-1-25-SUP | On Compressing Sequences for Self-Supervised Speech Models | 238

  6. Poster ID | Paper Title | Paper ID
    3-2-1-SUP | SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning | 373

  7. Poster ID | Paper Title | Paper ID
    4-1-1-ASR | Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition | 67
    4-1-2-ASR | HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch | 100
    4-1-3-ASR | Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models | 235
    4-1-4-ASR | Personalization of CTC Speech Recognition Models | 328
    4-1-5-MLP | A Truly Multilingual First Pass and Monolingual Second Pass Streaming On-Device ASR System | 108
    4-1-6-ASR | UNIFIED END-TO-END SPEECH RECOGNITION AND ENDPOINTING FOR FAST AND EFFICIENT SPEECH SYSTEMS | 269
    4-1-7-ASR | Learning mask scalars for improved robust automatic speech recognition | 293
    4-1-8-ASR | An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition | 163
    4-1-9-ASR | Macro-block dropout for improved regularization in training end-to-end speech recognition models | 348
    4-1-10-SLP | On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting | 214
    4-1-11-SES | End-to-End Multi-speaker ASR with Independent Vector Analysis | 15
    4-1-12-SES | A Hybrid Acoustic Echo Reduction Approach Using Kalman Filtering and Informed Source Extraction With Improved Training | 184
    4-1-13-ANA | Efficient dynamic filter for robust and low computational feature extraction | 148
    4-1-14-SLR | HOW TO BOOST ANTI-SPOOFING WITH X-VECTORS | 78
    4-1-15-SLR | A COMPREHENSIVE STUDY ON SELF-SUPERVISED DISTILLATION FOR SPEAKER REPRESENTATION LEARNING | 311
    4-1-16-DIA | Low-Latency Speech Separation Guided Diarization for Telephone Conversations | 241
    4-1-17-TLP | Empirical Analysis of Training Strategies of Transformer-based Japanese Chit-chat Systems | 308
    4-1-18-MMP | An Analysis of Semantically-Aligned Speech-Text Embeddings | 174
    4-1-19-EMR | Combining Contrastive and Non-Contrastive Losses for Fine-Tuning Pretrained Models in Speech Analysis | 218
    4-1-20-TTS | Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss | 77
    4-1-21-TTS | Remap, warp and attend: Non-parallel many-to-many accent conversion with Normalizing Flows | 234
    4-1-22-RES | MASC: Massive Arabic Speech Corpus | 39
    4-1-23-MLS | PHONE-LEVEL PRONUNCIATION SCORING FOR L1 USING WEIGHTED-DYNAMIC TIME WARPING | 35
    4-1-24-MLS | PROFICIENCY ASSESSMENT OF L2 SPOKEN ENGLISH USING WAV2VEC 2.0 | 340
    4-1-25-SUP | Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations | 250

Technical Program Per Area

Poster ID | Paper Title | Authors | Session
1-1-2-ASR | ASBERT: ASR-SPECIFIC SELF-SUPERVISED LEARNING WITH SELF-TRAINING | Hyung Yong Kim (42dot); Byeong-Yeol Kim (42dot); Seung Woo Yu (42dot); Youshin Lim (42dot); Yunkyu Lim (42dot); Hanbin Lee (42dot) | Mon 9 Jan - Morning session (10:30-13:00)
1-1-3-ASR | SUB-8-BIT QUANTIZATION FOR ON-DEVICE SPEECH RECOGNITION: A REGULARIZATION-FREE APPROACH | Kai Zhen (Amazon); Martin Radfar (Amazon); Hieu D Nguyen (Amazon); Grant Strimel (Amazon); Athanasios Mouchtaris (Amazon); Nathan Susanj (Amazon) | Mon 9 Jan - Morning session (10:30-13:00)
1-1-4-ASR | G-AUGMENT: SEARCHING FOR THE META-STRUCTURE OF DATA AUGMENTATION POLICIES FOR ASR | Yuan Wang (Google); Ekin D Cubuk (Google Brain); Andrew Rosenberg (Google LLC); Shuyang Cheng (Waymo LLC); Ron J Weiss (Google, Inc.); Bhuvana Ramabhadran (Google); Pedro Moreno (Google); Quoc Le (Google Brain); Daniel S Park (Google Brain) | Mon 9 Jan - Morning session (10:30-13:00)
1-1-7-ASR | Context-aware Neural Confidence Estimation for Rare Word Speech Recognition | David Qiu (Google); Tsendsuren Munkhdalai (Microsoft Research); Yanzhang He (Google); Khe C Sim (Google Inc.) | Mon 9 Jan - Morning session (10:30-13:00)
1-1-8-ASR | Flickering reduction with partial hypothesis reranking for streaming ASR | Antoine Bruguier (Google); David Qiu (Google); Trevor Strohman (Google); Yanzhang He (Google) | Mon 9 Jan - Morning session (10:30-13:00)
1-1-9-ASR | InterDecoder: Using Attention Decoders as Intermediate Regularization for CTC-based Speech Recognition | Tatsuya Komatsu (LINE Corporation); Yusuke Fujita (LINE Corporation) | Mon 9 Jan - Morning session (10:30-13:00)
1-2-1-ASR | JOIST: A Joint Speech and Text Streaming Model For ASR | Tara Sainath (Google); Rohit Prabhavalkar (Google); Ankur Bapna (Google Research); Yu Zhang (Google); Zhouyuan Huo (Google); Zhehuai Chen (Google); Bo Li (Google); Weiran Wang (Google); Trevor Strohman (Google, Inc.) | Mon 9 Jan - Afternoon session (13:30-15:00)
1-2-3-ASR | A CONTEXT-AWARE KNOWLEDGE TRANSFERRING STRATEGY FOR CTC-BASED ASR | Ke-Han Lu (National Taiwan University of Science and Technology); Kuan-Yu CHEN (National Taiwan University of Science and Technology) | Mon 9 Jan - Afternoon session (13:30-15:00)
1-2-4-ASR | Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR | Zhehuai Chen (Google); Ankur Bapna (Google Research); Andrew Rosenberg (Google LLC); Yu Zhang (Google); Bhuvana Ramabhadran (Google); Pedro Moreno (Google); Nanxin Chen (Google) | Mon 9 Jan - Afternoon session (13:30-15:00)
1-2-6-ASR | Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR | Yusuke Fujita (LINE Corporation); Tatsuya Komatsu (LINE Corporation); Yusuke Kida (LINE Corp) | Mon 9 Jan - Afternoon session (13:30-15:00)
1-2-7-ASR | E-Branchformer: Branchformer with Enhanced merging for speech recognition | Kwangyoun Kim (ASAPP); Felix Wu (ASAPP); Yifan Peng (Carnegie Mellon University); Jing Pan (ASAPP); Prashant Sridhar (ASAPP); Kyu Jeong Han (ASAPP); Shinji Watanabe (Carnegie Mellon University) | Mon 9 Jan - Afternoon session (13:30-15:00)
1-2-8-ASR | CONFORMER-BASED ON-DEVICE STREAMING SPEECH RECOGNITION WITH KD COMPRESSION AND TWO-PASS ARCHITECTURE | Jinhwan Park (Samsung Research); Sichen Jin (Samsung); Junmo Park (Samsung Research); Sungsoo Kim (Samsung Electronics); Dhairya Sandhyana (Samsung Research); Changheon Lee (Samsung Electronics); Myoungji Han (Samsung Electronics); Jungin Lee (Samsung Electronics); Seokyeong Jung (Samsung Electronics); Chang Woo Han (Samsung Research); Chanwoo Kim (Samsung Electronics) | Mon 9 Jan - Afternoon session (13:30-15:00)
1-2-9-ASR | Accelerator-Aware Training for Transducer-based Speech Recognition | Rupak Vignesh Swaminathan (Amazon.com); Suhaila Mumtaj Shakiah (Amazon); Hieu D Nguyen (Amazon); Raviteja chinta (Amazon.com); Tariq Afzal (Amazon.com); Nathan Susanj (Amazon.com); Athanasios Mouchtaris (Amazon.com); Grant Strimel (Amazon.com); Ariya Rastrow (Amazon Alexa) | Mon 9 Jan - Afternoon session (13:30-15:00)
2-1-1-ASR | Untied Positional Encodings for Efficient Transformer-based Speech Recognition | Lahiru T Samarakoon (Fano Labs, Hong Kong); Ivan Fung (Fano Labs, Hong Kong) | Tue 10 Jan - Morning session (10:30-13:00)
2-1-2-ASR | Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio | Yan Gao (University of Cambridge); Javier Fernandez-Marques (Samsung AI, Cambridge); Titouan Parcollet (); Pedro Gusmao (University of Cambridge); Nicholas Lane (University of Cambridge and Samsung AI) | Tue 10 Jan - Morning session (10:30-13:00)
2-1-3-ASR | Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition | Peng Shen (NICT); Xugang Lu (NICT); Hisashi Kawai (NICT) | Tue 10 Jan - Morning session (10:30-13:00)
2-1-4-ASR | Damage Control during Domain Adaptation for Transducer Based Automatic Speech Recognition | Somshubra Majumdar (NVIDIA); Shantanu Acharya (NVIDIA); Vitaly Lavrukhin (NVIDIA); Boris Ginsburg (NVIDIA) | Tue 10 Jan - Morning session (10:30-13:00)
2-1-5-ASR | PADA: PRUNING ASSISTED DOMAIN ADAPTATION FOR SELF-SUPERVISED SPEECH REPRESENTATIONS | Vasista Sai Lodagala (Indian Institute of Technology, Madras); Sreyan Ghosh (University of Maryland, College Park); S Umesh (IIT Chennai) | Tue 10 Jan - Morning session (10:30-13:00)
2-1-6-ASR | MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario | Fan Yu (Northwestern Polytechnical University); Shiliang Zhang (Alibaba Group); Pengcheng Guo (Northwestern Polytechnical University); Yuhao Liang (Northwestern Polytechnical University); Zhihao Du (Speech Lab, Alibaba Group); Yuxiao Lin (Zhejiang University); Lei Xie (Northwestern Polytechnical University) | Tue 10 Jan - Morning session (10:30-13:00)
2-1-7-ASR | Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition | Aleksandr Laptev (NVIDIA, ITMO University); Boris Ginsburg (NVIDIA) | Tue 10 Jan - Morning session (10:30-13:00)
2-1-9-ASR | Residual Adapters for Targeted Updates in RNN-Transducer Based Speech Recognition System | Sungjun Han (University of Stuttgart); Deepak Baby (Amazon Alexa); Valentin Mendelev (Amazon Alexa) | Tue 10 Jan - Morning session (10:30-13:00)
2-2-1-ASR | IMPROVED NOISY ITERATIVE PSEUDO-LABELING FOR SEMI-SUPERVISED SPEECH RECOGNITION | Tian Li (Shumei AI Research Institute); Qingliang Meng (Shumei AI Research Institute); Yujian Sun (Shumei AI Research Institute) | Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-2-ASR | GUIDED CONTRASTIVE SELF-SUPERVISED PRE-TRAINING FOR AUTOMATIC SPEECH RECOGNITION | Aparna Khare (Amazon); Minhua Wu (Amazon Inc.); Saurabhchand Bhati (Johns Hopkins University); Jasha Droppo (Amazon Inc.); Roland Maas (Amazon Inc.) | Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-3-ASR | Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition | Jakob Poncelet (KU Leuven); Hugo Van hamme (KU Leuven) | Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-4-ASR | NAM+: TOWARDS SCALABLE END-TO-END CONTEXTUAL BIASING FOR ADAPTIVE ASR | Zelin Wu (Google LLC); Tsendsuren Munkhdalai (Microsoft Research); Golan Pundak (Google); Khe C Sim (Google Inc.); David Li (Google LLC); Pat Rondon (Google LLC); Tara Sainath (Google) | Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-5-ASR | CCC-WAV2VEC 2.0: CLUSTERING AIDED CROSS CONTRASTIVE SELF-SUPERVISED LEARNING OF SPEECH REPRESENTATIONS | Vasista Sai Lodagala (Indian Institute of Technology, Madras); Sreyan Ghosh (University of Maryland, College Park); S Umesh (IIT Chennai) | Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-6-ASR | Modular Hybrid Autoregressive Transducer | Zhong Meng (Google LLC); Tongzhou Chen (Google); Rohit Prabhavalkar (Google); Yu Zhang (Google); Yuan Wang (Google); Kartik Audhkhasi (Google); Jesse Emond (Google LLC); Trevor Strohman (Google LLC); Bhuvana Ramabhadran (Google); W. Ronny Huang (Google); Ehsan Variani (Google); Yinghui Huang (Google); Pedro Moreno (Google) | Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-7-ASR | How Does Pre-trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications | Juan Pablo Zuluaga Gomez (Idiap Research Institute); Amrutha Prasad (Idiap Research Institute); Iuliia Nigmatulina (Idiap Research Institute); Seyyed Saeed Sarfjoo (Idiap Research Institute); Petr Motlicek (Idiap); Matthias Kleinert (DLR); Hartmut Helmke (DLR); Oliver Ohneiser (DLR); Qingran Zhan (Beijing Institute of Technology) | Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-9-ASR | Internal Language Model Personalization of E2E Automatic Speech Recognition Using Random Encoder Features | Adam Stooke (Google); Khe C Sim (Google Inc.); Mason Chua (Google); Tsendsuren Munkhdalai (Microsoft Research); Trevor Strohman (Google) | Tue 10 Jan - Afternoon session (13:30-15:00)
3-1-1-ASR | Towards End-to-end Unsupervised Speech Recognition | Alexander H Liu (MIT); Wei-Ning Hsu (Massachusetts Institute of Technology); Michael Auli (Facebook); Alexei Baevski (Facebook AI Research) | Wed 11 Jan - Morning session (10:30-13:00)
3-1-3-ASR | Monotonic segmental attention for automatic speech recognition | Albert Zeyer (RWTH Aachen University); Robin Schmitt (RWTH Aachen University); Wei Zhou (RWTH Aachen University); Ralf Schlüter (RWTH Aachen University); Hermann Ney (RWTH Aachen University) | Wed 11 Jan - Morning session (10:30-13:00)
3-1-4-ASR | STREAMING, FAST AND ACCURATE ON-DEVICE INVERSE TEXT NORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION | Yashesh Gaur (Microsoft); Nick Kibre (Microsoft); JIAN XUE (Microsoft Corporation); Kangyuan Shu (Microsoft); Yuhui Wang (Microsoft); Issac Alphonso (Microsoft); Jinyu Li (Microsoft); Yifan Gong (Microsoft) | Wed 11 Jan - Morning session (10:30-13:00)
3-1-5-ASR | DUAL LEARNING FOR LARGE VOCABULARY ON-DEVICE ASR | Charles C Peyser (Google Inc.); W. Ronny Huang (Google); Tara Sainath (Google); Rohit Prabhavalkar (Google); Michael Picheny (NYU); Kyunghyun Cho (New York University) | Wed 11 Jan - Morning session (10:30-13:00)
3-1-6-ASR | STREAMING BILINGUAL END TO END ASR MODEL USING ATTENTION OVER MULTIPLE SOFTMAX | Aditya R Patil (Microsoft); Vikas V Joshi (Microsoft); Purvi Agrawal (Microsoft); Rupesh Mehta (Microsoft) | Wed 11 Jan - Morning session (10:30-13:00)
3-1-7-ASR | End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation | Yoshiki Masuyama (Tokyo Metropolitan University); Xuankai Chang (Carnegie Mellon University); Samuele Cornell (Università Politecnica delle Marche); Shinji Watanabe (Carnegie Mellon University); Nobutaka Ono (Tokyo Metropolitan University) | Wed 11 Jan - Morning session (10:30-13:00)
3-1-8-ASR | Fully Unsupervised Training of Few-Shot Keyword Spotting | Minchan Kim (Seoul National University); Dongjune Lee (Seoul National University); Sung Hwan Mun (Seoul National University); Min Hyun Han (Seoul National University); Nam Soo Kim (Seoul National University) | Wed 11 Jan - Morning session (10:30-13:00)
3-1-9-ASR | Learning a Dual-Mode Speech Recognition Model via Self-Pruning | Chunxi Liu (Meta AI); Yuan Shangguan (Meta AI); Haichuan Yang (Meta); Yangyang Shi (Facebook); Raghuraman Krishnamoorthi (Facebook); Ozlem Kalinli (Meta AI) | Wed 11 Jan - Morning session (10:30-13:00)
4-1-1-ASR | Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition | Ji Won Yoon (Seoul National University); Beom Jun Woo (Seoul National University); Sunghwan Ahn (Seoul National University); Hyeonseung Lee (Seoul National University); Nam Soo Kim (Seoul National University) | Thu 12 Jan - Morning session (10:30-13:00)
4-1-2-ASR | HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch | Tina Raissi (RWTH Aachen University); Wei Zhou (RWTH Aachen University); Simon Berger (RWTH Aachen University); Ralf Schlüter (RWTH Aachen University); Hermann Ney (RWTH Aachen University) | Thu 12 Jan - Morning session (10:30-13:00)
4-1-3-ASR | Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models | Vrunda N Sukhadia (Indian Institute Of Technology Madras); S Umesh (IIT Chennai) | Thu 12 Jan - Morning session (10:30-13:00)
4-1-4-ASR | Personalization of CTC Speech Recognition Models | Saket Dingliwal (Amazon); Monica Sunkara (Amazon); Sravan Babu Bodapati (Amazon); Srikanth Ronanki (Amazon); Jeff Farris (Amazon); Katrin Kirchhoff (Amazon) | Thu 12 Jan - Morning session (10:30-13:00)
4-1-6-ASR | UNIFIED END-TO-END SPEECH RECOGNITION AND ENDPOINTING FOR FAST AND EFFICIENT SPEECH SYSTEMS | Shaan Bijwadia (Google); Shuo-yiin Chang (Google); Tara Sainath (Google); Bo Li (Google); Chao Zhang (Google); Yanzhang He (Google) | Thu 12 Jan - Morning session (10:30-13:00)
4-1-7-ASR | Learning mask scalars for improved robust automatic speech recognition | Arun Narayanan (Google Inc.); James Walker (Google Llc.); SANKARAN PANCHAPAGESAN (Google, LLC); Nathan Howard (Google Llc.); Yuma Koizumi (Google) | Thu 12 Jan - Morning session (10:30-13:00)
4-1-8-ASR | An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition | Niko Moritz (Meta); Frank Seide (Meta); Duc Le (Meta); Jay Mahadeokar (Meta AI); Christian Fuegen (Facebook) | Thu 12 Jan - Morning session (10:30-13:00)
4-1-9-ASR | Macro-block dropout for improved regularization in training end-to-end speech recognition models | Chanwoo Kim (Samsung Electronics); Sathish Indurti (Samsung Research); Jinhwan Park (Samsung Research); Wonyong Sung (Seoul national university) | Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-1-10-SLPAutomatic Rating of Spontaneous Speech for Low-Resource LanguagesYaroslav Getman (Aalto University); Ragheb Al-Ghezi (Aalto University); Ekaterina Voskoboinik (Aalto University); Mittul Singh (Silo AI); Mikko Kurimo (Aalto University)Mon 9 Jan - Morning session (10:30-13:00)
1-1-11-SLPMixture of Domain Experts for Language Understanding: An Analysis of Modularity, Task Performance, and Memory TradeoffsBenjamin Kleiner (AWS AI Labs); Jack FitzGerald (Amazon Alexa Artificial Intelligence); Haidar Khan (Amazon Alexa AI); Gokhan Tur ( Amazon Alexa AI)Mon 9 Jan - Morning session (10:30-13:00)
1-2-10-SLPA DATA-DRIVEN INVESTIGATION OF NOISE-ADAPTIVE UTTERANCE GENERATION WITH LINGUISTIC MODIFICATIONAnupama Chingacham (Saarland University); Vera Demberg (Dept. of Mathematics and Computer Science, Saarland University); Dietrich Klakow (Saarland University)Mon 9 Jan - Afternoon session (13:30-15:00)
1-2-11-SLPOn the Use of Semantically-Aligned Speech Representations for Spoken Language UnderstandingGaëlle Laperrière (LIA - Avignon University); Valentin Pelloin (LIUM, Le Mans Université); Mickael Rouvier (LIA - Avignon University); Themos Stafylakis (Omilia - Conversational Intelligence); Yannick Estève (LIA - Avignon University)Mon 9 Jan - Afternoon session (13:30-15:00)
2-1-10-SLPResponse Timing Estimation for Spoken Dialog Systems based on Syntactic Completeness PredictionJin Sakuma (Waseda University); Shinya Fujie (Chiba Institute of Technology); Tetsunori Kobayashi (Waseda University)Tue 10 Jan - Morning session (10:30-13:00)
2-1-11-SLPWeak-Supervised Dysarthria-invariant Features for Spoken Language Understanding using an FHVAE and Adversarial TrainingJinzi Qi (KULeuven); Hugo Van hamme (KU LEUVEN)Tue 10 Jan - Morning session (10:30-13:00)
2-2-10-SLPBuilding Markovian Generative Architectures over Pretrained LM Backbones for Efficient Task-Oriented Dialog SystemsHong Liu (Tsinghua University); Yucheng Cai (Tsinghua University); Zhijian Ou (Tsinghua University); Yi Huang (China Mobile Research); Junlan Feng (China Mobile Research)Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-11-SLPNON-AUTOREGRESSIVE END-TO-END APPROACHES FOR JOINT AUTOMATIC SPEECH RECOGNITION AND SPOKEN LANGUAGE UNDERSTANDINGMohan LI (Toshiba Europe Ltd); Rama S Doddipatla (Toshiba Europe LTD)Tue 10 Jan - Afternoon session (13:30-15:00)
3-1-10-SLPImproving Noise Robustness for Spoken Content Retrieval using semi-supervised ASR and N-best transcripts for BERT-based ranking modelsYasufumi Moriya (Dublin City University); Gareth Jones (Dublin City University)Wed 11 Jan - Morning session (10:30-13:00)
3-1-11-SLPA STUDY ON THE INTEGRATION OF PRE-TRAINED SSL, ASR, LM AND SLU MODELS FOR SPOKEN LANGUAGE UNDERSTANDINGYifan Peng (Carnegie Mellon University); Siddhant Arora (Carnegie Mellon University); Yosuke Higuchi (Waseda University); Yushi Ueda (Carnegie Mellon University); Sujay Kumar (Carnegie Mellon University); Karthik Ganesan (Carnegie Mellon University); Siddharth Dalmia (Carnegie Mellon University); Xuankai Chang (Carnegie Mellon University); Shinji Watanabe (Carnegie Mellon University)Wed 11 Jan - Morning session (10:30-13:00)
4-1-10-SLPOn the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword SpottingYuan-Kuei Wu (National Taiwan University); Wei-Tsung Kao (National Taiwan University); Hung-yi Lee (National Taiwan University); Chia-Ping Chen (intelliGo Technology inc.); Zhi-Sheng Chen (intelliGo Technology inc.); Yu-Pao Tsai (intelliGo Technology inc.)Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-1-12-SESMULTI-STAGE PROGRESSIVE AUDIO BANDWIDTH EXTENSIONliang wen (samsung electronics); Lizhong Wang (Samsung); Ying Zhang (Samsung Electronics); Kwang Pyo Choi (Samsung Electronics)Mon 9 Jan - Morning session (10:30-13:00)
1-1-13-SESJOINT OPTIMIZATION OF DIFFUSION PROBABILISTIC-BASED MULTICHANNEL SPEECH ENHANCEMENT WITH FAR-FIELD SPEAKER VERIFICATIONSandipana Dowerah (Inria); romain serizel (Université de Lorraine); Denis Jouvet (LORIA); Mohammad Mohammadamini (Laboratoire Informatique d’Avignon, University of Avignon); Driss Matrouf (Laboratoire Informatique d’Avignon, University of Avignon)Mon 9 Jan - Morning session (10:30-13:00)
1-2-12-SESSpatial-DCCRN: DCCRN Equipped with Frame-level Angle Feature and Hybrid Filtering for Multi-channel Speech EnhancementShubo Lv (Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University); Yihui Fu (Northwestern Polytechnical University); Yukai Ju (Northwestern Polytechnical University); Lei Xie (NWPU); Weixin Zhu (Tencent); Wei Rao (Tencent); Yannan Wang (Tencent)Mon 9 Jan - Afternoon session (13:30-15:00)
1-2-13-SESIMPROVED NORMALIZING FLOW-BASED SPEECH ENHANCEMENT USING AN ALL-POLE GAMMATONE FILTERBANK FOR CONDITIONAL INPUT REPRESENTATIONMartin Strauss (International Audio Laboratories Erlangen); Matteo Torcoli (International Audio Laboratories Erlangen); Bernd Edler (International Audio Laboratories Erlangen)Mon 9 Jan - Afternoon session (13:30-15:00)
2-1-12-SESExploring WavLM on Speech EnhancementHyungchan Song (Gwangju Institute of Science and Technology); Sanyuan Chen (Harbin Institute of Technology); Zhuo Chen (Microsoft); Yu Wu (Microsoft Research Asia); Takuya Yoshioka (Microsoft); Min Tang (Microsoft); Jong Won Shin (Gwangju Institute of Science and Technology); Shujie Liu (Microsoft Research Asia)Tue 10 Jan - Morning session (10:30-13:00)
2-1-13-SESAdaptive-FSN: Integrating full-band extraction and adaptive sub-band encoding for monaural speech enhancementYu-Sheng Tsao (National Taiwan Normal University); Kuan-Hsun Ho (NTNU); Jeih-weih Hung (National Chi Nan University); Berlin Chen (National Taiwan Normal University)Tue 10 Jan - Morning session (10:30-13:00)
2-1-26-SESAVSE CHALLENGE: AUDIO-VISUAL SPEECH ENHANCEMENT CHALLENGEAndrea L Aldana (Edinburgh University); Cassia Valentini (University of Edinburgh); Ondrej Klejch (University of Edinburgh); Mandar Gogate (Edinburgh Napier University ); Kia K Dashtipour (Edinburgh Napier University); Amir Hussain (Edinburgh Napier University); Peter Bell (University of Edinburgh )Tue 10 Jan - Morning session (10:30-13:00)
2-2-12-SESTEA-PSE 2.0: SUB-BAND NETWORK FOR REAL-TIME PERSONALIZED SPEECH ENHANCEMENTYukai Ju (Northwestern Polytechnical University); Shimin Zhang (Northwestern Polytechnical University); Wei Rao (Tencent); Yannan Wang (Tencent); Tao Yu (Tencent); Lei Xie (NWPU); Shi-dong Shang (Tencent)Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-13-SESEEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of SpeakersSoumi Maiti (CMU); Yushi Ueda (CMU); Shinji Watanabe (CMU); chunlei zhang (Tencent AI Lab); Meng Yu (Tencent); Shixiong Zhang (Tencent); Yong Xu (Tencent)Tue 10 Jan - Afternoon session (13:30-15:00)
3-1-12-SESLIMUSE: LIGHTWEIGHT MULTI-MODAL SPEAKER EXTRACTIONQinghua Liu (Tianjin University); Yating Huang (Institute of Automation, Chinese Academy of Sciences (CASIA)); Yunzhe Hao (Institute of Automation,Chinese Academy of Science); Jiaming Xu (Institute of Automation Chinese Academy of Sciences); Bo Xu (Institute of Automation, Chinese Academy of Sciences)Wed 11 Jan - Morning session (10:30-13:00)
4-1-11-SESEnd-to-End Multi-speaker ASR with Independent Vector AnalysisRobin Scheibler (LINE Corporation); Wangyou Zhang (Shanghai Jiao Tong University); Xuankai Chang (Carnegie Mellon University); Shinji Watanabe (Carnegie Mellon University); Yanmin Qian (Shanghai Jiao Tong University)Thu 12 Jan - Morning session (10:30-13:00)
4-1-12-SESA Hybrid Acoustic Echo Reduction Approach Using Kalman Filtering and Informed Source Extraction With Improved TrainingWolfgang Mack (AudioLabs Erlangen); Emanuel Habets (AudioLabs Erlangen)Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-1-14-ANALearning Invariant Representation and Risk Minimized for Unsupervised Accent Domain AdaptationChendong Zhao (The Shenzhen International Graduate School, Tsinghua University, China); Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd); Xiaoyang Qu (Ping An Technology (Shenzhen) Co., Ltd); Haoqian Wang (Tsinghua Shenzhen International Graduate School, Tsinghua University); Jing Xiao (Ping An Insurance (Group) Company of China)Mon 9 Jan - Morning session (10:30-13:00)
1-2-14-ANAVSAMETER: EVALUATION OF A NEW OPEN-SOURCE TOOL TO MEASURE VOWEL SPACE AREA AND RELATED METRICSTianyu Cao (Johns Hopkins University); Laureano Moro-Velazquez (Johns Hopkins University); Piotr Żelasko (Meaning); Jesús Villalba (Johns Hopkins University); Najim Dehak (Johns Hopkins University)Mon 9 Jan - Afternoon session (13:30-15:00)
2-1-14-ANAINVESTIGATING THE IMPORTANT TEMPORAL MODULATIONS FOR DEEP-LEARNING-BASED SPEECH ACTIVITY DETECTIONTyler Vuong (Carnegie Mellon University); Nikhil Madaan (Carnegie Mellon University); Rohan Panda (Carnegie Mellon University); Richard M Stern (Carnegie Mellon University)Tue 10 Jan - Morning session (10:30-13:00)
3-1-13-ANAA MULTI-MODAL ARRAY OF INTERPRETABLE FEATURES TO EVALUATE LANGUAGE AND SPEECH PATTERNS IN DIFFERENT NEUROLOGICAL DISORDERSAnna Favaro (Johns Hopkins University); Chelsie Motley (Johns Hopkins University); Tianyu Cao (Johns Hopkins University); Miguel Iglesias (Johns Hopkins University); Ankur Butala (Johns Hopkins University); Esther S. Oh (Johns Hopkins University); Robert Stevens (Johns Hopkins Hospital); Jesús Villalba (Johns Hopkins University); Najim Dehak (Johns Hopkins University); Laureano Moro-Velazquez (Johns Hopkins University)Wed 11 Jan - Morning session (10:30-13:00)
4-1-13-ANAEfficient dynamic filter for robust and low computational feature extractionDonghyeon Kim (Korea University); Jeong-gi Kwak (Korea University); Hanseok Ko (Korea University)Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-2-15-SLRFREQUENCY AND MULTI-SCALE SELECTIVE KERNEL ATTENTION FOR SPEAKER VERIFICATIONSung Hwan Mun (Seoul National University); Jee-weon Jung (Naver Corporation); Min Hyun Han (Seoul National University); Nam Soo Kim (Seoul National University)Mon 9 Jan - Afternoon session (13:30-15:00)
2-1-15-SLRAN ATTENTION-BASED BACKEND ALLOWING EFFICIENT FINE-TUNING OF TRANSFORMER MODELS FOR SPEAKER VERIFICATIONJunyi Peng (Brno University of Technology); Oldrich Plchot (Brno University of Technology ); Themos Stafylakis (Omilia - Conversational Intelligence); Ladislav Mosner (Brno University of Technology ); Lukas Burget (Brno University of Technology ); Jan Cernocky (Brno University of Technology )Tue 10 Jan - Morning session (10:30-13:00)
2-2-14-SLRFlow-ER: a Flow-based Embedding Regularization Strategy for Robust Speech Representation LearningWoo Hyun Kang (Computer Research Institute of Montreal); Jahangir Alam (Computer Research Institute of Montreal (CRIM), Montreal (Quebec) Canada); Abderrahim Fathan (Computer Research Institute of Montreal (CRIM), Montreal, Quebec, Canada)Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-15-SLRUNSUPERVISED DOMAIN ADAPTATION OF NEURAL PLDA USING SEGMENT PAIRS FOR SPEAKER VERIFICATIONİsmail Rasim Ülgen (Sestek - Boğaziçi University); Mustafa Levent Arslan (Sestek - Boğaziçi Üniversitesi)Tue 10 Jan - Afternoon session (13:30-15:00)
3-1-14-SLRTHE CLEVER HANS EFFECT IN VOICE SPOOFING DETECTIONBhusan Chettri (Borac Solutions)Wed 11 Jan - Morning session (10:30-13:00)
3-1-15-SLRINVESTIGATING ACTIVE-LEARNING-BASED TRAINING DATA SELECTION FOR SPEECH SPOOFING COUNTERMEASUREXin Wang (National Institute of Informatics); Junichi Yamagishi (National Institute of Informatics)Wed 11 Jan - Morning session (10:30-13:00)
4-1-14-SLRHOW TO BOOST ANTI-SPOOFING WITH X-VECTORSXinyue Ma (Tsinghua University); Shanshan Zhang (Tencent Research); Shen Huang (Tencent Research); Ji Gao (Tencent Research); Ying Hu (Xinjiang University); Liang HE (Tsinghua University)Thu 12 Jan - Morning session (10:30-13:00)
4-1-15-SLRA COMPREHENSIVE STUDY ON SELF-SUPERVISED DISTILLATION FOR SPEAKER REPRESENTATION LEARNINGZhengyang Chen (Shanghai Jiao Tong University); Yao Qian (Microsoft); Bing Han (Shanghai Jiao Tong University); Yanmin Qian (Shanghai Jiao Tong University); Michael Zeng (Microsoft)Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-1-16-DIAContinual Self-supervised Domain Adaptation for End-to-end Speaker DiarizationJuan Manuel Coria (Université Paris-Saclay CNRS, LISN); Hervé Bredin (CNRS); Sahar Ghannay (Université Paris-Saclay CNRS, LISN); Sophie Rosset (LISN)Mon 9 Jan - Morning session (10:30-13:00)
1-2-16-DIAJoint speaker diarisation and tracking in switching state-space modelJeremy H. M. Wong (Institute for Infocomm Research); Yifan Gong (Microsoft)Mon 9 Jan - Afternoon session (13:30-15:00)
2-1-16-DIADiarisation using location tracking with agglomerative clusteringJeremy H. M. Wong (Institute for Infocomm Research); Igor Abramovski (Microsoft); Xiong Xiao (Microsoft); Yifan Gong (Microsoft)Tue 10 Jan - Morning session (10:30-13:00)
2-2-16-DIAMutual Learning of Single- and Multi-Channel End-to-End Neural DiarizationShota Horiguchi (Hitachi, Ltd.); Yuki Takashima (Hitachi, Ltd.); Shinji Watanabe (Carnegie Mellon University); Paola Garcia (Johns Hopkins University)Tue 10 Jan - Afternoon session (13:30-15:00)
3-1-16-DIABERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control CommunicationsJuan Pablo Zuluaga Gomez (Idiap Research Institute); Seyyed Saeed Sarfjoo (Idiap Research Institute); Amrutha Prasad (Idiap Research Institute); Iuliia Nigmatulina (Idiap Research Institute); Petr Motlicek (Idiap); Karel Ondrej (BUT); Oliver Ohneiser (DLR); Hartmut Helmke (DLR)Wed 11 Jan - Morning session (10:30-13:00)
4-1-16-DIALow-Latency Speech Separation Guided Diarization for Telephone ConversationsGiovanni Morrone (Università Politecnica delle Marche); Samuele Cornell (Università Politecnica delle Marche); Desh Raj (Johns Hopkins University); Luca Serafini (Università Politecnica delle Marche); Enrico Zovato (PerVoice S.p.A.); Alessio Brutti (FBK); Stefano Squartini (Università Politecnica delle Marche)Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-1-17-TLPFine Grained Spoken Document Summarization Through Text SegmentationSamantha Kotey (Trinity College Dublin); Rozenn Dahyot (Maynooth University); Naomi Harte (Trinity College Dublin)Mon 9 Jan - Morning session (10:30-13:00)
1-2-17-TLPAN ANALYSIS OF THE EFFECTS OF DECODING ALGORITHMS ON FAIRNESS IN OPEN-ENDED LANGUAGE GENERATIONJwala Dhamala (Amazon Alexa AI); Varun Kumar (Amazon Alexa ); Rahul Gupta (Amazon); Kai-Wei Chang (UCLA); Aram Galstyan (USC Information Sciences Institute)Mon 9 Jan - Afternoon session (13:30-15:00)
2-1-17-TLPN-BEST HYPOTHESES RERANKING FOR TEXT-TO-SQL SYSTEMSLu Zeng (Amazon); Sree Hari Krishnan Parthasarathi (Amazon); Dilek Z Hakkani-Tur (Amazon Alexa AI)Tue 10 Jan - Morning session (10:30-13:00)
2-2-17-TLPFour-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech RecognitionSharman W Tan (Microsoft); Piyush Behre (Microsoft); Nick Kibre (Microsoft); Issac Alphonso (Microsoft); Shawn Chang ()Tue 10 Jan - Afternoon session (13:30-15:00)
3-1-17-TLPEfficient Text Analysis with Pre-trained Neural Network ModelsJia Cui (Tencent ); Heng Lu (Tencent AI Lab); Wenjie Wang (Emory University); Shiyin Kang (Tencent); Liqiang He (Tencent); Guangzhi Li (Tencent); Dong Yu (Tencent AI Lab)Wed 11 Jan - Morning session (10:30-13:00)
4-1-17-TLPEmpirical Analysis of Training Strategies of Transformer-based Japanese Chit-chat SystemsHiroaki Sugiyama (NTT); Masahiro Mizukami (NTT); Tsunehiro Arimoto (NTT); Hiromi Narimatsu (NTT); Yuya Chiba (NTT); Hideharu Nakajima (NTT); Toyomi Meguro (NTT)Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-1-18-MMPPush-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker DetectionXuanjun Chen (National Taiwan University); Haibin Wu (National Taiwan University); Hung-yi Lee (National Taiwan University); Helen Meng (The Chinese University of Hong Kong); Roger Jang ()Mon 9 Jan - Morning session (10:30-13:00)
1-1-19-MMPTowards visually prompted keyword localisation for zero-resource spoken languagesLeanne Nortje (Stellenbosch University); Herman Kamper (Stellenbosch University)Mon 9 Jan - Morning session (10:30-13:00)
1-2-18-MMPExploiting information from native data for non-native automatic pronunciation assessmentBinghuai Lin (MIG, Tencent Science and Technology Ltd.); Liyuan wang (Tencent Technology Co., Ltd)Mon 9 Jan - Afternoon session (13:30-15:00)
2-1-8-MMPTRANSFORMER-BASED LIP-READING WITH REGULARIZED DROPOUT AND RELAXED ATTENTIONZhengyang Li (Technische Universität Carolo-Wilhelmina Braunschweig); Timo Lohrenz (Technische Universität Carolo-Wilhelmina Braunschweig); Matthias Dunkelberg (Technische Universität Carolo-Wilhelmina Braunschweig); Tim Fingscheidt (Technische Universität Carolo-Wilhelmina Braunschweig)Tue 10 Jan - Morning session (10:30-13:00)
2-1-18-MMPSpeechCLIP: Integrating Speech with Pre-trained Vision and Language ModelYi-Jen Shih (National Taiwan University); Hsuan-Fu Wang (Academia Sinica); Heng-Jui Chang (Massachusetts Institute of Technology); Layne Berry (University of Texas at Austin); Hung-yi Lee (National Taiwan University); David Harwath (The University of Texas at Austin)Tue 10 Jan - Morning session (10:30-13:00)
2-2-18-MMPYFACC: A Yorùbá Speech-Image Dataset for Cross-lingual Keyword Localisation through Visual GroundingKayode K Olaleye (University of Stellenbosch); Dan Oneață (University Politehnica of Bucharest); Herman Kamper (Stellenbosch University)Tue 10 Jan - Afternoon session (13:30-15:00)
3-1-18-MMPON THE USE OF MODALITY-SPECIFIC LARGE-SCALE PRE-TRAINED ENCODERS FOR MULTIMODAL SENTIMENT ANALYSISAtsushi Ando (NTT Corporation); Ryo Masumura (NTT Corporation); Akihiko Takashima (NTT); Satoshi Suzuki (NTT Computer and Data Science Laboratories / The University of Electro-Communications); Naoki Makishima (NTT Corporation); Keita Suzuki (NTT Corporation); Takafumi Moriya (NTT Corporation); Takanori Ashihara (NTT Corporation); Hiroshi Sato (NTT Corporation)Wed 11 Jan - Morning session (10:30-13:00)
4-1-18-MMPAn Analysis of Semantically-Aligned Speech-Text EmbeddingsMuhammad Huzaifah (Institute for Infocomm Research, ASTAR); Ivan Kukanov (Institute for Infocomm Research, ASTAR)Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-1-1-MLPExploration of Language-Specific Self-Attention Parameters for Multilingual End-to-End Speech RecognitionBrady Houston (AWS AI Labs); Katrin Kirchhoff (Amazon)Mon 9 Jan - Morning session (10:30-13:00)
1-1-5-MLPHow Do Phonological Properties Affect Bilingual Automatic Speech Recognition?Shelly Jain (International Institute of Information Technology, Hyderabad); Aditya Yadavalli (International Institute of Information Technology, Hyderabad); Sai Ganesh Mirishkar (IIIT Hyderabad); Anil Vuppala (International Institute of Information Technology Hyderabad)Mon 9 Jan - Morning session (10:30-13:00)
1-1-6-MLPScaling Up Deliberation for Multilingual ASRKe Hu (Google); Tara Sainath (Google); Bo Li (Google)Mon 9 Jan - Morning session (10:30-13:00)
1-2-2-MLPCode-switched language modelling using a code predictive LSTM in under-resourced South African languagesJoshua Miles Jansen Van Vüren (Stellenbosch University); Thomas Niesler (Stellenbosch University)Mon 9 Jan - Afternoon session (13:30-15:00)
1-2-5-MLPIMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONSLe Minh Nguyen (University of Groningen); Shekhar Nayak (University of Groningen); Matt Coler (University of Groningen)Mon 9 Jan - Afternoon session (13:30-15:00)
1-2-19-MLPTextual Data Augmentation for Arabic-English Code-Switching Speech RecognitionAmir Hussein (Johns Hopkins University); Shammur Chowdhury (QCRI); Ahmed Abdelali (QCRI); Najim Dehak (Johns Hopkins University); Ahmed Ali (Qatar Computing Research Institute, HBKU); Sanjeev Khudanpur (Johns Hopkins University)Mon 9 Jan - Afternoon session (13:30-15:00)
2-1-23-MLPFLEURS: FEW-SHOT LEARNING EVALUATION OF UNIVERSAL REPRESENTATIONS OF SPEECHAlexis Conneau (FAIR); Min Ma (Google Research); Simran Khanuja (Google); Yu Zhang (Google); Vera Axelrod (Google, Inc); Siddharth Dalmia (Carnegie Mellon University ); Jason Riesa (Google); Clara Rivera (Google); Ankur Bapna (Google Research)Tue 10 Jan - Morning session (10:30-13:00)
2-2-8-MLPImproving Semi-supervised E2E ASR using CycleGAN and Inter-domain LossesChia-Yu Li (Institute for Natural Language Processing (IMS), University of Stuttgart); Ngoc Thang Vu (University of Stuttgart)Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-19-MLPMULTILINGUAL SPEECH EMOTION RECOGNITION WITH MULTI-GATING MECHANISM AND NEURAL ARCHITECTURE SEARCHZihan Wang (Columbia University); Qi Meng (Columbia University); Haifeng Lan (Columbia University ); xinrui zhang (Columbia University); Kehao Guo (Columbia University); Akshat Gupta (JPMorgan)Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-22-MLPDisentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using $\beta$-VAEHui Lu (The Chinese University of Hong Kong); Disong Wang (The Chinese University of Hong Kong); Xixin Wu (The Chinese University of Hong Kong); Zhiyong Wu (Tsinghua University); Xunying Liu (The Chinese University of Hong Kong); Helen Meng (The Chinese University of Hong Kong)Tue 10 Jan - Afternoon session (13:30-15:00)
3-1-2-MLPExploring a unified ASR for multiple south Indian languages leveraging multilingual acoustic and language modelsANOOP C. S. (Indian Institute of Science, Bengaluru); Ramakrishnan A G (INDIAN INSTITUTE OF SCIENCE)Wed 11 Jan - Morning session (10:30-13:00)
4-1-5-MLPA Truly Multilingual First Pass and Monolingual Second Pass Streaming On-Device ASR SystemSepand Mavandadi (Google); Bo Li (Google); Chao Zhang (Google); Brian Farris (Google); Tara Sainath (Google); Trevor Strohman (Google)Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-1-20-EMRSPEECH EMOTION RECOGNITION WITH COMPLEMENTARY ACOUSTIC REPRESENTATIONSXiaoming Zhang (Nanjing University of Technology); Fan Zhang (IBM Massachusetts Laboratory); Xiaodong Cui (IBM T. J. Watson Research Center); Wei Zhang (Wayfair)Mon 9 Jan - Morning session (10:30-13:00)
1-2-20-EMRA ZERO-SHOT APPROACH TO IDENTIFYING CHILDREN’S SPEECH IN AUTOMATIC GENDER CLASSIFICATIONAmruta Saraf (Pindrop); Ganesh Sivaraman (Pindrop); Elie Khoury (Pindrop)Mon 9 Jan - Afternoon session (13:30-15:00)
2-1-19-EMRDistribution-based Emotion Recognition in ConversationWen Wu (University of Cambridge); Chao Zhang (University of Cambridge); Phil Woodland (Machine Intelligence Laboratory, Cambridge University Department of Engineering)Tue 10 Jan - Morning session (10:30-13:00)
3-1-19-EMRExploration of A Self-Supervised Speech Model: A Study on Emotional CorporaYuanchao Li (University of Edinburgh); Yumnah Mohamied (University of Edinburgh); Peter Bell (University of Edinburgh ); Catherine Lai (University of Edinburgh)Wed 11 Jan - Morning session (10:30-13:00)
4-1-19-EMRCombining Contrastive and Non-Contrastive Losses for Fine-Tuning Pretrained Models in Speech AnalysisFlorian Lux (University of Stuttgart); Ching-Yi Chen (University of Stuttgart); Ngoc Thang Vu (University of Stuttgart)Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-1-21-TTSWaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point IterationYuma Koizumi (Google); Kohei Yatabe (Tokyo University of Agriculture and Technology); Heiga Zen (Google); Michiel Bacchiani (Google)Mon 9 Jan - Morning session (10:30-13:00)
1-1-22-TTSOn granularity of prosodic representations in expressive text-to-speechMikolaj Babianski (Amazon); Kamil Pokora (Amazon); Raahil Shah (Amazon); Rafał Sienkiewicz (Amazon); Daniel Korzekwa (Amazon); Viacheslav Klimkov (Amazon)Mon 9 Jan - Morning session (10:30-13:00)
1-1-23-TTSCan we use Common Voice to train a Multi-Speaker TTS system?Sewade O Ogun (Inria); Vincent Colotte (LORIA); Emmanuel Vincent (Inria)Mon 9 Jan - Morning session (10:30-13:00)
1-2-21-TTSGAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion ModelsMatthew Baas (Stellenbosch University); Herman Kamper (Stellenbosch University)Mon 9 Jan - Afternoon session (13:30-15:00)
1-2-22-TTSAnonymizing Speech with Generative Adversarial Networks to Preserve Speaker PrivacySarina Meyer (University of Stuttgart); Pascal Tilli (University of Stuttgart); Pavel Denisov (University of Stuttgart); Florian Lux (University of Stuttgart); Julia Koch (University of Stuttgart); Ngoc Thang Vu (University of Stuttgart)Mon 9 Jan - Afternoon session (13:30-15:00)
2-1-20-TTSStyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS ModelsYinghao A Li (Columbia University); Cong Han (Columbia University); Nima Mesgarani (Columbia University)Tue 10 Jan - Morning session (10:30-13:00)
2-1-21-TTSLearning accent representation with multi-level VAE towards controllable speech synthesisJan Melechovsky (Singapore University of Technology and Design); Ambuj Mehrish (SUTD); Dorien Herremans (Singapore University of Technology and Design); Berrak Sisman (Singapore University of Technology and Design (SUTD))Tue 10 Jan - Morning session (10:30-13:00)
2-1-22-TTSvTTS: visual-text to speechYoshifumi Nakano (The University of Tokyo); Takaaki Saeki (The University of Tokyo); Shinnosuke Takamichi (The University of Tokyo); Katsuhito Sudoh (Nara Institute of Science and Technology); Hiroshi Saruwatari (The University of Tokyo)Tue 10 Jan - Morning session (10:30-13:00)
2-2-20-TTSGenerative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered SpeechDominik Wagner (Technische Hochschule Nuernberg Georg Simon Ohm); Sebastian P Bayerl (Technische Hochschule Nürnberg Georg Simon Ohm); Hector Cordourier (Intel); Tobias Bocklet (TH Nürnberg )Tue 10 Jan - Afternoon session (13:30-15:00)
2-2-21-TTSTwo-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversionDing Ma (Nagoya University); Lester Phillip G Violeta (Nagoya University); Kazuhiro Kobayashi (Nagoya University); Tomoki Toda (Nagoya University)Tue 10 Jan - Afternoon session (13:30-15:00)
3-1-20-TTSSIMD-SIZE AWARE WEIGHT REGULARIZATION FOR FAST NEURAL VOCODING ON CPUHiroki Kanagawa (NTT Corporation); Yusuke Ijima (NTT Corporation)Wed 11 Jan - Morning session (10:30-13:00)
3-1-21-TTSExact Prosody Cloning in Zero-Shot Multispeaker Text-to-SpeechFlorian Lux (University of Stuttgart); Julia Koch (University of Stuttgart); Ngoc Thang Vu (University of Stuttgart)Wed 11 Jan - Morning session (10:30-13:00)
3-1-22-TTSNix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise DistillationRendi Chevi (Kata.ai); Radityo Eko Prasojo (Kata.ai); Alham Fikri Aji (Amazon); Andros Tjandra (Meta AI, US); Sakriani Sakti (Japan Advanced Institute of Science and Technology)Wed 11 Jan - Morning session (10:30-13:00)
4-1-20-TTSRegotron: Regularizing the Tacotron2 architecture via monotonic alignment lossEfthymios Georgiou (National Technical University of Athens); Kosmas Kritsis (Athena Research Center); Georgios Paraskevopoulos (National Technical University of Athens); Athanasios Katsamanis (ATHENA R.C., Behavioral Signal Technologies); Vassilis Katsouros (Athena Research Center); Alexandros Potamianos (National Technical University of Athens)Thu 12 Jan - Morning session (10:30-13:00)
4-1-21-TTSRemap, warp and attend: Non-parallel many-to-many accent conversion with Normalizing FlowsAbdelhamid Ezzerg (Amazon); Thomas Merritt (Amazon); Kayoko Yanagisawa (Amazon); Piotr Bilinski (Amazon); Magdalena Proszewska (Jagiellonian University); Kamil Pokora (Amazon); Renard Korzeniowski (Amazon); Roberto Barra-Chicote (Amazon); Daniel Korzekwa (Amazon)Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-2-23-RESSTOP: A DATASET FOR SPOKEN TASK ORIENTED SEMANTIC PARSINGPaden Tomasello (Meta); Akshat Shrivastava (Meta); Daniel A Lazar (Meta); Po-chun Hsu (Meta); Duc Le (Meta); Adithya Sagar (Facebook AI); Ali Elkahky (Meta); Jade Copet (Meta); Wei-Ning Hsu (Massachusetts Institute of Technology); Yossi Adi (Facebook AI Research ); Robin Algayres (Meta); Tu Anh Nguyen (Meta); Emmanuel Dupoux (Facebook AI Research); Luke Zettlemoyer (Facebook); Abdel-rahman Mohamed (Facebook AI Research (FAIR))Mon 9 Jan - Afternoon session (13:30-15:00)
2-2-23-RESBenchmarking Evaluation Metrics for Code-Switching Automatic Speech RecognitionInjy Hamed (New York University Abu Dhabi; Stuttgart University); Amir Hussein (Johns Hopkins University); Oumnia Chellah (Stanford University); Shammur Chowdhury (QCRI); Hamdy Mubarak (Qatar Computing Research Institute, HBKU); Sunayana Sitaram (Microsoft Research); Nizar Habash (); Ahmed Ali (Qatar Computing Research Institute, HBKU)Tue 10 Jan - Afternoon session (13:30-15:00)
4-1-22-RESMASC: Massive Arabic Speech CorpusMohammad Al-Fetyani (Appswave); Mohammad AlBarham (Appswave); Gheith A. Abandah (); Adham Alsharkawi (The University of Jordan); Maha Dawas (Planning and Statistics Authority)Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-1-15-MLSSpeed-Robust Keyword Spotting via Soft Self-Attention on Multi-Scale FeaturesChaoyue Ding (SenseTime Group Limited); Jiakui Li (SenseTime Group Limited); Martin Zong (SenseTime Group Limited); Baoxiang Li (SenseTime Group Limited)Mon 9 Jan - Morning session (10:30-13:00)
1-1-24-MLSDistilling Sequence-to-Sequence Voice Conversion Models For Streaming Conversion ApplicationsKou Tanaka (NTT Corporation); Hirokazu Kameoka (NTT Communication Science Laboratories, NTT Corporation); Takuhiro Kaneko (NTT Corporation); Shogo Seki (NTT Corporation)Mon 9 Jan - Morning session (10:30-13:00)
1-1-25-MLSAUTOMATIC PREDICTION OF INTELLIGIBILITY OF WORDS AND PHONEMES PRODUCED ORALLY BY JAPANESE LEARNERS OF ENGLISHNobuaki Minematsu (The University of Tokyo); Chuanbo Zhu (The University of Tokyo); Takuya Kunihara (The University of Tokyo); Daisuke Saito (The University of Tokyo); Noriko Nakanishi (Kobe Gakuin University)Mon 9 Jan - Morning session (10:30-13:00)
1-2-24-MLSSVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution LearningZuheng Kang (Ping An Technology (Shenzhen) Co., Ltd); Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd); Junqing Peng (Ping An Technology (Shenzhen) Co., Ltd); Jing Xiao (Ping An Insurance (Group) Company of China)Mon 9 Jan - Afternoon session (13:30-15:00)
1-2-25-MLSPEPPANET: EFFECTIVE MISPRONUNCIATION DETECTION AND DIAGNOSIS LEVERAGING PHONETIC, PHONOLOGICAL, AND ACOUSTIC CUESBi-Cheng Yan (National Taiwan Normal University ); Hsin-Wei Wang (National Taiwan Normal University); Berlin Chen (National Taiwan Normal University)Mon 9 Jan - Afternoon session (13:30-15:00)
2-1-24-MLSImplicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech DetectionSamuele Cornell (Università Politecnica delle Marche); Thomas Balestri (Amazon); Thibaud Senechal (Amazon)Tue 10 Jan - Morning session (10:30-13:00)
2-2-24-MLSPhoneme Segmentation Using Self-Supervised Speech ModelsLuke Strgar (University of Texas, Austin); David Harwath (The University of Texas at Austin)Tue 10 Jan - Afternoon session (13:30-15:00)
3-1-23-MLSAn Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech RecognitionChao-Han Huck Yang (Georgia Institute of Technology ); I-Fan Chen (Amazon Inc.); Andreas Stolcke (Amazon); Sabato M Siniscalchi (Kore University of Enna); Chin-hui Lee (Georgia Institute of Technology)Wed 11 Jan - Morning session (10:30-13:00)
3-1-24-MLSTDOA ESTIMATION OF SPEECH SOURCE IN NOISY REVERBERANT ENVIRONMENTSSuliang Bu (University of Missouri); Tuo Zhao (University of Missouri); Yunxin Zhao (University of Missouri)Wed 11 Jan - Morning session (10:30-13:00)
4-1-23-MLSPHONE-LEVEL PRONUNCIATION SCORING FOR L1 USING WEIGHTED-DYNAMIC TIME WARPINGAghilas SINI (Univ Rennes, CNRS, IRISA); Antoine Perquin (Univ Rennes, CNRS, IRISA); Damien Lolive (Univ Rennes, CNRS, IRISA); Arnaud Delhay (IRISA)Thu 12 Jan - Morning session (10:30-13:00)
4-1-24-MLSPROFICIENCY ASSESSMENT OF L2 SPOKEN ENGLISH USING WAV2VEC 2.0Stefano Bannò (University of Trento); Marco Matassoni (Fondazione Bruno Kessler)Thu 12 Jan - Morning session (10:30-13:00)

Poster IDPaper TitleAuthorsSession
1-1-26-SUPOn the Utility of Self-supervised Models for Prosody-related TasksGuan-Ting Lin (National Taiwan University); Chi Luen Feng (National Taiwan University); Wei-Ping Huang (National Taiwan University); Yuan Tseng (National Taiwan University); Chen An Li (National Taiwan University); Tzu-Han Lin (National Taiwan University ); Hung-yi Lee (National Taiwan University); Nigel Ward (UTEP)Mon 9 Jan - Morning session (10:30-13:00)
2-1-25-SUPImproving generalizability of distilled self-supervised speech processing models under distorted settingsKuan-Po Huang (National Taiwan University); YU-KUAN FU (NTU); Tsu-Yuan Hsu (National Taiwan University); Fabian Alejandro Ritter Gutierrez (National University of Singapore); Fan-Lin Wang (Academia Sinica); Liang-Hsuan Tseng (National Taiwan University); Yu Zhang (Google); Hung-yi Lee (National Taiwan University)Tue 10 Jan - Morning session (10:30-13:00)
2-2-25-SUPExploring Efficient-tuning Methods in Self-supervised Speech ModelsZih-Ching Chen (National Taiwan University); Chin-Lun Fu (National Taiwan University); Chih Ying Liu (National Taiwan University); Shang-Wen Li (AWS AI); Hung-yi Lee (National Taiwan University)Tue 10 Jan - Afternoon session (13:30-15:00)
3-1-25-SUPOn Compressing Sequences for Self-Supervised Speech ModelsYen Meng (National Taiwan University); Hsuan-Jui Chen (National Taiwan University); Jiatong Shi (Carnegie Mellon University); Shinji Watanabe (Carnegie Mellon University); Paola Garcia (Johns Hopkins University); Hung-yi Lee (National Taiwan University); Hao Tang (The University of Edinburgh)Wed 11 Jan - Morning session (10:30-13:00)
3-2-1-SUPSUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation LearningTzu-hsun Feng (National Taiwan University); Annie Dong (Meta); Ching-Feng Yeh (Facebook); Shu-wen Yang (National Taiwan University); Tzu-Quan Lin (National Taiwan University); Jiatong Shi (Carnegie Mellon University); Kai-Wei Chang (National Taiwan University); Zili Huang (Johns Hopkins University); Haibin Wu (National Taiwan University); Xuankai Chang (Carnegie Mellon University); Shinji Watanabe (Carnegie Mellon University); Abdel-rahman Mohamed (Facebook AI Research (FAIR)); Shang-Wen Li (Meta); Hung-yi Lee (National Taiwan University)Wed 11 Jan - Afternoon session (13:30-15:00)
4-1-25-SUPExtracting speaker and emotion information from self-supervised speech models via channel-wise correlationsThemos Stafylakis (Omilia - Conversational Intelligence); Ladislav Mošner (Brno University of Technology); Sofoklis Kakouros (University of Helsinki); Oldřich Plchot (Brno University of Technology); Lukas Burget (Brno University of Technology); Jan Honza Cernocky (Brno University of Technology)Thu 12 Jan - Morning session (10:30-13:00)