Keynote Speakers

Meet them in Doha, Qatar

Brain-controlled assistive hearing technologies

Listening in noisy and crowded environments is a challenging task. Assistive hearing devices can suppress certain types of background noise, but they cannot help a user focus on a single conversation among many without knowing which speaker is the target. Our recent scientific discoveries about speech processing in the human auditory cortex have motivated several new paths to enhance the efficacy of hearable technologies. These possibilities include (i) speech neuroprostheses, which aim to establish a direct communication channel with the brain; (ii) auditory attention decoding, in which the similarity of a listener's brainwaves to the sources in the acoustic scene is used to identify the target source; and (iii) increased speech perception using electrical brain stimulation. In parallel, the field of auditory scene analysis has recently seen great progress due to the emergence of deep learning models, to the point that even multi-talker speech recognition is no longer out of reach. I will discuss our recent efforts to bring together the latest progress in speech neurophysiology, brain-computer interfaces, and speech processing technologies to design and realize the next generation of assistive hearing devices, with the potential to augment speech communication in realistic and challenging acoustic conditions.

Nima Mesgarani
Biography: Nima Mesgarani is an associate professor at the Zuckerman Mind Brain Behavior Institute of Columbia University in the City of New York. He received his Ph.D. from the University of Maryland and was a postdoctoral scholar at the Center for Language and Speech Processing at Johns Hopkins University and the Neurosurgery Department of the University of California, San Francisco. He received the National Science Foundation Early Career Award in 2015, the Pew Scholar for Innovative Biomedical Research Award in 2016, and the Auditory Neuroscience Young Investigator Award in 2019. His research was selected among the top-10 innovations of 2018 by UNICEF-Netexplo, the top-10 breakthroughs of 2019 by the Institute of Physics, and the top-10 health innovations of 2019 by Healthcare Innovation. His interdisciplinary research combines experimental and computational methods to study speech communication in the human brain, which critically impacts research on artificial models of speech processing and speech brain-computer interface technologies.
More data or larger model? Beyond brute-force machine learning for next generation dialogue models

Today's large pre-trained language models have astonishing abilities in generating human-like text. However, when these language models are integrated into dialogue systems, and in particular into task-oriented ones, their capabilities do not translate into an equally astonishing performance of the overall system. Large amounts of additional, carefully labelled data are needed to get the systems up and running. So are larger models and more data all we need to facilitate human-computer interaction? In this talk, I will examine the main aspects of dialogue modelling and indicate how each can benefit from qualitatively novel methods. I will start with the use of topological data analysis for dialogue term extraction, a key step in dialogue ontology construction. Next, I will explain how to achieve robustness in dialogue belief tracking in the absence of fine-grained labels. Finally, I will talk about parameter-efficient ways to introduce continual reinforcement learning for life-long optimisation of the dialogue policy. Each of these methods has proven successful for dialogue modelling and is promising for a wider range of applications.

Milica Gašić
Biography: Milica Gašić is Professor and head of the Dialog Systems and Machine Learning Group at Heinrich Heine University Düsseldorf. Her research focuses on fundamental questions of human-computer dialogue modelling and lies at the intersection of Natural Language Processing and Machine Learning. Prior to her current position, she was a Lecturer in Spoken Dialogue Systems at the Department of Engineering, University of Cambridge, where she led the Dialogue Systems Group. Previously, she was a Research Associate and a Senior Research Associate in the same group and a Research Fellow at Murray Edwards College. She completed her PhD under the supervision of Professor Steve Young; the topic of her thesis was Statistical Dialogue Modelling, for which she received an EPSRC PhD Plus Award. She holds an MPhil degree in Computer Speech, Text and Internet Technology from the University of Cambridge and a Diploma (BSc equivalent) in Mathematics and Computer Science from the University of Belgrade. She is a member of ACL, a member of ELLIS, and a senior member of IEEE, as well as a member of the International Scientific Advisory Board of DFKI.
How did speech representation learning successes lead to the emergence of textless NLP research?

Recent successes of self-supervised Speech Representation Learning (SRL) approaches have redefined performance on an array of downstream speech processing tasks, for both generation and recognition, even with ultra-low annotation resources. Given its independence from text resources, SRL has opened the door to modeling oral languages and dialects. Furthermore, it enables direct modeling of oral language and audio, which carry many nuances, intonations (irony), expressive vocalizations (laughter), and sounds of everyday life (e.g., sizzling food), making AI applications more natural, inclusive, and expressive. This talk starts with a short highlight of key methods for SRL before diving into recent advances in textless recognition and generation.

Abdelrahman Mohamed
Biography: Abdelrahman Mohamed is a research scientist at Meta's FAIR group. Before Meta, he was a principal scientist/manager at Amazon Alexa and a researcher at Microsoft Research. Abdelrahman was part of the team that started the deep learning revolution in spoken language processing in 2009. His research spans speech recognition, representation learning using weakly-, semi-, and self-supervised methods, language understanding, and modular deep learning. Abdelrahman has more than 70 journal and conference publications with more than 35,000 citations. He is the recipient of the IEEE Signal Processing Society Best Journal Paper Award for 2016. His current research focuses on improving, using, and benchmarking learned speech representations, e.g., HuBERT, wav2vec 2.0, TextlessNLP, and SUPERB.
Multimodal fake news detection

While initially primarily text-based, fake news has become increasingly multimodal, involving images (e.g., in memes), speech, and video. I will describe work on fact-checking real-world claims made in the context of political debates, using both the textual and the speech modality. I will further cover some tasks that can support journalists and fact-checkers in their work, such as analyzing political debates, speeches, or live interviews, spotting interesting claims to fact-check, and detecting claims that have been fact-checked already (the latter would allow the journalist or moderator to put a politician on the spot in real time when the politician repeats a known lie); in these tasks, the speech and the textual modality can complement each other. Finally, I will discuss profiling entire news outlets for factuality and bias, based on what they write, on how users react to it, and on the multimodal content they use, e.g., the speech signal in the videos they post, which makes it possible to model not only what was said but also how it was said. Such profiling makes it possible to detect fake news before it is even written, by checking how trustworthy the outlet that published it is, which is what journalists actually do.

Dr. Preslav Nakov
Biography: Dr. Preslav Nakov is Professor and Acting Deputy Department Chair at the Natural Language Processing department of Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). His current research focuses on detecting and understanding disinformation, propaganda, fake news, and media bias. Previously, he was a Principal Scientist at the Qatar Computing Research Institute, HBKU, where he led the Tanbih mega-project (developed in collaboration with MIT), which aims to limit the impact of "fake news", propaganda and media bias by making users aware of what they are reading, thus promoting media literacy and critical thinking. He received his PhD degree in Computer Science from the University of California at Berkeley, supported by a Fulbright grant. Dr. Preslav Nakov is President of ACL SIGLEX, Secretary of ACL SIGSLAV, Secretary of the Truth and Trust Online board of trustees, and member of the EACL advisory board; he was also a PC chair of ACL 2022. He is a member of the editorial board of several journals including Computational Linguistics, TACL, ACM TOIS, IEEE TASL, IEEE TAC, CS&L, NLE, AI Communications, and Frontiers in AI. He authored a Morgan & Claypool book on Semantic Relations between Nominals, two books on computer algorithms, and 250+ research papers. He received a best paper award at ACM WebSci 2022 for work on propaganda and coordinated community detection, a best paper award at CIKM 2020 for work on fake news detection in social media, a best demo paper award (honorable mention) at ACL 2020 and a best task paper award (honorable mention) at SemEval 2020, both for work on detecting propaganda techniques in text, as well as a Young Researcher Award at RANLP’2011. He was also the first to receive the Bulgarian President's John Atanasoff award, named after the inventor of the first automatic electronic digital computer. Dr. Nakov's research was featured by over 100 news outlets, including MIT Technology Review, CACM Research Highlights, Forbes, Boston Globe, Al Jazeera, Science Daily, Popular Science, Fast Company, The Register, WIRED, and Engadget.
Building an ASR system for Saudi Arabic dialects

Severe dialectal and regional variation in Arabic presents a serious challenge for developers of Arabic Language Technologies. Spoken Arabic is even more challenging, with dialectal variations borrowing from several diverse languages such as Turkish, Farsi, French, Italian, and English. In this talk, we will present some of the main challenges we faced in building SauTech, a state-of-the-art speech recognition system for Saudi dialects. We will also give a brief overview of Saudi Arabia's Arabic Language Technology strategy.

Yaser Onaizan
Affiliation: Deputy CEO at the National Center for AI (NCAI), Saudi Data and AI Authority (SDAIA)