Keynote Speakers

Meet them in Doha, Qatar

Nima Mesgarani

Keynote: Monday 09:00 AM, January 9th, 2023

Abstract

Brain-controlled assistive hearing technologies
Listening in noisy and crowded environments is a challenging task. Assistive hearing devices can suppress certain types of background noise, but they cannot help a user focus on a single conversation amongst many without knowing which speaker is the target. Our recent scientific discoveries of speech processing in the human auditory cortex have motivated several new paths to enhance the efficacy of hearable technologies. These possibilities include (i) speech neuroprostheses, which aim to establish a direct communication channel with the brain; (ii) auditory attention decoding, where the similarity of a listener's brainwave to the sources in the acoustic scene is used to identify the target source; and (iii) increased speech perception using electrical brain stimulation. In parallel, the field of auditory scene analysis has recently seen great progress due to the emergence of deep learning models, where even multi-talker speech recognition is no longer out of reach. I will discuss our recent efforts to bring together the latest progress in speech neurophysiology, brain-computer interfaces, and speech processing technologies to design and actualize the next generation of assistive hearing devices, with the potential to augment speech communication in realistic and challenging acoustic conditions.
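
To make point (ii) concrete, the sketch below shows the core of one common envelope-based approach to auditory attention decoding: reconstruct an estimate of the attended speech envelope from the neural recording with a pre-trained linear decoder, correlate it with the envelope of each candidate source, and pick the most similar one. This is an illustrative assumption of how such a system can work, not the speaker's actual pipeline; the function names and the linear decoder are hypothetical.

```python
import numpy as np

def audio_envelope(signal, frame=160):
    """Crude amplitude envelope: mean absolute value per frame."""
    n = len(signal) // frame
    return np.abs(signal[: n * frame]).reshape(n, frame).mean(axis=1)

def decode_attention(eeg, decoder, sources, frame=160):
    """Pick the likely attended source by correlating an envelope
    reconstructed from neural data with each source's envelope.

    eeg     : (time, channels) neural recording, frame-aligned to the audio
    decoder : (channels,) pre-trained linear stimulus-reconstruction weights
    sources : list of 1-D audio signals present in the acoustic scene
    """
    reconstructed = eeg @ decoder  # estimated envelope of the attended speech
    scores = []
    for source in sources:
        env = audio_envelope(source, frame)
        t = min(len(env), len(reconstructed))
        scores.append(np.corrcoef(reconstructed[:t], env[:t])[0, 1])
    return int(np.argmax(scores)), scores  # index of the probable target
```

A real system would train the decoder on data where the attended speaker is known and smooth the decision over time rather than deciding per window.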

Biography

Nima Mesgarani is an associate professor at the Zuckerman Mind Brain Behavior Institute of Columbia University in the City of New York. He received his Ph.D. from the University of Maryland and was a postdoctoral scholar at the Center for Language and Speech Processing at Johns Hopkins University and the Neurosurgery Department of the University of California, San Francisco. He received the National Science Foundation Early Career Award in 2015, the Pew Scholar for Innovative Biomedical Research Award in 2016, and the Auditory Neuroscience Young Investigator Award in 2019. His research was selected among the top-10 innovations of 2018 by UNICEF-Netexplo, the top-10 breakthroughs of 2019 by the Institute of Physics, and the top-10 health innovations of 2019 by Healthcare Innovation. His interdisciplinary research combines experimental and computational methods to study speech communication in the human brain, which critically impacts research in artificial models of speech processing and speech brain-computer interface technologies.

Dr. Preslav Nakov

Keynote: Monday 02:00 PM, January 9th, 2023

Abstract

Multimodal fake news detection
While initially primarily text-based, fake news has been getting increasingly multimodal, involving images (e.g., in memes), speech, and video. I will describe work on fact-checking real-world claims made in the context of political debates, using both the textual and the speech modality. I will further cover tasks that can support journalists and fact-checkers in their work, such as analyzing political debates, speeches, or live interviews; spotting interesting claims to fact-check; and detecting claims that have already been fact-checked (the latter allows the journalist or moderator to put a politician on the spot in real time when the politician repeats a known lie). In all these tasks, the speech and the textual modality can complement each other. Finally, I will discuss profiling entire news outlets for factuality and bias, which makes it possible to detect fake news before it is even written, by checking how trustworthy the outlet that published it is (which is what journalists actually do). Such profiling is based on what the outlets write, on how users react to it, and also on the multimodal content they use, e.g., the speech signal in the videos they post, which allows us to model not only what was said but also how it was said.
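
As an illustration of one of these tasks, detecting claims that have already been fact-checked is, at its core, a retrieval problem: match an incoming claim against a database of verified claims. Below is a minimal sketch using plain TF-IDF similarity rather than any model from the talk; the claim database, function name, and threshold are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini-database of already fact-checked claims.
fact_checked = [
    "The unemployment rate doubled over the last year.",
    "The new vaccine contains microchips.",
    "Country X has the highest GDP growth in the region.",
]

def match_claim(claim, database, threshold=0.5):
    """Return (best_match, score) if the incoming claim is close enough
    to a previously fact-checked one, else (None, score)."""
    vectorizer = TfidfVectorizer().fit(database + [claim])
    db_vecs = vectorizer.transform(database)
    claim_vec = vectorizer.transform([claim])
    scores = cosine_similarity(claim_vec, db_vecs)[0]
    best = scores.argmax()
    if scores[best] >= threshold:
        return database[best], scores[best]
    return None, scores[best]

print(match_claim("Unemployment has doubled in the past year.", fact_checked))
```

In practice, such systems use stronger semantic matching (e.g., neural sentence embeddings over both text and transcribed speech) and rank candidates for a human fact-checker rather than thresholding automatically.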

Biography

Dr. Preslav Nakov is Professor and Acting Deputy Department Chair of the Natural Language Processing department at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). His current research focuses on detecting and understanding disinformation, propaganda, fake news, and media bias. Previously, he was a Principal Scientist at the Qatar Computing Research Institute, HBKU, where he led the Tanbih mega-project (developed in collaboration with MIT), which aims to limit the impact of "fake news", propaganda, and media bias by making users aware of what they are reading, thus promoting media literacy and critical thinking. He received his PhD in Computer Science from the University of California at Berkeley, supported by a Fulbright grant. Dr. Nakov is President of ACL SIGLEX, Secretary of ACL SIGSLAV, Secretary of the Truth and Trust Online board of trustees, and a member of the EACL advisory board; he was also a PC chair of ACL 2022. He serves on the editorial boards of several journals, including Computational Linguistics, TACL, ACM TOIS, IEEE TASL, IEEE TAC, CS&L, NLE, AI Communications, and Frontiers in AI. He authored a Morgan & Claypool book on Semantic Relations between Nominals, two books on computer algorithms, and 250+ research papers. He received a best paper award at ACM WebSci 2022 for work on propaganda and coordinated community detection, a best paper award at CIKM 2020 for work on fake news detection in social media, a best demo paper award (honorable mention) at ACL 2020 and a best task paper award (honorable mention) at SemEval 2020, both for work on detecting propaganda techniques in text, as well as a Young Researcher Award at RANLP 2011. He was also the first to receive the Bulgarian President's John Atanasoff Award, named after the inventor of the first automatic electronic digital computer. Dr. Nakov's research has been featured in over 100 news outlets, including MIT Technology Review, CACM Research Highlights, Forbes, Boston Globe, Al Jazeera, Science Daily, Popular Science, Fast Company, The Register, WIRED, and Engadget.

Abdelrahman Mohamed

Keynote: Tuesday 09:00 AM, January 10th, 2023

Abstract

How did speech representation learning successes lead to the emergence of textless NLP research?
Recent successes of self-supervised Speech Representation Learning (SRL) approaches have redefined performance in an array of downstream speech processing tasks, both for generation and recognition, even with ultra-low annotation resources. Given its independence from text resources, SRL has opened the gate for modeling oral languages and dialects. Furthermore, it has enabled direct modeling of spoken language and audio, which carry many nuances, intonations (e.g., irony), expressive vocalizations (e.g., laughter), and sounds of everyday life (e.g., the sizzle of cooking food), making AI applications more natural, inclusive, and expressive. This talk starts with a short highlight of key SRL methods before diving into recent advances in textless recognition and generation.
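
As a minimal illustration of how such learned representations are consumed downstream, the sketch below extracts layer-wise features from a publicly released pre-trained HuBERT checkpoint via torchaudio. This is a generic usage example, not code from the talk, and the random waveform stands in for real audio.

```python
import torch
import torchaudio

# Load a pre-trained HuBERT model, one widely used SRL architecture.
bundle = torchaudio.pipelines.HUBERT_BASE
model = bundle.get_model().eval()

# One second of dummy audio at the model's expected sample rate; in practice,
# load a real recording with torchaudio.load and resample if needed.
waveform = torch.randn(1, int(bundle.sample_rate))

with torch.inference_mode():
    # Each element is a (batch, frames, 768) tensor from one transformer layer;
    # downstream tasks typically select or weight these layer outputs.
    features, _ = model.extract_features(waveform)

print(len(features), features[-1].shape)
```

The same frame-level features can feed recognition, generation, or the textless pipelines discussed in the talk, without any transcribed text in the loop.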

Biography

Abdelrahman Mohamed is a research scientist at Meta's FAIR group. Before Meta, he was a principal scientist/manager at Amazon Alexa and a researcher at Microsoft Research. Abdelrahman was part of the team that started the deep learning revolution in spoken language processing in 2009. His research spans speech recognition; representation learning using weakly, semi-, and self-supervised methods; language understanding; and modular deep learning. Abdelrahman has more than 70 journal and conference publications with more than 35,000 citations. He is the recipient of the IEEE Signal Processing Society Best Journal Paper Award for 2016. His current research focuses on improving, using, and benchmarking learned speech representations, e.g., HuBERT, wav2vec 2.0, TextlessNLP, and SUPERB.

Verena Rieser

Keynote: Tuesday 02:00 PM, January 10th, 2023

Abstract

A short history of data-driven dialogue systems in 5 acts: Where do we go from here?
With continued progress in deep learning, there has been an increased interest in dialogue systems, also known as “Conversational AI”. In this talk, I will provide a short review of the past 20 years of data-driven system development through the lens of 5 major initiatives I was involved in.

I will focus on the sub-task of response generation, for which I will highlight lessons learnt and ongoing challenges, including reducing "hallucinations" in task-based systems, safety-critical issues in open-domain chatbots, and the often overlooked problem of "good" persona design. Throughout my talk, I will ask the more general question of whether we should "blame" the data, the model, the evaluation, or the design.

Biography

Verena Rieser is a full professor at Heriot-Watt University in Edinburgh, where she leads research on Conversational AI at the intersection of Natural Language Processing and Machine Learning. She is also a co-founder of ALANA AI and the Director of Ethics at the UK National Center for Robotics.
Verena has 20 years of experience in developing and researching data-driven conversational systems. In the early 2000s, she was one of a handful of researchers who developed a series of breakthrough innovations that laid the groundwork for statistical dialogue control using Reinforcement Learning. More recently, Verena and her team pioneered work on end-to-end natural language generation. Her current focus is on identifying and addressing ethical and societal risks in neural conversational systems.
Verena received her PhD in 2008 from Saarland University and then joined the University of Edinburgh as a postdoctoral research fellow, before taking up a faculty position at Heriot-Watt in 2011 where she was promoted to full professor in 2017. In 2020, Verena received a Royal Society/Leverhulme Senior Research Fellowship in recognition of her work in developing multimodal conversational systems.

Yaser Alonaizan

Keynote: Wednesday 09:00 AM, January 11th, 2023

Abstract

Building an ASR system for Saudi Arabic dialects
Severe dialectal and regional variations of Arabic present a serious challenge for developers of Arabic language technologies. Spoken Arabic is even more challenging, with dialects borrowing from several diverse languages such as Turkish, Farsi, French, Italian, and English. In this talk, we will present some of the main challenges we faced in building SauTech, a state-of-the-art speech recognition system for Saudi dialects. We will also give a brief overview of Saudi Arabia's Arabic Language Technology strategy.

Biography

Dr. Yaser Alonaizan joined the Saudi Data and Artificial Intelligence Authority (SDAIA)'s National Center for AI as its Deputy CEO and Chief Scientist in January 2022. From 2017 to 2022, he led Amazon AWS AI Labs, focusing on human language technology. Prior to that, he worked at IBM Watson Research and Watson cognitive services with a focus on multilingual NLP. Throughout his career in the US, Dr. Alonaizan won several outstanding innovation and leadership awards at USC/ISI, IBM, and Amazon. He served on the executive boards of multiple technology organizations and on the committees of many NLP conferences, and served as Associate Editor for ACM TALLIP. He has authored more than 60 refereed scientific papers in world-renowned international conferences and journals and holds 20+ US patents. He received his Master of Science and Ph.D. in Computer Science from the University of Southern California's Information Sciences Institute in 1996 and 2002, respectively. He also holds a Master of Business Administration (MBA) from Columbia University in the City of New York.