Self-supervised learning (SSL) has emerged as a popular approach in speech processing to reduce the dependency on large labeled datasets. However, which attributes make SSL capable across various conditions and tasks remains under-explored. The goal of the SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning is to benchmark, in multiple aspects, the capability of SSL speech representations under a standard and comprehensive framework designed to yield more comparable results and analyses. Through the challenge, we hope to work jointly with the community to understand the mechanisms and efficacy of popular SSL techniques in various conditions and to further inspire innovation in the area.
The evaluation framework of the SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning is similar to the one introduced in the SUPERB Benchmark, where various SSL representations are fine-tuned on various speech processing tasks with consistent recipes. The challenge includes 10 tasks, drawn from the SUPERB and SUPERB-SG Benchmarks, to measure the Content, Speaker, Paralinguistics, Semantics, and Generation capabilities of SSL representations. To encourage innovation for gains beyond accuracy, such as computational efficiency and a low memory footprint, we employ diverse metrics, including memory usage and number of operations. We do NOT provide an overall metric across tasks, accuracy, and computational and memory efficiency to rank submissions, for two reasons: 1) to motivate a holistic understanding of SSL techniques' attributes rather than an arms race over accuracy or any single metric, and 2) to welcome submissions on subsets of tasks so that more researchers can participate.
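In SUPERB-style recipes, the downstream model is typically trained on a learnable weighted sum of the upstream SSL model's hidden layers, with per-layer scalar weights normalized by a softmax. The following is a minimal NumPy sketch of that featurization step only; all shapes, names, and the toy classifier head are illustrative assumptions, not the challenge's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen SSL encoder's hidden states with shape
# (num_layers, num_frames, feature_dim). In a real recipe these would
# come from a pretrained upstream model (e.g. a Transformer encoder).
num_layers, num_frames, dim = 13, 50, 768
hidden_states = rng.standard_normal((num_layers, num_frames, dim))

# One learnable scalar per layer, normalized with a softmax so the
# layer weights sum to 1. Zero logits give a uniform initialization.
layer_logits = np.zeros(num_layers)
weights = np.exp(layer_logits) / np.exp(layer_logits).sum()

# Weighted sum over the layer axis yields a single feature sequence
# of shape (num_frames, feature_dim) for the downstream head.
features = np.tensordot(weights, hidden_states, axes=1)

# A lightweight, hypothetical downstream head: mean-pool over time,
# then a linear classifier (4 classes chosen arbitrarily for the demo).
num_classes = 4
W = rng.standard_normal((dim, num_classes)) * 0.01
logits = features.mean(axis=0) @ W  # shape: (num_classes,)
```

In training, `layer_logits` and the downstream head are optimized while the upstream encoder stays frozen, which is what makes results comparable across different SSL representations under one recipe.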