Posts by Collection

portfolio

publications

Investigating the Emergent Audio Classification Ability of ASR Foundation Models

Published in 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)

In this work we investigate the ability of Whisper and MMS, ASR foundation models trained primarily for speech recognition, to perform zero-shot audio classification. With simple template-based text prompts, we demonstrate that Whisper shows promising zero-shot classification performance on 8 audio-classification datasets, outperforming existing state-of-the-art zero-shot baseline accuracy by an average of 9%. To unlock the emergent ability, we introduce debiasing approaches. A simple unsupervised reweighting method of the class probabilities yields consistent significant performance gains. We also show that performance increases with model size, implying that as ASR foundation models scale up, they may exhibit improved zero-shot performance.

talks

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.