Audio-Language Datasets of Scenes and Events: A Survey

Audio-language models (ALMs) generate linguistic descriptions of sound-producing events and scenes. Advances in dataset creation and computational power have led to significant progress in this domain. This paper surveys 69 datasets used to train ALMs, covering research up to September 2024 (<uri...

Bibliographic Details
Main Authors: Gijs Wijngaard, Elia Formisano, Michele Esposito, Michel Dumontier
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10854210/

Similar Items