Audio-Language Datasets of Scenes and Events: A Survey

Audio-language models (ALMs) generate linguistic descriptions of sound-producing events and scenes. Advances in dataset creation and computational power have led to significant progress in this domain. This paper surveys 69 datasets used to train ALMs, covering research up to September 2024 (<uri...

Bibliographic Details
Main Authors:	Gijs Wijngaard, Elia Formisano, Michele Esposito, Michel Dumontier
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Audio-to-language learning; language-to-audio learning; audio-language datasets; review
Online Access:	https://ieeexplore.ieee.org/document/10854210/

Similar Items

A Novel Audio Copy Move Forgery Detection Method With Classification of Graph-Based Representations
by: Beste Ustubioglu, et al.
Published: (2025-01-01)

Deep convolutional neural networks for double compressed AMR audio detection
by: Aykut Büker, et al.
Published: (2021-06-01)

Audiogmenter: a MATLAB toolbox for audio data augmentation
by: Gianluca Maguolo, et al.
Published: (2025-01-01)

PENGGUNAAN MEDIA AUDIO VISUAL PADA MATA PELAJARAN PENDIDIKAN AGAMA ISLAM UNTUK MENINGKATKAN AKTIVITAS BELAJAR SISWA KELAS V SD N 09 PALEMBANG
by: Ibrahim Ibrahim, et al.
Published: (2024-01-01)

Peningkatan Kedisiplinan Siswa Sekolah Dasar Melalui Pemanfaatan Media Audio Visual
by: Siti Diyah Rachmatika, et al.
Published: (2024-09-01)

Advancements in End-to-End Audio Style Transformation: A Differentiable Approach for Voice Conversion and Musical Style Transfer
by: Shashwat Aggarwal, et al.
Published: (2025-01-01)

Audio-visual event localization with dual temporal-aware scene understanding and image-text knowledge bridging
by: Pufen Zhang, et al.
Published: (2024-11-01)

Live and mediated user engagements: A comparative dataset from two Bengali audio-story based youtube channels (Mendeley Data)
by: Mohammad Harun Or Rashid, et al.
Published: (2025-02-01)

LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
by: Charilaos Papaioannou, et al.
Published: (2025-01-01)

The Preferred User: How Audio Description could Change Understandings of Australian Television Audiences and Media Technology
by: Ellis Katie, et al.
Published: (2018-07-01)

Authenticity at Risk: Key Factors in the Generation and Detection of Audio Deepfakes
by: Alba Martínez-Serrano, et al.
Published: (2025-01-01)

Audio classification using grasshopper‐ride optimization algorithm‐based support vector machine
by: Suryabhan Pratap Singh, et al.
Published: (2021-08-01)

A Survey on Machine Learning Techniques for Head-Related Transfer Function Individualization
by: Davide Fantini, et al.
Published: (2025-01-01)

Learning through audio-visual aids: how does it work for students to delve into the English vowels?
by: Zikril Mulia
Published: (2022-11-01)

Apk2Audio4AndMal: Audio Based Malware Family Detection Framework
by: Oguz Emre Kural, et al.
Published: (2023-01-01)

Digital technologies in music education. Using Digital Audio Workstations (DAW) with Project-Based Learning (PBL)
by: María Elena Cuenca-Rodríguez, et al.
Published: (2025-01-01)

Comparison of Distraction Techniques using Salivary Biomarkers during Local Anaesthesia Administration in Children Aged 3–5 Years: A Clinical Study
by: Yanina Singh, et al.
Published: (2023-04-01)

Video or audio listening tests for English language teaching context: which is more effective for classroom use?
by: Clara Herlina Karjo, et al.
Published: (2022-02-01)

Modern English Language Didactics: Aspects of Dialectics and Analysis
by: E. V. Yakovleva, et al.
Published: (2013-02-01)

Pengaruh Bimbingan Kelompok dengan Media Audio Visual terhadap Motivasi Belajar Siswa di MTs Negeri 4 Jakarta
by: Rianka Anindya Rahmadhita, et al.
Published: (2023-07-01)

As mentiras do eu: procedimentos, gêneros e atores do discurso desinformativo em primeira pessoa
by: Paolo Demuru, et al.
Published: (2022-11-01)

Cross-modal matching of monosyllabic and bisyllabic items varying in phonotactic probability and lexicality
by: Kauyumari Sanchez
Published: (2025-02-01)

Spatial audio signal processing for augmented telepresence applications
by: Thomas Deppisch
Published: (2025-03-01)

Dual-Channel Deepfake Audio Detection: Leveraging Direct and Reverberant Waveforms
by: Gunwoo Lee, et al.
Published: (2025-01-01)

From Book to Playlist: How Open-Access Audio Archives are Renewing the Poetry Collection
by: Abigail Lang
Published: (2022-12-01)

An IoT-enhanced automatic music composition system integrating audio-visual learning with transformer and SketchVAE
by: Yifei Zhang
Published: (2025-02-01)

The SPN Network for Digital Audio Data Based on Elliptic Curve Over a Finite Field
by: Ijaz Khalid, et al.
Published: (2022-01-01)

AzSLD: Azerbaijani sign language dataset for fingerspelling, word, and sentence translation with baseline software (Zenodo)
by: Nigar Alishzade, et al.
Published: (2025-02-01)

The Role of the Teacher in the Foreign Language Classroom – Past, Recent and Modern Developments
by: Thomas Tinnefeld
Published: (2021-06-01)

Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation
by: Soyul Han, et al.
Published: (2025-03-01)

Exploring New Dimensions of Archives: Finding Audiovisually Similar Programmes with the Help of Neural Networks
by: Sara Veldhoen, et al.
Published: (2024-12-01)

Comparison of Machine Learning Algorithms on Classification of Covid-19 Cough Sounds Using MFCC Extraction
by: Mohammad Reza Faisal, et al.
Published: (2023-12-01)

De la recherche scientifique au documentaire sur la migration, des mémoires en mouvement
by: Eva Léger
Published: (2021-01-01)

Mobile android application development for anxiety and pain management: Usability and validity testing of audio hypno-spiritual therapy in ICU settings
by: Purnawan Iwan, et al.
Published: (2025-01-01)

Elephant Sound Classification Using Deep Learning Optimization
by: Hiruni Dewmini, et al.
Published: (2025-01-01)

An Automatic Approach for the Identification of Offensive Language in Perso-Arabic Urdu Language: Dataset Creation and Evaluation
by: Salah Ud Din, et al.
Published: (2025-01-01)

Binaural auditory beats vs music of choice as audio distraction behaviour guidance technique among children: A randomized controlled trial
by: Bhuvanesh N. Bhusari, et al.
Published: (2025-01-01)

Applying Fourier Neural Operator to insect wingbeat sound classification: Introducing CF-ResNet-1D
by: Béla J. Szekeres, et al.
Published: (2025-05-01)

Webinar as a Form of E-Learning in Higher Education
by: S. D. Kalinina
Published: (2015-04-01)

The Cinematic Visions and Dreams of Edgard Varèse
by: Gabriele
Published: (2025-02-01)