Embedding-based pair generation for contrastive representation learning in audio-visual surveillance data

Smart cities deploy various sensors such as microphones and RGB cameras to collect data to improve the safety and comfort of the citizens. As data annotation is expensive, self-supervised methods such as contrastive learning are used to learn audio-visual representations for downstream tasks. Focusi...

Bibliographic Details
Main Authors:	Wei-Cheng Wang, Sander De Coninck, Sam Leroux, Pieter Simoens
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2025-01-01
Series:	Frontiers in Robotics and AI
Subjects:	self-supervised learning; surveillance; audio-visual representation learning; contrastive learning; audio-visual event localization; anomaly detection
Online Access:	https://www.frontiersin.org/articles/10.3389/frobt.2024.1490718/full
