Embedding-based pair generation for contrastive representation learning in audio-visual surveillance data
Smart cities deploy various sensors such as microphones and RGB cameras to collect data to improve the safety and comfort of the citizens. As data annotation is expensive, self-supervised methods such as contrastive learning are used to learn audio-visual representations for downstream tasks. Focusi...
Saved in:
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
Frontiers Media S.A.
2025-01-01
|
Series: | Frontiers in Robotics and AI |
Subjects: | |
Online Access: | https://www.frontiersin.org/articles/10.3389/frobt.2024.1490718/full |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|