A Dual-Channel and Frequency-Aware Approach for Lightweight Video Instance Segmentation

Video instance segmentation, a key technology for intelligent sensing in visual perception, plays a key role in automated surveillance, robotics, and smart cities. These scenarios rely on real-time and efficient target-tracking capabilities for accurate perception and intelligent analysis of dynamic...

Bibliographic Details
Main Authors: Mingzhu Liu, Wei Zhang, Haoran Wei
Format: Article
Language: English
Published: MDPI AG 2025-01-01
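Series: Sensors
Subjects: video understanding; video transformer; visual perception intelligent sensing; video instance segmentation; lightweight
Online Access: https://www.mdpi.com/1424-8220/25/2/459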
author Mingzhu Liu
Wei Zhang
Haoran Wei
collection DOAJ
description Video instance segmentation, a core technology for intelligent sensing in visual perception, plays a key role in automated surveillance, robotics, and smart cities. These scenarios rely on real-time, efficient target tracking for accurate perception and intelligent analysis of dynamic environments. However, traditional video instance segmentation methods suffer from complex models, high computational overhead, and slow segmentation when extracting time-series features, especially in resource-constrained environments. To address these challenges, this paper proposes a Dual-Channel and Frequency-Aware Approach for Lightweight Video Instance Segmentation (DCFA-LVIS). For feature extraction, a DCEResNet backbone based on a dual-channel feature enhancement mechanism is designed to improve the model’s accuracy by strengthening feature extraction and representation. For instance tracking, a dual-frequency perceptual enhancement network is constructed: it uses an independent instance query mechanism to capture temporal information and combines it with a frequency-aware attention mechanism that captures instance features on separate high-frequency and low-frequency attention layers, which reduces model complexity, lowers the parameter count, and improves segmentation efficiency. Experiments show that the proposed model achieves state-of-the-art segmentation performance with few parameters on the YouTube-VIS dataset, demonstrating its efficiency and practicality. The method improves the efficiency and adaptability of visual perception intelligent sensing technology in video data acquisition and processing, supporting its wider deployment.
format Article
id doaj-art-ece152747009491a941d9fcc75f763e5
institution Kabale University
issn 1424-8220
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-ece152747009491a941d9fcc75f763e5; 2025-01-24T13:49:00Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2025-01-01; vol. 25, no. 2, art. 459; DOI 10.3390/s25020459; A Dual-Channel and Frequency-Aware Approach for Lightweight Video Instance Segmentation; Mingzhu Liu, Wei Zhang, Haoran Wei (all: The Higher Educational Key Laboratory for Measuring & Control Technology and Instrumentation of Heilongjiang Province, Harbin University of Science and Technology, Harbin 150080, China); https://www.mdpi.com/1424-8220/25/2/459; video understanding; video transformer; visual perception intelligent sensing; video instance segmentation; lightweight
title A Dual-Channel and Frequency-Aware Approach for Lightweight Video Instance Segmentation
topic video understanding
video transformer
visual perception intelligent sensing
video instance segmentation
lightweight
url https://www.mdpi.com/1424-8220/25/2/459