VLFSE: Enhancing visual tracking through visual language fusion and state update evaluator

Bibliographic Details
Main Authors: Fuchao Yang, Mingkai Jiang, Qiaohong Hao, Xiaolei Zhao, Qinghe Feng
Format: Article
Language: English
Published: Elsevier 2024-12-01
Series: Machine Learning with Applications
Subjects: Transformer-based tracker; Visual language tracking; State update evaluator
Online Access: http://www.sciencedirect.com/science/article/pii/S2666827024000641
_version_ 1850103792438083584
author Fuchao Yang
Mingkai Jiang
Qiaohong Hao
Xiaolei Zhao
Qinghe Feng
author_sort Fuchao Yang
collection DOAJ
description Recently, visual tracking algorithms have achieved impressive results by combining dynamic templates. However, the instability of visual images and the incorrect timing of template updates lead to decreased tracking accuracy and stability in intricate scenarios. To address these issues, we propose a visual tracking algorithm through visual language fusion and a state update evaluator (VLFSE). Specifically, our approach introduces a multimodal attention mechanism that uses self-attention to mine and integrate information from diverse sources effectively. This mechanism ensures a richer, context-aware representation of the target, enabling more accurate tracking even in complex scenes. Moreover, we recognize the critical need for precise template updates to maintain tracking accuracy over time. To this end, we develop a state update evaluator, a component trained online to assess the necessity and timing of template updates accurately. This evaluator acts as a safeguard, preventing erroneous updates and ensuring the tracker adapts optimally to changes in the target’s appearance. The experimental results on challenging visual language tracking datasets demonstrate our tracker’s superior performance, showcasing its adaptability and accuracy in complex tracking scenarios.
format Article
id doaj-art-148d6ddb51a740759e2d55de81929405
institution DOAJ
issn 2666-8270
language English
publishDate 2024-12-01
publisher Elsevier
record_format Article
series Machine Learning with Applications
spelling doaj-art-148d6ddb51a740759e2d55de81929405
2025-08-20T02:39:28Z
eng
Elsevier
Machine Learning with Applications
2666-8270
2024-12-01
18
100588
10.1016/j.mlwa.2024.100588
VLFSE: Enhancing visual tracking through visual language fusion and state update evaluator
Fuchao Yang (0), Mingkai Jiang (1), Qiaohong Hao (2), Xiaolei Zhao (3), Qinghe Feng (4)
(0), (1), (2) School of Intelligent Engineering, Henan Institute of Technology, Xinxiang, Henan, China
(3) School of Intelligent Engineering, Henan Institute of Technology, Xinxiang, Henan, China; China Radio Wave Propagation Research Institute, Qingdao, Shandong, China
(4) School of Intelligent Engineering, Henan Institute of Technology, Xinxiang, Henan, China; Corresponding author.
http://www.sciencedirect.com/science/article/pii/S2666827024000641
Transformer-based tracker
Visual language tracking
State update evaluator
title VLFSE: Enhancing visual tracking through visual language fusion and state update evaluator
topic Transformer-based tracker
Visual language tracking
State update evaluator
url http://www.sciencedirect.com/science/article/pii/S2666827024000641