VLFSE: Enhancing visual tracking through visual language fusion and state update evaluator
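| Main Authors: | Fuchao Yang, Mingkai Jiang, Qiaohong Hao, Xiaolei Zhao, Qinghe Feng |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2024-12-01 |
| Series: | Machine Learning with Applications |
| Subjects: | Transformer-based tracker; Visual language tracking; State update evaluator |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2666827024000641 |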
| Field | Value |
|---|---|
| author | Fuchao Yang, Mingkai Jiang, Qiaohong Hao, Xiaolei Zhao, Qinghe Feng |
| collection | DOAJ |
| description | Recently, visual tracking algorithms have achieved impressive results by incorporating dynamic templates. However, the instability of visual images and poorly timed template updates reduce tracking accuracy and stability in complex scenarios. To address these issues, we propose a visual tracking algorithm based on visual language fusion and a state update evaluator (VLFSE). Specifically, our approach introduces a multimodal attention mechanism that uses self-attention to effectively mine and integrate information from diverse sources. This mechanism yields a richer, context-aware representation of the target, enabling more accurate tracking even in complex scenes. Moreover, precise template updates are critical for maintaining tracking accuracy over time. To this end, we develop a state update evaluator, a component trained online to accurately assess whether and when the template should be updated. This evaluator acts as a safeguard, preventing erroneous updates and helping the tracker adapt to changes in the target’s appearance. Experimental results on challenging visual language tracking datasets demonstrate our tracker’s superior performance, showcasing its adaptability and accuracy in complex tracking scenarios. |
| format | Article |
| id | doaj-art-148d6ddb51a740759e2d55de81929405 |
| institution | DOAJ |
| issn | 2666-8270 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Elsevier |
| series | Machine Learning with Applications |
| volume | 18 |
| article_number | 100588 |
| doi | 10.1016/j.mlwa.2024.100588 |
| affiliation | School of Intelligent Engineering, Henan Institute of Technology, Xinxiang, Henan, China (all authors); China Radio Wave Propagation Research Institute, Qingdao, Shandong, China (Xiaolei Zhao) |
| corresponding_author | Qinghe Feng |
| title | VLFSE: Enhancing visual tracking through visual language fusion and state update evaluator |
| topic | Transformer-based tracker; Visual language tracking; State update evaluator |
| url | http://www.sciencedirect.com/science/article/pii/S2666827024000641 |
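
The abstract says only that a multimodal attention mechanism "uses self-attention to mine and integrate information from diverse sources"; it does not describe the layer design. As a generic illustration, and not VLFSE's actual architecture, the sketch below shows one common way to fuse modalities with self-attention: concatenate visual and language token embeddings into one sequence and run single-head scaled dot-product attention over it, so every image token can attend to words and vice versa. The helper names (`fuse_visual_language`, `self_attention`), the single head, and the toy embeddings are hypothetical assumptions; plain Python lists stand in for tensors.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) weight matrix by a length-d_in vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector], wq: Matrix, wk: Matrix, wv: Matrix) -> List[Vector]:
    """Single-head scaled dot-product self-attention over one token sequence."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    scale = 1.0 / math.sqrt(len(k[0]))
    out = []
    for qi in q:
        # Attention weights of this query over every token (both modalities).
        weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        # Weighted sum of the value vectors, one output channel at a time.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def fuse_visual_language(visual: List[Vector], language: List[Vector],
                         wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical fusion step: concatenate visual and language tokens and run
    joint self-attention. Returns (fused visual, fused language) in input order."""
    joint = visual + language
    fused = self_attention(joint, wq, wk, wv)
    return fused[:len(visual)], fused[len(visual):]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    visual = [[1.0, 0.0], [0.0, 1.0]]  # two toy image-patch embeddings
    language = [[1.0, 1.0]]            # one toy word embedding
    fused_v, fused_l = fuse_visual_language(visual, language, identity, identity, identity)
    print(len(fused_v), len(fused_l))  # 2 1
```

Joint attention over a single concatenated sequence is only one design choice; cross-attention between separate visual and language streams is a common alternative, and the abstract does not say which VLFSE uses.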
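
The abstract describes the state update evaluator only as "a component trained online to assess the necessity and timing of template updates." A minimal sketch of that idea, under loudly stated assumptions, is an online gate: a scorer trained incrementally estimates whether the current tracking state is reliable, and the dynamic template is refreshed only when that estimate clears a threshold and a minimum number of frames has passed. Everything concrete here is hypothetical and not taken from the paper: the `StateUpdateEvaluator` class, the logistic-regression scorer with SGD updates, the `threshold` and `min_interval` parameters, and the two toy features (peak response and IoU with the previous box).

```python
import math
from typing import List


class StateUpdateEvaluator:
    """Hypothetical online gate deciding when to refresh a dynamic template.

    A logistic-regression scorer estimates how reliable the current tracking
    state is; the template is refreshed only when that estimate clears a
    threshold and at least `min_interval` frames have passed since the last
    refresh.
    """

    def __init__(self, n_features: int, threshold: float = 0.6,
                 min_interval: int = 5, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.threshold = threshold
        self.min_interval = min_interval
        self.lr = lr
        # Start far enough in the past that the first confident frame may update.
        self.last_update = -min_interval

    def score(self, x: List[float]) -> float:
        """Estimated probability that the current state is reliable."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable logistic sigmoid.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def partial_fit(self, x: List[float], label: int) -> None:
        """One online SGD step on the logistic loss (label 1 = reliable)."""
        err = self.score(x) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

    def should_update(self, x: List[float], frame: int) -> bool:
        """Return True (and record the frame) only if an update is allowed."""
        if frame - self.last_update < self.min_interval:
            return False
        if self.score(x) < self.threshold:
            return False
        self.last_update = frame
        return True


if __name__ == "__main__":
    ev = StateUpdateEvaluator(n_features=2)
    # Toy features: [peak response, IoU with previous box] (illustrative only).
    for _ in range(200):
        ev.partial_fit([0.9, 0.8], 1)  # confident, stable frame
        ev.partial_fit([0.2, 0.1], 0)  # low-confidence, drifting frame
    print(ev.should_update([0.9, 0.8], frame=10))  # True
    print(ev.should_update([0.9, 0.8], frame=12))  # False: within min_interval
    print(ev.should_update([0.2, 0.1], frame=20))  # False: score below threshold
```

The two checks in `should_update` mirror the abstract's "necessity and timing": the score gate guards against updating on an unreliable frame, and the interval gate limits how often the template can change. How VLFSE actually derives its training signal online is not stated in this record.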