Brain-inspired multimodal motion and fine-grained action recognition
Introduction: Traditional action recognition methods predominantly rely on a single modality, such as vision or motion, which presents significant limitations when dealing with fine-grained action recognition. These methods struggle particularly with video data containing complex combinations of actions and subtle motion variations.

Methods: Typically, such methods depend on handcrafted feature extractors or simple convolutional neural network (CNN) architectures, which makes effective multimodal fusion challenging. This study introduces a novel architecture called FGM-CLIP (Fine-Grained Motion CLIP) to enhance fine-grained action recognition. FGM-CLIP leverages the powerful capabilities of Contrastive Language-Image Pretraining (CLIP), integrating a fine-grained motion encoder and a multimodal fusion layer to achieve precise end-to-end action recognition. By jointly optimizing visual and motion features, the model captures subtle action variations, resulting in higher classification accuracy on complex video data.

Results and discussion: Experimental results demonstrate that FGM-CLIP significantly outperforms existing methods on multiple fine-grained action recognition datasets. Its multimodal fusion strategy notably improves the model's robustness and accuracy, particularly for videos with intricate action patterns.
Main Authors: Yuening Li, Xiuhua Yang, Changkui Chen
Author Affiliations: Yuening Li and Xiuhua Yang, Wuhan Sports University, Wuhan, China; Changkui Chen, School of Physical Education and Training, Party School of Shandong Provincial Committee of the Communist Party of China (Shandong Administrative Institute), Jinan, Shandong, China
Format: Article
Language: English
Published: Frontiers Media S.A., 2025-01-01
Series: Frontiers in Neurorobotics
ISSN: 1662-5218
DOI: 10.3389/fnbot.2024.1502071
Subjects: brain-inspired; multimodal; action recognition; CLIP; clustering algorithms
Collection: DOAJ
Online Access: https://www.frontiersin.org/articles/10.3389/fnbot.2024.1502071/full
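The abstract describes a dual-stream design: a pretrained CLIP-style visual encoder, a fine-grained motion encoder, and a multimodal fusion layer feeding an action classifier. The record does not include the authors' implementation, so the following is a minimal, hypothetical PyTorch sketch of that kind of architecture; all module names, dimensions, the optical-flow input, and the concatenation-based fusion are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of a CLIP-plus-motion fusion classifier, assuming
# per-frame CLIP visual features and optical-flow motion input. Not the
# authors' FGM-CLIP implementation.
import torch
import torch.nn as nn


class MotionEncoder(nn.Module):
    """Encodes a stack of motion features (e.g., optical flow) for a clip."""

    def __init__(self, in_channels: int = 2, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # global pooling over (T, H, W)
            nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, flow: torch.Tensor) -> torch.Tensor:
        # flow: (batch, 2, T, H, W) -> (batch, dim)
        return self.net(flow)


class FusionClassifier(nn.Module):
    """Concatenation-based fusion of visual and motion embeddings."""

    def __init__(self, vis_dim: int = 512, mot_dim: int = 512,
                 num_classes: int = 100):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(vis_dim + mot_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_classes),
        )

    def forward(self, vis: torch.Tensor, mot: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([vis, mot], dim=-1))


# Usage: vis_feat stands in for a pretrained CLIP image encoder applied
# per frame and mean-pooled over time (assumed, not shown here).
vis_feat = torch.randn(8, 512)           # (batch, vis_dim) from CLIP
flow = torch.randn(8, 2, 16, 112, 112)   # (batch, 2, T, H, W) optical flow
logits = FusionClassifier()(vis_feat, MotionEncoder()(flow))
print(logits.shape)                      # torch.Size([8, 100])
```

Concatenation followed by an MLP is only one plausible reading of the "multimodal fusion layer" named in the abstract; cross-attention or gated fusion would fit the description equally well.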