Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach

Bibliographic Details
Main Authors: Aftab Ul Nabi, Jinglun Shi, Kamlesh, Awais Khan Jumani, Jameel Ahmed Bhutto
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10767709/
_version_ 1850260211303972864
author Aftab Ul Nabi
Jinglun Shi
Kamlesh
Awais Khan Jumani
Jameel Ahmed Bhutto
author_facet Aftab Ul Nabi
Jinglun Shi
Kamlesh
Awais Khan Jumani
Jameel Ahmed Bhutto
author_sort Aftab Ul Nabi
collection DOAJ
description Human Activity Recognition (HAR) is employed in a wide range of applications, including sports analytics, healthcare monitoring, surveillance, and human-computer interaction. Despite a decade of research on HAR, existing models still struggle with occlusion, computational efficiency, and capturing long-term temporal dependencies. To address these shortcomings, we present BiTransAct, a novel hybrid model that combines EfficientNet-B0 for spatial feature extraction with a Transformer encoder that captures the temporal relationships in video data. To evaluate the performance of the proposed model, we employed a video-based dataset, SPHAR-Dataset-1.0, which contains 7,759 videos spanning 14 diverse activities and 421,441 samples. Our experiments establish that BiTransAct consistently outperforms other deep-learning models such as SWIN, EfficientNet, and RegNet in both classification accuracy and precision. Its efficiency in handling large datasets without compromising performance makes it a strong candidate for real-time HAR tasks. Furthermore, the self-attention mechanism and dynamic learning rate make BiTransAct more robust and help it avoid overfitting. The results demonstrate that BiTransAct provides a scalable, efficient solution for HAR applications, with particular relevance to real-world scenarios such as video surveillance and healthcare monitoring.
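The abstract describes a two-stage architecture: a per-frame CNN backbone (EfficientNet-B0 in the paper) extracts spatial features, and a Transformer encoder then applies self-attention across frames to model temporal dependencies. The sketch below illustrates that data flow only; it is not the authors' implementation, and the tiny stand-in backbone, layer sizes, and mean-pooled classification head are placeholder assumptions.

```python
import torch
import torch.nn as nn

class HybridCnnTransformer(nn.Module):
    """Illustrative CNN+Transformer HAR pipeline (not the BiTransAct code)."""

    def __init__(self, feat_dim=128, num_classes=14):
        super().__init__()
        # Tiny stand-in backbone; the paper uses EfficientNet-B0 here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        # Transformer encoder models temporal relations across frames.
        enc_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, clips):  # clips: (batch, frames, 3, H, W)
        b, t = clips.shape[:2]
        # Run the backbone on every frame, then restore the time axis.
        feats = self.backbone(clips.flatten(0, 1)).view(b, t, -1)
        feats = self.temporal(feats)          # self-attention over time
        return self.head(feats.mean(dim=1))   # pool frames, classify

model = HybridCnnTransformer()
logits = model(torch.randn(2, 16, 3, 64, 64))  # 2 clips of 16 frames
print(logits.shape)  # torch.Size([2, 14])
```

The 14 output classes mirror the 14 activities in SPHAR-Dataset-1.0; everything else (frame count, image size, embedding width) is arbitrary for the example.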
format Article
id doaj-art-c7c93c8d4f87460a8869f087d1e3831f
institution OA Journals
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-c7c93c8d4f87460a8869f087d1e3831f2025-08-20T01:55:41ZengIEEEIEEE Access2169-35362024-01-011218451718452810.1109/ACCESS.2024.350659810767709Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct ApproachAftab Ul Nabi0https://orcid.org/0000-0002-5612-6265Jinglun Shi1https://orcid.org/0000-0003-3933-9274 Kamlesh2Awais Khan Jumani3Jameel Ahmed Bhutto4School of Electronic and Information Engineering, South China University of Technology (SCUT), Guangzhou, Guangdong, ChinaSchool of Electronic and Information Engineering, South China University of Technology (SCUT), Guangzhou, Guangdong, ChinaSchool of Electronic and Information Engineering, South China University of Technology (SCUT), Guangzhou, Guangdong, ChinaSchool of Electronic and Information Engineering, South China University of Technology (SCUT), Guangzhou, Guangdong, ChinaDepartment of Computer Science, Huanggang Normal University, Huanggang, ChinaHuman Activity Recognition (HAR) is employed in a wide range of applications, including sports analytics, healthcare monitoring, surveillance, and human-computer interaction. Despite a decade of research on HAR, existing models still struggle with occlusion, computational efficiency, and capturing long-term temporal dependencies. To address these shortcomings, we present BiTransAct, a novel hybrid model that combines EfficientNet-B0 for spatial feature extraction with a Transformer encoder that captures the temporal relationships in video data. To evaluate the performance of the proposed model, we employed a video-based dataset, SPHAR-Dataset-1.0, which contains 7,759 videos spanning 14 diverse activities and 421,441 samples. Our experiments establish that BiTransAct consistently outperforms other deep-learning models such as SWIN, EfficientNet, and RegNet in both classification accuracy and precision. Its efficiency in handling large datasets without compromising performance makes it a strong candidate for real-time HAR tasks. Furthermore, the self-attention mechanism and dynamic learning rate make BiTransAct more robust and help it avoid overfitting. The results demonstrate that BiTransAct provides a scalable, efficient solution for HAR applications, with particular relevance to real-world scenarios such as video surveillance and healthcare monitoring.https://ieeexplore.ieee.org/document/10767709/BiTransActEfficientNethuman activity recognition (HAR)RegNetSWIN transformer
spellingShingle Aftab Ul Nabi
Jinglun Shi
Kamlesh
Awais Khan Jumani
Jameel Ahmed Bhutto
Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach
IEEE Access
BiTransAct
EfficientNet
human activity recognition (HAR)
RegNet
SWIN transformer
title Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach
title_full Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach
title_fullStr Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach
title_full_unstemmed Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach
title_short Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach
title_sort hybrid transformer efficientnet model for robust human activity recognition the bitransact approach
topic BiTransAct
EfficientNet
human activity recognition (HAR)
RegNet
SWIN transformer
url https://ieeexplore.ieee.org/document/10767709/
work_keys_str_mv AT aftabulnabi hybridtransformerefficientnetmodelforrobusthumanactivityrecognitionthebitransactapproach
AT jinglunshi hybridtransformerefficientnetmodelforrobusthumanactivityrecognitionthebitransactapproach
AT kamlesh hybridtransformerefficientnetmodelforrobusthumanactivityrecognitionthebitransactapproach
AT awaiskhanjumani hybridtransformerefficientnetmodelforrobusthumanactivityrecognitionthebitransactapproach
AT jameelahmedbhutto hybridtransformerefficientnetmodelforrobusthumanactivityrecognitionthebitransactapproach