Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach
Human Activity Recognition (HAR) has been employed in a number of applications, including sports analytics, healthcare monitoring, surveillance, and human-computer interaction. Despite a decade of research on HAR, existing models still struggle with occlusion, computational...
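The architecture the abstract describes (a CNN backbone extracting per-frame spatial features, followed by Transformer-style self-attention over time) can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's implementation: the single attention head, the random stand-in weight matrices, and the 64-dimensional projection are assumptions; the only detail taken from the described design is the 1280-dimensional pooled feature size of EfficientNet-B0.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(frame_feats, d_k=64, seed=0):
    """Single-head self-attention across the frames of one clip.

    frame_feats: (T, D) array, one row per video frame, e.g. the
    1280-dim global-pooled output of an EfficientNet-B0 backbone.
    Returns a (T, d_k) array of temporally contextualised features.
    """
    rng = np.random.default_rng(seed)
    T, D = frame_feats.shape
    # Random projections stand in for the learned Q/K/V weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((D, d_k)) / np.sqrt(D) for _ in range(3))
    Q, K, V = frame_feats @ Wq, frame_feats @ Wk, frame_feats @ Wv
    # (T, T) matrix of frame-to-frame attention weights.
    attn = softmax(Q @ K.T / np.sqrt(d_k))
    return attn @ V

# A 16-frame clip with 1280-dim spatial features per frame.
feats = np.random.default_rng(1).standard_normal((16, 1280))
out = temporal_self_attention(feats)
print(out.shape)  # (16, 64)
```

In the full model each output row would feed a classification head; the point of the attention step is that every frame's representation is re-weighted by its similarity to every other frame, which is how long-term temporal dependencies are captured without recurrence.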
Saved in:
| Main Authors: | Aftab Ul Nabi, Jinglun Shi, Kamlesh, Awais Khan Jumani, Jameel Ahmed Bhutto |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | BiTransAct; EfficientNet; human activity recognition (HAR); RegNet; SWIN transformer |
| Online Access: | https://ieeexplore.ieee.org/document/10767709/ |
| _version_ | 1850260211303972864 |
|---|---|
| author | Aftab Ul Nabi; Jinglun Shi; Kamlesh; Awais Khan Jumani; Jameel Ahmed Bhutto |
| author_facet | Aftab Ul Nabi; Jinglun Shi; Kamlesh; Awais Khan Jumani; Jameel Ahmed Bhutto |
| author_sort | Aftab Ul Nabi |
| collection | DOAJ |
| description | Human Activity Recognition (HAR) has been employed in a number of applications, including sports analytics, healthcare monitoring, surveillance, and human-computer interaction. Despite a decade of research on HAR, existing models still struggle with occlusion, computational efficiency, and capturing long-term temporal dependencies. To address these shortcomings, we present BiTransAct, a novel hybrid model that combines EfficientNet-B0 for spatial feature extraction with a Transformer encoder that captures the temporal relationships in video data. To evaluate the performance of the proposed model, we employed a video-based dataset, SPHAR-Dataset-1.0, which contains 7,759 videos spanning 14 diverse activities and 421,441 samples. Our experiments establish that BiTransAct consistently outperforms other deep-learning models such as SWIN, EfficientNet, and RegNet in both classification accuracy and precision. Its efficiency in handling large datasets without compromising performance makes it a strong candidate for real-time HAR tasks. Furthermore, features such as the self-attention mechanism and a dynamic learning rate make BiTransAct more robust and help avoid overfitting. The results demonstrate that BiTransAct provides a scalable, efficient solution for HAR applications, with particular relevance to real-world scenarios such as video surveillance and healthcare monitoring. |
| format | Article |
| id | doaj-art-c7c93c8d4f87460a8869f087d1e3831f |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Record ID: doaj-art-c7c93c8d4f87460a8869f087d1e3831f; indexed 2025-08-20T01:55:41Z; English; IEEE; IEEE Access; ISSN 2169-3536; published 2024-01-01; vol. 12, pp. 184517-184528; DOI 10.1109/ACCESS.2024.3506598; article no. 10767709. Title: Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach. Authors: Aftab Ul Nabi (https://orcid.org/0000-0002-5612-6265), Jinglun Shi (https://orcid.org/0000-0003-3933-9274), Kamlesh, Awais Khan Jumani, Jameel Ahmed Bhutto. Affiliations: School of Electronic and Information Engineering, South China University of Technology (SCUT), Guangzhou, Guangdong, China (first four authors); Department of Computer Science, Huanggang Normal University, Huanggang, China (J. A. Bhutto). Abstract: identical to the description field above. Online access: https://ieeexplore.ieee.org/document/10767709/. Topics: BiTransAct; EfficientNet; human activity recognition (HAR); RegNet; SWIN transformer. |
| spellingShingle | Aftab Ul Nabi Jinglun Shi Kamlesh Awais Khan Jumani Jameel Ahmed Bhutto Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach IEEE Access BiTransAct EfficientNet human activity recognition (HAR) RegNet SWIN transformer |
| title | Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach |
| title_full | Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach |
| title_fullStr | Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach |
| title_full_unstemmed | Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach |
| title_short | Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach |
| title_sort | hybrid transformer efficientnet model for robust human activity recognition the bitransact approach |
| topic | BiTransAct EfficientNet human activity recognition (HAR) RegNet SWIN transformer |
| url | https://ieeexplore.ieee.org/document/10767709/ |
| work_keys_str_mv | AT aftabulnabi hybridtransformerefficientnetmodelforrobusthumanactivityrecognitionthebitransactapproach AT jinglunshi hybridtransformerefficientnetmodelforrobusthumanactivityrecognitionthebitransactapproach AT kamlesh hybridtransformerefficientnetmodelforrobusthumanactivityrecognitionthebitransactapproach AT awaiskhanjumani hybridtransformerefficientnetmodelforrobusthumanactivityrecognitionthebitransactapproach AT jameelahmedbhutto hybridtransformerefficientnetmodelforrobusthumanactivityrecognitionthebitransactapproach |