Enhancing the Transformer Model with a Convolutional Feature Extractor Block and Vector-Based Relative Position Embedding for Human Activity Recognition
The Transformer model has received significant attention in Human Activity Recognition (HAR) because its self-attention mechanism captures long-range dependencies in time series. However, for Inertial Measurement Unit (IMU) sensor time-series signals, the Transformer model does not effectively exploit the a priori knowledge that these signals carry strong, complex temporal correlations. We therefore propose a multi-layer Convolutional Feature Extractor Block (CFEB), which enables the Transformer model to leverage both local and global time-series features for activity classification. In addition, the absolute position embedding (APE) used in existing Transformer models cannot accurately represent the distance relationship between time steps. To further exploit positional correlations in temporal signals, this paper introduces a Vector-based Relative Position Embedding (vRPE), which supplies the Transformer model with richer relative temporal position information within the sensor signals. Combining these innovations, we conduct extensive experiments on three HAR benchmark datasets: KU-HAR, UniMiB SHAR, and USC-HAD. The experimental results demonstrate that the proposed enhancement scheme substantially improves the performance of the Transformer model in HAR.
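The abstract outlines the two additions but gives no implementation details, so the following is a minimal PyTorch sketch, not the authors' code, of how such a pipeline could be wired together: a small stack of 1D convolutions acts as the CFEB that embeds a raw IMU window before the encoder, and Shaw-style learned relative-position vectors inside self-attention stand in as one plausible reading of the vector-based relative position embedding (vRPE). The layer counts, kernel sizes, `d_model`, window length, and the exact vRPE formulation are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CFEB(nn.Module):
    """Convolutional Feature Extractor Block (sketch): a small stack of 1D
    convolutions that turns a raw IMU window (batch, channels, time) into a
    sequence of local feature vectors for the Transformer encoder."""

    def __init__(self, in_channels: int, d_model: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, d_model, kernel_size=5, padding=2),
            nn.BatchNorm1d(d_model),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1),
            nn.BatchNorm1d(d_model),
            nn.ReLU(),
        )

    def forward(self, x):                      # x: (B, C, T)
        return self.net(x).transpose(1, 2)     # (B, T, d_model)


class RelativeSelfAttention(nn.Module):
    """Single-head self-attention with learned relative-position vectors
    (Shaw-style, assumed form of vRPE): one d_model vector per query-key
    offset, scored against the queries and added to the attention logits."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.max_len = max_len
        # One embedding per relative offset in [-(max_len-1), max_len-1].
        self.rel_emb = nn.Parameter(torch.randn(2 * max_len - 1, d_model) * 0.02)

    def forward(self, x):                      # x: (B, T, d_model)
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        pos = torch.arange(T, device=x.device)
        rel = pos[None, :] - pos[:, None] + self.max_len - 1   # (T, T) offset indices
        a = self.rel_emb[rel]                                  # (T, T, D) offset vectors
        scores = (q @ k.transpose(1, 2) +
                  torch.einsum("bqd,qkd->bqk", q, a)) / D ** 0.5
        attn = F.softmax(scores, dim=-1)
        return self.out(attn @ v)


class HARTransformer(nn.Module):
    """CFEB front end + one encoder layer + mean pooling + activity classifier."""

    def __init__(self, in_channels=6, d_model=64, max_len=256, num_classes=18):
        super().__init__()
        self.cfeb = CFEB(in_channels, d_model)
        self.attn = RelativeSelfAttention(d_model, max_len)
        self.ffn = nn.Sequential(nn.Linear(d_model, 2 * d_model), nn.ReLU(),
                                 nn.Linear(2 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):                      # x: (B, C, T) raw IMU window
        h = self.cfeb(x)                       # local features from convolutions
        h = self.norm1(h + self.attn(h))       # global context with relative positions
        h = self.norm2(h + self.ffn(h))
        return self.head(h.mean(dim=1))        # pool over time, then classify


if __name__ == "__main__":
    model = HARTransformer()
    logits = model(torch.randn(8, 6, 128))     # 8 windows, 6 IMU channels, 128 samples
    print(logits.shape)                        # torch.Size([8, 18])
```

The sketch only illustrates where the CFEB and the relative-position term sit relative to the standard Transformer blocks; a deeper encoder stack, multi-head attention, and the datasets' native window sizes would be straightforward substitutions.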
Main Authors: | Xin Guo, Young Kim, Xueli Ning, Se Dong Min |
---|---|
Author Affiliation: | Department of Software Convergence, Soonchunhyang University, Asan 31538, Republic of Korea (all four authors) |
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Sensors |
ISSN: | 1424-8220 |
DOI: | 10.3390/s25020301 |
Subjects: | human activity recognition; inertial measurement units (IMUs); transformer model; relative position embedding; convolutional neural networks (CNNs); time series signal |
Online Access: | https://www.mdpi.com/1424-8220/25/2/301 |
Collection: | DOAJ |