STFormer: Spatio‐temporal former for hand–object interaction recognition from egocentric RGB video

Abstract: In recent years, video‐based hand–object interaction has received widespread attention from researchers. However, due to the complexity and occlusion of hand movements, hand–object interaction recognition based on RGB videos remains a highly challenging task. Here, an end‐to‐end spatio‐temp...

Bibliographic Details
Main Authors:	Jiao Liang, Xihan Wang, Jiayi Yang, Quanli Gao
Format:	Article
Language:	English
Published:	Wiley 2024-09-01
Series:	Electronics Letters
Subjects:	computer vision; image classification; pose estimation
Online Access:	https://doi.org/10.1049/ell2.70010

Similar Items

A Unified Framework for Recognizing Dynamic Hand Actions and Estimating Hand Pose from First-Person RGB Videos
by: Jiayi Yang, et al.
Published: (2025-06-01)

Benchmarking 2D Egocentric Hand Pose Datasets
by: Olga Taran, et al.
Published: (2025-01-01)

Visibility Aware In-Hand Object Pose Tracking in Videos With Transformers
by: Phan Xuan Tan, et al.
Published: (2025-01-01)

Learning spatio-temporal context for basketball action pose estimation with a multi-stream network
by: Zhihao Zhang, et al.
Published: (2025-08-01)

Empowering Efficient Spatio-Temporal Learning with a 3D CNN for Pose-Based Action Recognition
by: Ziliang Ren, et al.
Published: (2024-11-01)

Robust Human Pose Estimation Method for Body-to-Body Occlusion Using RGB-D Fusion Neural Network
by: Jae-hyuk Yoon, et al.
Published: (2025-08-01)

Improving Hand Pose Recognition Using Localization and Zoom Normalizations over MediaPipe Landmarks
by: Miguel Ángel Remiro, et al.
Published: (2023-11-01)

Behaviour recognition of housed sheep based on spatio-temporal information
by: Lina Zhang, et al.
Published: (2024-12-01)

SPPNet: Single-Person Human Parsing and Pose Estimation in RGB Videos
by: Aditi Verma, et al.
Published: (2025-03-01)

Accuracy Evaluation of 3D Pose Reconstruction Algorithms Through Stereo Camera Information Fusion for Physical Exercises with MediaPipe Pose
by: Sebastian Dill, et al.
Published: (2024-12-01)

Advances in Skeleton-Based Fall Detection in RGB Videos: From Handcrafted to Deep Learning Approaches
by: Van-Ha Hoang, et al.
Published: (2023-01-01)

Fall recognition using a three stream spatio temporal GCN model with adaptive feature aggregation
by: Jungpil Shin, et al.
Published: (2025-03-01)

Gradient boosting regression for faster Partitioned Iterated Function Systems‐based head pose estimation
by: Paola Barra, et al.
Published: (2022-07-01)

PyBodyTrack: A python library for multi-algorithm motion quantification and tracking in videos
by: Angel Ruiz-Zafra, et al.
Published: (2025-09-01)

An Improved Pose Estimation Method Based on Projection Vector With Noise Error Uncertainty
by: Jiashan Cui, et al.
Published: (2019-01-01)

Moving Toward Automated Construction Management: An Automated Construction Worker Efficiency Evaluation System
by: Chaojun Zhang, et al.
Published: (2025-07-01)

Action Recognition via Multi-View Perception Feature Tracking for Human–Robot Interaction
by: Chaitanya Bandi, et al.
Published: (2025-04-01)

Efficient Real-Time Sports Action Pose Estimation via EfficientPose and Temporal Graph Convolution
by: Yuanzhe Ma, et al.
Published: (2025-01-01)

Depth-Guided Monocular Object Pose Estimation for Warehouse Automation
by: Phan Xuan Tan, et al.
Published: (2025-01-01)

Towards Intelligent Assessment in Personalized Physiotherapy with Computer Vision
by: Victor García, et al.
Published: (2025-05-01)

STAA: Spatio-Temporal Attention Attribution for Real-Time Interpreting Transformer-Based AI Video Models
by: Zerui Wang, et al.
Published: (2025-01-01)

Prior-free 3D human pose estimation in a video using limb-vectors
by: Anam Memon, et al.
Published: (2024-12-01)

Detecting Activities of Daily Living in Egocentric Video to Contextualize Hand Use at Home in Outpatient Neurorehabilitation Settings
by: Adesh Kadambi, et al.
Published: (2025-01-01)

Assembly Error Measurement Method of Gear Shafting based on Stereo Vision
by: Menghao Chen, et al.
Published: (2021-05-01)

Toward a Recognition System for Mexican Sign Language: Arm Movement Detection
by: Gabriela Hilario-Acuapan, et al.
Published: (2025-06-01)

Leveraging modality‐specific and shared features for RGB‐T salient object detection
by: Shuo Wang, et al.
Published: (2024-12-01)

Automated Video Assistant Referee in Lead Climbing
by: Eliane Künzler, et al.
Published: (2025-01-01)

EPLC-Pose: A Lightweight Student Posture Recognition Network Under Panoramic Classroom
by: Yanhong Ji, et al.
Published: (2025-01-01)

Computer Vision-Based Drowsiness Detection Using Handcrafted Feature Extraction for Edge Computing Devices
by: Valerius Owen, et al.
Published: (2025-01-01)

VR interactive input system based on INS and binocular vision fusion
by: Hongxia Zhao, et al.
Published: (2024-12-01)

An Approach using Skeleton-based Representations and Neural Networks for Yoga Pose Recognition
by: Nguyen Hai Thanh, et al.
Published: (2025-01-01)

ICP Enhancement Algorithm for 6D Pose Tracking of Household Objects
by: Hyunho Hwang, et al.
Published: (2025-01-01)

Exploring senior high school student's abilities in mathematical problem posing
by: Muhtarom Muhtarom, et al.
Published: (2020-02-01)

Human action recognition network containing hands based on NPoseC3D59
by: Rui Li, et al.
Published: (2025-07-01)

Single-Handed Gesture Recognition with RGB Camera for Drone Motion Control
by: Guhnoo Yun, et al.
Published: (2024-11-01)

Contactless Infant Height Measurement for Enhanced Early Detection of Stunting Using Computer Vision Techniques
by: Risfendra, et al.
Published: (2025-01-01)

Aleatoric and Epistemic Uncertainty Reduction for Vision-Based Landing System for Fixed-Wing Aircraft: VLS-FWA
by: Dheeraj Bharti, et al.
Published: (2025-01-01)

SMS3D: 3D Synthetic Mushroom Scenes Dataset for 3D Object Detection and Pose Estimation
by: Abdollah Zakeri, et al.
Published: (2025-04-01)

Interactive Content Retrieval in Egocentric Videos Based on Vague Semantic Queries
by: Linda Ablaoui, et al.
Published: (2025-06-01)

Forecasting high-dimensional spatio-temporal systems from sparse measurements
by: Jialin Song, et al.
Published: (2024-01-01)