RotJoint-Based Action Analyzer: A Robust Pose Comparison Pipeline

Bibliographic Details
Main Authors: Guo Gan, Guang Yang, Zhengrong Liu, Ruiyan Xia, Zhenqing Zhu, Yuke Qiu, Hong Zhou, Yangwei Ying
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/7/3737
Description
Summary: Human pose comparison involves measuring the similarity of body postures between individuals to understand movement patterns and interactions, yet existing methods are often insufficiently robust and flexible. In this paper, we propose a RotJoint-based pipeline for pose similarity estimation that is fine-grained, generalizable, and robust. First, we developed a comprehensive benchmark for action ambiguity that intuitively and effectively evaluates the robustness of pose comparison methods against challenges such as body-shape variations, viewpoint variations, and torsional poses. To address these challenges, we define a feature representation called RotJoints, which is strongly correlated with both the semantic and spatial characteristics of a pose: it emphasizes the description of limb rotations across multiple dimensions, rather than merely describing orientation. Finally, we propose TemporalRotNet, a Transformer-based network trained via supervised contrastive learning to capture spatial–temporal motion features. It achieves 93.7% accuracy on NTU RGB+D closed-set action classification and 88% on the open set, demonstrating its effectiveness for dynamic motion analysis. Extensive experiments demonstrate that our RotJoint-based pipeline produces results more closely aligned with human understanding across a wide range of common pose comparison tasks and achieves superior performance in situations prone to ambiguity.
ISSN: 2076-3417