Command-driven semantic robotic grasping towards user-specified tasks

Bibliographic Details
Main Authors: Qing Lyu, Qingwen Ye, Xiaoyan Chen, Qiuju Zhang
Format: Article
Language: English
Published: Springer 2025-06-01
Series: Complex & Intelligent Systems
Online Access: https://doi.org/10.1007/s40747-025-01981-y
Description
Summary: Abstract Despite significant advancements in robotic grasping driven by visual perception, deploying robots in unstructured environments to perform user-specified tasks still poses considerable challenges. Natural language offers an intuitive means of specifying task objectives and reducing ambiguity. In this study, we introduce natural language into a vision-guided grasping system by employing visual attributes as a mediating bridge between language instructions and visual observations. We propose a command-driven semantic grasp architecture that integrates pixel attention within the visual attribute recognition module and includes a modified grasp pose estimation network to enhance prediction accuracy. Our experimental results show that our approach improves the performance of the submodules, including visual attribute recognition and grasp pose estimation, compared with baseline models. Furthermore, we demonstrate that our proposed model exhibits notable effectiveness in real-world user-specified grasping experiments.
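
Note: the abstract mentions integrating pixel attention into the visual attribute recognition module. The snippet below is a minimal, hypothetical sketch of that general idea written in PyTorch, not the authors' implementation; the module name, channel sizes, and number of attribute classes are illustrative assumptions.

# Hypothetical sketch (not the paper's code): per-pixel spatial attention
# applied to backbone features before attribute classification.
import torch
import torch.nn as nn

class PixelAttentionAttributeHead(nn.Module):
    """Weights each spatial location of a feature map, then pools and
    classifies into attribute categories (all sizes are illustrative)."""

    def __init__(self, in_channels: int = 256, num_attributes: int = 12):
        super().__init__()
        # 1x1 convolutions produce one attention weight per pixel
        self.attn = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        self.classifier = nn.Linear(in_channels, num_attributes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (B, C, H, W) from a visual backbone
        weights = self.attn(features)       # (B, 1, H, W), values in [0, 1]
        attended = features * weights       # emphasize informative pixels
        pooled = attended.mean(dim=(2, 3))  # global average pool -> (B, C)
        return self.classifier(pooled)      # attribute logits

if __name__ == "__main__":
    head = PixelAttentionAttributeHead()
    dummy = torch.randn(2, 256, 28, 28)     # stand-in backbone features
    print(head(dummy).shape)                # torch.Size([2, 12])

The attribute logits from such a head could then be matched against attributes parsed from a language command; how the published architecture actually performs this grounding is described in the full article linked above.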
ISSN: 2199-4536; 2198-6053