Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach

Programming by demonstration is one of the most efficient methods of knowledge transfer for developing advanced learning systems, provided that teachers deliver abundant and correct demonstrations and learners perceive them correctly. In almost all real-world problems, however, demonstrations are sparse and inaccurate, and complementary information is needed to compensate for these shortcomings. In this paper, we target programming by a combination of nonoptimal and sparse demonstrations and a limited number of binary evaluative feedback signals, where the learner uses its own evaluated experiences as new demonstrations in an extended inverse reinforcement learning method. This gives the learner broader generalization and lower regret, as well as robustness in the face of sparsity and nonoptimality in demonstrations and feedback. Our method alleviates the unrealistic burden on teachers to provide optimal and abundant demonstrations. Employing evaluative feedback, which is easy for teachers to deliver, provides the opportunity to correct the learner's behavior in an interactive social setting without requiring teachers to know and use their own accurate reward function. Here, we extend inverse reinforcement learning (IRL) to estimate the reward function from a mixture of nonoptimal and sparse demonstrations and evaluative feedback. Our method, called IRL from demonstration and human's critique (IRLDC), has two phases. First, the teacher provides some demonstrations with which the learner initializes its policy. Next, the learner interacts with the environment and the teacher provides binary evaluative feedback. Accounting for possible inconsistencies and mistakes in issuing and receiving feedback, the learner revises the estimated reward function by solving a single optimization problem. IRLDC is devised to handle errors and sparsity in demonstrations and feedback and can generalize across different combinations of these two sources of expertise. We apply the method to three domains: a simulated navigation task, a simulated car-driving problem with human interactions, and a navigation experiment with a mobile robot. The results indicate that IRLDC significantly enhances learning in settings where standard IRL methods fail and learning-from-feedback (LfF) methods incur high regret. IRLDC also performs well at different levels of sparsity and optimality of the teacher's demonstrations and feedback, where other state-of-the-art methods fail.
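To make the two-phase flow described above concrete, here is a minimal Python sketch: a linear reward model over state features, teacher demonstrations used to initialize the reward weights, and then binary-labeled learner trajectories folded into the same update as extra (positive or negative) demonstrations. The feature matrix `PHI`, the update rule in `revise_weights`, and all parameter values are illustrative assumptions, not the paper's actual optimization problem, which is specified in the article itself.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_FEATURES = 12, 4
PHI = rng.normal(size=(N_STATES, N_FEATURES))  # assumed per-state features

def feature_counts(trajectory):
    """Sum the feature vectors of the states visited along a trajectory."""
    return PHI[trajectory].sum(axis=0)

def revise_weights(w, demos, feedbacks, lr=0.05, lam=0.5):
    """One ascent step on a combined objective: move the reward weights
    toward the feature counts of teacher demonstrations, and toward (away
    from) the learner's own trajectories that received positive (negative)
    binary feedback. This treats evaluated experiences as new demonstrations,
    in the spirit of the paper's single optimization problem; the exact
    objective here is a hypothetical stand-in."""
    grad = np.zeros(N_FEATURES)
    for traj in demos:
        grad += feature_counts(traj)
    for traj, label in feedbacks:  # label is +1 (good) or -1 (bad)
        grad += lam * label * feature_counts(traj)
    grad /= max(len(demos) + len(feedbacks), 1)
    return w + lr * grad

# Phase 1: sparse, possibly nonoptimal demonstrations initialize the reward.
demos = [[0, 1, 5, 9], [0, 4, 5, 9]]
w = np.zeros(N_FEATURES)
for _ in range(100):
    w = revise_weights(w, demos, feedbacks=[])

# Phase 2: the learner acts, the teacher labels each trajectory with binary
# feedback, and the reward estimate is revised from both sources together.
learner_trajectories = [([0, 1, 2, 6, 9], +1), ([0, 3, 7, 11], -1)]
for _ in range(100):
    w = revise_weights(w, demos, learner_trajectories)

print("estimated reward weights:", w)
```

In this sketch the weighting factor `lam` is what lets mistaken or inconsistent feedback be discounted rather than trusted outright, echoing the robustness goal stated in the abstract.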

Bibliographic Details
Main Authors: Nafee Mourad, Ali Ezzeddine, Babak Nadjar Araabi, Majid Nili Ahmadabadi
Format: Article
Language: English
Published: Wiley, 2020-01-01
Series: Journal of Robotics
ISSN: 1687-9600, 1687-9619
Online Access: http://dx.doi.org/10.1155/2020/3849309

Author Affiliations:
Nafee Mourad and Majid Nili Ahmadabadi: Cognitive Systems Laboratory, School of ECE, College of Engineering, University of Tehran, Tehran, Iran
Ali Ezzeddine and Babak Nadjar Araabi: Machine Learning and Computational Modeling Laboratory, School of ECE, College of Engineering, University of Tehran, Tehran, Iran