Safe Switching Model-Free Value Iteration for General Nonlinear Systems
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10988815/ |
| Summary: | This paper addresses the well-known challenge of guaranteeing safety when controllers tuned with Value Iteration (VI) techniques are evaluated on real systems. We propose Safe Switching Model-Free Value Iteration (SSMFVI), which guarantees both stability and safety of a system in closed loop with a controller optimized by Value Iteration in a model-free manner. A state-dependent switching rule alternates between the VI-tuned controller and an initial, known stabilizing admissible controller. Using the Q-function of the initial controller and the one continuously updated during learning, the stability of the switching mechanism in closed loop with the system is derived within the Multiple Lyapunov Functions (MLF) framework. To guarantee safety at runtime, the system's maximum one-step transition is estimated. The switching signal then selects the MFVI controller only in the region of the state space that is both covered by the transitions collected during the exploration phase and farther from the unsafe set than the computed maximum one-step transition. This subset of the state space is determined using one-class Support Vector Machine (SVM) classification. The method includes mechanisms for early instability detection and for chattering reduction near switching surfaces. It is validated on a linear first-order system, to visualize the results, and on a real Electric Braking System circuit, demonstrating the effectiveness of the proposed control method. |
| ISSN: | 2169-3536 |
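
The safety part of the switching rule described in the Summary can be illustrated with a minimal sketch for a scalar state. This is not the authors' implementation: the function names, the interval-shaped safe set, the `radius` parameter, and the sample data are all assumptions made for illustration, and a simple nearest-neighbour coverage check stands in for the SVM classifier named in the abstract. The stability side (the Q-function comparison under the MLF framework) is not modelled here.

```python
from typing import Sequence, Tuple

# Hypothetical exploration data: (state, next_state) pairs for a scalar system.
TRANSITIONS = [(0.0, 0.1), (0.1, 0.3), (0.3, 0.25), (-0.2, -0.05), (0.85, 0.9)]
VISITED = [x for x, _ in TRANSITIONS]


def max_one_step_transition(transitions: Sequence[Tuple[float, float]]) -> float:
    """Largest observed |x_next - x| over the collected transitions."""
    return max(abs(x_next - x) for x, x_next in transitions)


def is_covered(x: float, visited: Sequence[float], radius: float) -> bool:
    """Stand-in for the SVM coverage test (an assumption, not the paper's
    classifier): x is covered if it lies within `radius` of a visited state."""
    return any(abs(x - v) <= radius for v in visited)


def distance_to_unsafe(x: float, safe_lo: float, safe_hi: float) -> float:
    """Distance from x to the unsafe set, assumed here to be everything
    outside the interval [safe_lo, safe_hi]."""
    return max(0.0, min(x - safe_lo, safe_hi - x))


def use_learned_controller(
    x: float,
    visited: Sequence[float],
    safe_lo: float,
    safe_hi: float,
    max_step: float,
    radius: float,
) -> bool:
    """True if the learned controller may act at x; otherwise the rule
    falls back to the initial stabilizing controller."""
    covered = is_covered(x, visited, radius)
    far_enough = distance_to_unsafe(x, safe_lo, safe_hi) > max_step
    return covered and far_enough


MAX_STEP = max_one_step_transition(TRANSITIONS)

for state in (0.05, 0.6, 0.85):
    learned = use_learned_controller(state, VISITED, -1.0, 1.0, MAX_STEP, radius=0.1)
    print(state, "learned" if learned else "initial")
```

In this sketch a state is handed to the learned controller only when both conditions hold: it lies near explored data, and its margin to the unsafe set exceeds the largest one-step move seen during exploration. The margin check is what keeps a single transition from carrying the state into the unsafe set, under the assumption that the estimated maximum step is not exceeded at runtime.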