Safety-critical Policy Iteration Algorithm for Control under Model Uncertainty
DOI: https://doi.org/10.30564/aia.v4i1.4361
Abstract: Safety is a primary design objective for safety-critical systems, and many policy iteration algorithms have been introduced to find safe optimal controllers for them. Because an accurate model is rarely available for practical systems, this paper presents a new online training method that runs an iterative reinforcement-learning-based algorithm on real data instead of identifying the system dynamics. The paper also examines the impact of model uncertainty on the dynamic constraints imposed by control Lyapunov functions (CLFs) and control barrier functions (CBFs). A sum-of-squares (SOS) program is used to iteratively find an optimal safe control solution. Simulation results on a quarter-car model show the efficiency of the proposed method in terms of optimality and robustness.
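The abstract's point that model uncertainty affects CBF constraints can be illustrated with a toy example. The sketch below is not the paper's SOS-based policy iteration; it is a minimal scalar safety filter under stated assumptions: a single-integrator model x' = u + d with an unknown bounded disturbance d, a safe set h(x) = x_max - x >= 0, and a linear class-K term alpha * h. The names `cbf_filter` and `simulate`, and all numeric values, are hypothetical and chosen only for illustration.

```python
def cbf_filter(u_nom, lf_h, lg_h, h, alpha=1.0, margin=0.0):
    """Minimally modify a scalar input u_nom so that
    lf_h + lg_h * u + alpha * h - margin >= 0 holds.

    `margin` is an assumed bound on the model-error term in dh/dt;
    margin = 0 recovers the nominal (uncertainty-free) CBF condition.
    """
    residual = lf_h + lg_h * u_nom + alpha * h - margin
    if residual >= 0.0:
        return u_nom  # nominal input already satisfies the constraint
    if lg_h == 0.0:
        raise ValueError("constraint cannot be enforced: Lg_h is zero")
    # Closest input (in the scalar case) that makes the constraint active.
    return u_nom - residual / lg_h


def simulate(margin, d=0.3, x_max=1.0, u_nom=2.0, dt=0.01, steps=1000):
    """Forward-Euler rollout of x' = u + d; returns the smallest h seen.

    The filter only knows the nominal model x' = u, so for
    h = x_max - x it uses Lf_h = 0 and Lg_h = -1.
    """
    x = 0.0
    min_h = x_max - x
    for _ in range(steps):
        h = x_max - x
        u = cbf_filter(u_nom, lf_h=0.0, lg_h=-1.0, h=h, margin=margin)
        x += dt * (u + d)  # true dynamics include the disturbance d
        min_h = min(min_h, x_max - x)
    return min_h


if __name__ == "__main__":
    print("nominal CBF, min h:", simulate(margin=0.0))
    print("robust CBF,  min h:", simulate(margin=0.3))
```

In this toy setting the nominal filter lets the state drift past x_max because the disturbance is not accounted for, while setting the margin equal to the disturbance bound keeps h nonnegative along the rollout. The price is a more conservative input near the boundary.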