Detecting unsafe behavior in neural network imitation policies for caregiving robotics
DOI: https://doi.org/10.20535/SRIT.2308-8893.2024.4.07

Keywords: assistive robotics, reinforcement learning, diffusion models, imitation learning, anomaly detection

Abstract
This paper explores the application of imitation learning in caregiving robotics to address the increasing demand for automated assistance in caring for the elderly and disabled. Leveraging advancements in deep learning and control algorithms, the study focuses on training neural network policies from offline demonstrations. A key challenge addressed is the “Policy Stopping” problem, which is crucial for enhancing safety in imitation learning-based policies, particularly diffusion policies. The proposed solutions include ensemble predictors and an adaptation of a normalizing flow-based algorithm for early anomaly detection. Comparative evaluations against anomaly detection methods such as VAE and TranAD demonstrate superior performance on assistive robotics benchmarks. The paper concludes by discussing further research on integrating safety models into policy training, which is crucial for the reliable deployment of neural network policies in caregiving robotics.
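The ensemble-based approach described above can be illustrated with a minimal sketch. The idea is generic (it does not reproduce the paper's exact architecture or thresholds, and the function names here are hypothetical): several independently trained predictors score the same state-action input, and large disagreement among their predictions is treated as a sign that the policy has left the training distribution and should be stopped.

```python
import numpy as np

def ensemble_disagreement(predictions: np.ndarray) -> float:
    """Anomaly score from an ensemble's next-state predictions.

    predictions: shape (n_models, state_dim) -- one predicted next
    state per ensemble member for the current (state, action) pair.
    Returns the mean per-dimension standard deviation across members;
    high disagreement suggests an out-of-distribution situation.
    """
    return float(np.std(predictions, axis=0).mean())

def should_stop(predictions: np.ndarray, threshold: float) -> bool:
    """Stop the policy when disagreement exceeds a threshold
    calibrated on in-distribution demonstration rollouts."""
    return ensemble_disagreement(predictions) > threshold

# In-distribution case: members agree closely -> low score.
agree = np.array([[0.10, 0.20],
                  [0.11, 0.19],
                  [0.09, 0.21]])

# Out-of-distribution case: members diverge -> high score.
diverge = np.array([[ 0.1,  0.2],
                    [ 0.9, -0.5],
                    [-0.4,  1.1]])
```

In a control loop, `should_stop` would be checked at every step before the next action chunk is executed; the threshold is a tuning knob traded off between missed anomalies and premature stops.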
References
J. Broekens, M. Heerink, and H. Rosendal, “Assistive social robots in elderly care: A review,” Gerontechnology, vol. 8, no. 2, pp. 94–103, 2009. doi: https://doi.org/10.4017/gt.2009.08.02.002.00
D.M. Taylor, “Americans with disabilities: 2014,” US Census Bureau, pp. 1–32, 2018.
Dan Hendrycks et al., “The many faces of robustness: A critical analysis of out-of-distribution generalization,” Proceedings of the IEEE/CVF international conference on computer vision, 2021. doi: 10.1109/ICCV48922.2021.00823
Clemente Lauretti et al., “Learning by demonstration for planning activities of daily living in rehabilitation and assistive robotics,” IEEE Robotics and Automation Letters, vol. 2, issue 3, pp. 1375–1382, 2017. doi: 10.1109/LRA.2017.2669369
Matteo Saveriano, Fares J. Abu-Dakka, Aljaz Kramberger, and Luka Peternel, “Dynamic movement primitives in robotics: A tutorial survey,” The International Journal of Robotics Research, vol. 42, issue 13, pp. 1133–1184, 2023.
Z. Erickson, V. Gangaram, A. Kapusta, C.K. Liu, and C.C. Kemp, “Assistive gym: A physics simulation framework for assistive robotics,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020, pp. 10169–10176.
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint, 2017. doi: https://doi.org/10.48550/arXiv.1707.06347
Yash Jakhotiya and Iman Haque, “Improving Assistive Robotics with Deep Reinforcement Learning,” arXiv preprint, 2022. doi: https://doi.org/10.48550/arXiv.2209.02160
Maryam Zare, Parham M. Kebria, Abbas Khosravi, and Saeid Nahavandi, “A survey of imitation learning: Algorithms, recent developments, and challenges,” arXiv preprint, 2023. doi: https://doi.org/10.48550/arXiv.2309.02473
Cheng Chi et al., “Diffusion policy: Visuomotor policy learning via action diffusion,” arXiv preprint, 2023. doi: https://doi.org/10.48550/arXiv.2303.04137
J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” arXiv preprint, 2020. doi: https://doi.org/10.48550/arXiv.2006.11239
Vincent Mai, Kaustubh Mani, and Liam Paull, “Sample efficient deep reinforcement learning via uncertainty estimation,” arXiv preprint, 2022. doi: https://doi.org/10.48550/arXiv.2201.01666
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole, “Score-based generative modeling through stochastic differential equations,” arXiv preprint, 2020. doi: https://doi.org/10.48550/arXiv.2011.13456
Michael Janner, Justin Fu, Marvin Zhang, and Sergey Levine, “When to trust your model: Model-based policy optimization,” Advances in Neural Information Processing Systems 32, 2019. doi: 10.48550/arXiv.1906.08253
Shunan Guo, Zhuochen Jin, Qing Chen, David Gotz, Hongyuan Zha, and Nan Cao, “Visual anomaly detection in event sequence data,” 2019 IEEE International Conference on Big Data (Big Data). doi: 10.1109/BigData47090.2019.9005687
Diederik P. Kingma, Max Welling, “Auto-encoding variational bayes,” arXiv preprint, 2013. doi: https://doi.org/10.48550/arXiv.1312.6114
Shreshth Tuli, Giuliano Casale, and Nicholas R. Jennings, “TranAD: Deep transformer networks for anomaly detection in multivariate time series data,” arXiv preprint, 2022. doi: https://doi.org/10.48550/arXiv.2201.07284
Jan Thieß Brockmann, Marco Rudolph, Bodo Rosenhahn, and Bastian Wandt, “The voraus-AD Dataset for Anomaly Detection in Robot Applications,” IEEE Transactions on Robotics, 2023. doi: 10.1109/TRO.2023.3332224
A. Tytarenko, Assistive Gym Fork. 2024. Accessed on June 19, 2024. [Online]. Available: https://github.com/titardrew/assistive-gym
Tony Z. Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn, “Learning fine-grained bimanual manipulation with low-cost hardware,” arXiv preprint, 2023. doi: https://doi.org/10.48550/arXiv.2304.13705
Tianhe Yu et al., “MOPO: Model-based offline policy optimization,” Advances in Neural Information Processing Systems 33, pp. 14129–14142, 2020. Available: https://proceedings.nips.cc/paper/2020/file/a322852ce0df73e204b7e67cbbef0d0a-Paper.pdf
Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, and Thorsten Joachims, “MOReL: Model-based offline reinforcement learning,” Advances in Neural Information Processing Systems 33, pp. 21810–21823, 2020. Available: https://proceedings.neurips.cc/paper_files/paper/2020/file/f7efa4f864ae9b88d43527f4b14f750f-Paper.pdf
Laura Smith, Yunhao Cao, and Sergey Levine, “Grow your limits: Continuous Improvement with Real-World RL for Robotic Locomotion,” arXiv preprint, 2023. doi: https://doi.org/10.48550/arXiv.2310.17634
Weitang Liu, Xiaoyun Wang, John D. Owens, and Yixuan Li, “Energy-based out-of-distribution detection,” Advances in Neural Information Processing Systems 33, pp. 21464–21475, 2020. Available: https://proceedings.neurips.cc/paper/2020/file/f5496252609c43eb8a3d147ab9b9c006-Paper.pdf