Detecting unsafe behavior in neural network imitation policies for caregiving robotics
DOI: https://doi.org/10.20535/SRIT.2308-8893.2024.4.07

Keywords: assistive robotics, reinforcement learning, diffusion models, imitation learning, anomaly detection

Abstract
This paper explores the application of imitation learning in caregiving robotics to address the increasing demand for automated assistance in caring for the elderly and disabled. Leveraging advancements in deep learning and control algorithms, the study focuses on training neural network policies from offline demonstrations. A key challenge addressed is the “Policy Stopping” problem, which is crucial for enhancing safety in imitation learning-based policies, particularly diffusion policies. The proposed solutions include ensemble predictors and an adaptation of a normalizing flow-based algorithm for early anomaly detection. Comparative evaluations against anomaly detection methods such as VAE and TranAD demonstrate superior performance on assistive robotics benchmarks. The paper concludes by discussing further research on integrating safety models into policy training, which is crucial for the reliable deployment of neural network policies in caregiving robotics.
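The ensemble-based approach described above can be illustrated with a minimal sketch. The idea is generic (it does not reproduce the paper's exact architecture or thresholds, and the function names here are hypothetical): several independently trained predictors score the same state-action input, and large disagreement among their predictions is treated as a sign that the policy has left the training distribution and should be stopped.

```python
import numpy as np

def ensemble_disagreement(predictions: np.ndarray) -> float:
    """Anomaly score from an ensemble's next-state predictions.

    predictions: shape (n_models, state_dim) -- one predicted next
    state per ensemble member for the current (state, action) pair.
    Returns the mean per-dimension standard deviation across members;
    high disagreement suggests an out-of-distribution situation.
    """
    return float(np.std(predictions, axis=0).mean())

def should_stop(predictions: np.ndarray, threshold: float) -> bool:
    """Stop the policy when disagreement exceeds a threshold
    calibrated on in-distribution demonstration rollouts."""
    return ensemble_disagreement(predictions) > threshold

# In-distribution case: members agree closely -> low score.
agree = np.array([[0.10, 0.20],
                  [0.11, 0.19],
                  [0.09, 0.21]])

# Out-of-distribution case: members diverge -> high score.
diverge = np.array([[ 0.1,  0.2],
                    [ 0.9, -0.5],
                    [-0.4,  1.1]])
```

In a control loop, `should_stop` would be checked at every step before the next action chunk is executed; the threshold is a tuning knob traded off between missed anomalies and premature stops.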
References
J. Broekens, M. Heerink, and H. Rosendal, “Assistive social robots in elderly care: A review,” Gerontechnology, vol. 8, no. 2, pp. 94–103, 2009. doi: https://doi.org/10.4017/gt.2009.08.02.002.00
D.M. Taylor, “Americans with disabilities: 2014,” US Census Bureau, pp. 1–32, 2018.
Dan Hendrycks et al., “The many faces of robustness: A critical analysis of out-of-distribution generalization,” Proceedings of the IEEE/CVF international conference on computer vision, 2021. doi: 10.1109/ICCV48922.2021.00823
Clemente Lauretti et al., “Learning by demonstration for planning activities of daily living in rehabilitation and assistive robotics,” IEEE Robotics and Automation Letters, vol. 2, issue 3, pp. 1375–1382, 2017. doi: 10.1109/LRA.2017.2669369
Matteo Saveriano, Fares J. Abu-Dakka, Aljaz Kramberger, and Luka Peternel, “Dynamic movement primitives in robotics: A tutorial survey,” The International Journal of Robotics Research, vol. 42, issue 13, pp. 1133–1184, 2023.
Z. Erickson, V. Gangaram, A. Kapusta, C.K. Liu, and C.C. Kemp, “Assistive gym: A physics simulation framework for assistive robotics,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020, pp. 10169–10176.
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint, 2017. doi: https://doi.org/10.48550/arXiv.1707.06347
Yash Jakhotiya and Iman Haque, “Improving Assistive Robotics with Deep Reinforcement Learning,” arXiv preprint, 2022. doi: https://doi.org/10.48550/arXiv.2209.02160
Maryam Zare, Parham M. Kebria, Abbas Khosravi, and Saeid Nahavandi, “A survey of imitation learning: Algorithms, recent developments, and challenges,” arXiv preprint, 2023. doi: https://doi.org/10.48550/arXiv.2309.02473
Cheng Chi et al., “Diffusion policy: Visuomotor policy learning via action diffusion,” arXiv preprint, 2023. doi: https://doi.org/10.48550/arXiv.2303.04137
J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” arXiv preprint, 2020. doi: https://doi.org/10.48550/arXiv.2006.11239
Vincent Mai, Kaustubh Mani, and Liam Paull, “Sample efficient deep reinforcement learning via uncertainty estimation,” arXiv preprint, 2022. doi: https://doi.org/10.48550/arXiv.2201.01666
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole, “Score-based generative modeling through stochastic differential equations,” arXiv preprint, 2020. doi: https://doi.org/10.48550/arXiv.2011.13456
Michael Janner, Justin Fu, Marvin Zhang, and Sergey Levine, “When to trust your model: Model-based policy optimization,” Advances in Neural Information Processing Systems 32, 2019. doi: 10.48550/arXiv.1906.08253
Shunan Guo, Zhuochen Jin, Qing Chen, David Gotz, Hongyuan Zha, and Nan Cao, “Visual anomaly detection in event sequence data,” 2019 IEEE International Conference on Big Data (Big Data). doi: 10.1109/BigData47090.2019.9005687
Diederik P. Kingma, Max Welling, “Auto-encoding variational bayes,” arXiv preprint, 2013. doi: https://doi.org/10.48550/arXiv.1312.6114
Shreshth Tuli, Giuliano Casale, and Nicholas R. Jennings, “TranAD: Deep transformer networks for anomaly detection in multivariate time series data,” arXiv preprint, 2022. doi: https://doi.org/10.48550/arXiv.2201.07284
Jan Thieß Brockmann, Marco Rudolph, Bodo Rosenhahn, and Bastian Wandt, “The voraus-AD Dataset for Anomaly Detection in Robot Applications,” IEEE Transactions on Robotics, 2023. doi: 10.1109/TRO.2023.3332224
A. Tytarenko, Assistive Gym Fork. 2024. Accessed on June 19, 2024. [Online]. Available: https://github.com/titardrew/assistive-gym
Tony Z. Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn, “Learning fine-grained bimanual manipulation with low-cost hardware,” arXiv preprint, 2023. doi: https://doi.org/10.48550/arXiv.2304.13705
Tianhe Yu et al., “MOPO: Model-based offline policy optimization,” Advances in Neural Information Processing Systems 33, pp. 14129–14142, 2020. Available: https://proceedings.nips.cc/paper/2020/file/a322852ce0df73e204b7e67cbbef0d0a-Paper.pdf
Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, and Thorsten Joachims, “MOReL: Model-based offline reinforcement learning,” Advances in Neural Information Processing Systems 33, pp. 21810–21823, 2020. Available: https://proceedings.neurips.cc/paper_files/paper/2020/file/f7efa4f864ae9b88d43527f4b14f750f-Paper.pdf
Laura Smith, Yunhao Cao, and Sergey Levine, “Grow your limits: Continuous Improvement with Real-World RL for Robotic Locomotion,” arXiv preprint, 2023. doi: https://doi.org/10.48550/arXiv.2310.17634
Weitang Liu, Xiaoyun Wang, John D. Owens, and Yixuan Li, “Energy-based out-of-distribution detection,” Advances in Neural Information Processing Systems 33, pp. 21464–21475, 2020. Available: https://proceedings.neurips.cc/paper/2020/file/f5496252609c43eb8a3d147ab9b9c006-Paper.pdf