Multi-step prediction in linearized latent state spaces for representation learning
Keywords: representation learning, learning controllable embedding, reinforcement learning, latent state space
In this paper, we derive a novel method that generalizes learning controllable embedding (LCE) approaches such as E2C. The method develops the idea of learning a locally linear latent state space by adding a multi-step prediction objective, thus allowing more explicit control over the curvature of the learned dynamics. We show that the method outperforms E2C without the drastic model changes that come with other works such as PCC and PC3. We discuss the relation between E2C and the presented method and derive the update equations. We provide empirical evidence suggesting that, by considering the multi-step prediction, our method, ms-E2C, learns substantially better latent state spaces in terms of curvature and next-state predictability. Finally, we discuss certain stability challenges we encounter with multi-step predictions and how to mitigate them.
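The multi-step prediction idea can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration: the `dynamics` callable stands in for a learned locally linear transition model, and the squared-error objective and all names are placeholders, not the paper's actual implementation.

```python
import numpy as np

def multi_step_rollout(z0, actions, dynamics):
    """Roll a latent state forward K steps under locally linear dynamics.

    dynamics(z) returns (A, B, o): the local transition matrices and
    offset at z, so the next latent state is A @ z + B @ u + o.
    """
    z = z0
    trajectory = [z0]
    for u in actions:
        A, B, o = dynamics(z)
        z = A @ z + B @ u + o
        trajectory.append(z)
    return trajectory

def multi_step_loss(z0, actions, target_latents, dynamics):
    """Sum of squared prediction errors over a K-step rollout.

    target_latents[k] is the encoding of the observation k+1 steps
    ahead; penalizing all K steps, rather than only the first, is what
    constrains the curvature of the learned latent dynamics.
    """
    preds = multi_step_rollout(z0, actions, dynamics)[1:]
    return sum(float(np.sum((p - t) ** 2))
               for p, t in zip(preds, target_latents))
```

With K = 1 this reduces to the usual single-step prediction loss of E2C-style models; larger K forces the composed local linearizations to stay accurate over longer horizons.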
G. Dulac-Arnold et al., “Deep reinforcement learning in large discrete action spaces,” arXiv preprint arXiv:1512.07679, 2015. doi: 10.48550/arXiv.1512.07679.
S. Levine, A. Kumar, G. Tucker, and J. Fu, “Offline reinforcement learning: Tutorial, review, and perspectives on open problems,” arXiv preprint arXiv:2005.01643, 2020. doi: 10.48550/arXiv.2005.01643.
E.A. Feinberg, P.O. Kasyanov, and M.Z. Zgurovsky, “Partially observable total-cost Markov decision processes with weakly continuous transition probabilities,” Mathematics of Operations Research, vol. 41, no. 2, pp. 656–681, 2016. doi: 10.1287/moor.2015.0746.
E.A. Feinberg, P.O. Kasyanov, and M.Z. Zgurovsky, “Convergence of probability measures and Markov decision models with incomplete information,” Proceedings of the Steklov Institute of Mathematics, vol. 287, no. 1, pp. 96–117, 2014. doi: 10.1134/S0081543814080069.
O. Vinyals et al., “Grandmaster level in StarCraft II using multi-agent reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, 2019. doi: 10.1038/s41586-019-1724-z.
S. Reed et al., “A generalist agent,” arXiv preprint arXiv:2205.06175, 2022.
D. Ha and J. Schmidhuber, “World models,” arXiv preprint arXiv:1803.10122, 2018. doi: 10.48550/arXiv.1803.10122.
T.M. Moerland, J. Broekens, and C.M. Jonker, “Model-based reinforcement learning: A survey,” arXiv preprint arXiv:2006.16712, 2020. doi: 10.48550/arXiv.2006.16712.
D. Hafner et al., “Learning latent dynamics for planning from pixels,” in International Conference on Machine Learning, PMLR, 2019, pp. 2555–2565. doi: 10.48550/arXiv.1811.04551.
R.F. Prudencio, M.R. Maximo, and E.L. Colombini, “A survey on offline reinforcement learning: Taxonomy, review, and open problems,” arXiv preprint arXiv:2203.01387, 2022. doi: 10.48550/arXiv.2203.01387.
W. Li and E. Todorov, “Iterative linear quadratic regulator design for nonlinear biological movement systems,” in ICINCO (1), Citeseer, 2004, pp. 222–229. doi: 10.5220/0001143902220229.
M. Watter, J. Springenberg, J. Boedecker, and M. Riedmiller, “Embed to control: A locally linear latent dynamics model for control from raw images,” Advances in Neural Information Processing Systems, vol. 28, 2015.
N. Levine, Y. Chow, R. Shu, A. Li, M. Ghavamzadeh, and H. Bui, “Prediction, consistency, curvature: Representation learning for locally-linear control,” arXiv preprint arXiv:1909.01506, 2019.
R. Shu et al., “Predictive coding for locally-linear control,” in International Conference on Machine Learning, PMLR, 2020, pp. 8862–8871. doi: 10.5555/3524938.3525760.
M. Lechner, R. Hasani, D. Rus, and R. Grosu, “Gershgorin loss stabilizes the recurrent neural network compartment of an end-to-end robot learning scheme,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020, pp. 5446–5452. doi: 10.1109/ICRA40945.2020.9196608.
R.A. Horn and C.R. Johnson, Matrix Analysis. Cambridge University Press, 2012. doi: 10.5555/2422911.
M.D. Zeiler, D. Krishnan, G.W. Taylor, and R. Fergus, “Deconvolutional networks,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 2010, pp. 2528–2535. doi: 10.1109/CVPR.2010.5539957.
A. Tytarenko, RL-Research, 2022. Available: https://github.com/titardrew/rl-research.