A projected primal-dual gradient optimal control method for deep reinforcement learning

Gottschalk, S.; Burger, M.; Gerdts, M.

doi:10.1186/s13362-020-00075-3

2020

Journal Article

Abstract

In this contribution, we start with a policy-based reinforcement learning ansatz using neural networks. The underlying Markov decision process consists of a transition probability representing the dynamical system and a policy realized by a neural network that maps the current state to the parameters of a distribution, from which the next control is sampled. In this setting, the neural network is replaced by an ODE, following a recently discussed interpretation of neural networks. The resulting infinite-dimensional optimization problem is transformed into a problem similar to well-known optimal control problems. The necessary optimality conditions are then established, and a new numerical algorithm is derived from them. The operating principle is demonstrated with two examples. In the first, a moving point is steered through an obstacle course to a desired end position in a 2D plane. The second shows the applicability to more complex problems: the aim is to steer the fingertip of a human arm model with five degrees of freedom and 29 Hill-type muscle models to a desired end position.
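The ODE reading of a neural network mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it only shows how a residual network can be read as explicit Euler steps of an ODE whose final state serves as the mean of a Gaussian policy. The function names, the tanh right-hand side, the fixed standard deviation `sigma`, and the choice of equal state and control dimension are all assumptions made for illustration.

```python
import numpy as np

def ode_policy_mean(state, weights, biases, step_size):
    """Propagate the state through explicit Euler steps of
    x'(t) = tanh(W(t) x(t) + b(t)); each step acts like one residual layer.
    (Assumed right-hand side, chosen for illustration only.)"""
    x = np.asarray(state, dtype=float)
    for W, b in zip(weights, biases):
        x = x + step_size * np.tanh(W @ x + b)
    return x

def sample_control(state, weights, biases, step_size, sigma, rng):
    """Sample the next control from a Gaussian whose mean is the ODE output.
    Assumes the control has the same dimension as the state."""
    mean = ode_policy_mean(state, weights, biases, step_size)
    return rng.normal(loc=mean, scale=sigma)

# Hypothetical setup: 10 Euler steps on the time interval [0, 1], 2D state.
rng = np.random.default_rng(0)
n_layers, dim = 10, 2
weights = [0.1 * rng.standard_normal((dim, dim)) for _ in range(n_layers)]
biases = [np.zeros(dim) for _ in range(n_layers)]
state = np.array([1.0, -0.5])

control = sample_control(state, weights, biases, 1.0 / n_layers, sigma=0.1, rng=rng)
```

In this reading, increasing `n_layers` while shrinking `step_size` refines the time grid of the same ODE, which is what makes the layer parameters functions of time and the training problem resemble an optimal control problem.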

Author(s)

Gottschalk, S.

Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM

Burger, M.

Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM

Gerdts, M.

Universität der Bundeswehr München

Journal

Journal of Mathematics in Industry. Online journal
