A projected primal-dual gradient optimal control method for deep reinforcement learning

Gottschalk, S.; Burger, M.; Gerdts, M.

Journal of Mathematics in Industry 10 (2020), Art. 9, 22 pp.
ISSN: 2190-5983
Journal article, electronic publication
Fraunhofer ITWM

In this contribution, we start from a policy-based Reinforcement Learning ansatz using neural networks. The underlying Markov Decision Process consists of a transition probability representing the dynamical system and a policy realized by a neural network that maps the current state to the parameters of a distribution, from which the next control is sampled. In this setting, the neural network is replaced by an ordinary differential equation (ODE), based on a recently discussed interpretation of neural networks. The resulting infinite-dimensional optimization problem is transformed into an optimization problem similar to well-known optimal control problems. Necessary optimality conditions are then established, and a new numerical algorithm is derived from them. The operating principle is demonstrated with two examples. In the first, a simple example, a moving point is steered through an obstacle course to a desired end position in a 2D plane. The second shows the applicability to more complex problems: the aim is to steer the fingertip of a human arm model with five degrees of freedom and 29 Hill-type muscle models to a desired end position.
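
To make the idea of an ODE-based policy more concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes that the ODE has the form dx/dt = tanh(W(t) x + b(t)), that it is integrated with forward Euler steps (each step then looks like a residual-network layer), and that the final ODE state is used as the mean of a Gaussian with fixed standard deviation from which the control is sampled. The function names `ode_policy_params` and `sample_control`, the activation, the fixed `sigma`, and the choice that state and control have the same dimension are all illustrative assumptions.

```python
import math
import random


def ode_policy_params(state, weights, biases, t_final=1.0):
    """Integrate dx/dt = tanh(W(t) x + b(t)) with forward Euler steps.

    `weights` and `biases` hold one (W, b) pair per time step. This ODE
    form is an assumption made for illustration only.
    """
    n_steps = len(weights)
    h = t_final / n_steps  # Euler step size
    x = list(state)
    for W, b in zip(weights, biases):
        # Pre-activation W x + b for the current time step.
        pre = [
            sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)
        ]
        # Euler update x <- x + h * tanh(W x + b), i.e. a residual layer.
        x = [x_i + h * math.tanh(p_i) for x_i, p_i in zip(x, pre)]
    return x


def sample_control(state, weights, biases, sigma=0.1, rng=random):
    """Sample a control from a Gaussian whose mean is the final ODE state.

    The fixed standard deviation `sigma` is a hypothetical simplification.
    """
    mean = ode_policy_params(state, weights, biases)
    return [rng.gauss(m, sigma) for m in mean]
```

With all weights and biases set to zero, the right-hand side of the ODE vanishes and the mean equals the input state, which is a quick sanity check for the integration loop. Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible.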