## Publica

Here you will find scientific publications from the Fraunhofer Institutes.

# A projected primal-dual gradient optimal control method for deep reinforcement learning

**Abstract**

In this contribution, we start from a policy-based Reinforcement Learning ansatz using neural networks. The underlying Markov Decision Process consists of a transition probability representing the dynamical system and a policy realized by a neural network that maps the current state to the parameters of a distribution, from which the next control is sampled. In this setting, the neural network is replaced by an ODE, based on a recently discussed interpretation of neural networks. The resulting infinite-dimensional optimization problem is transformed into a problem similar to well-known optimal control problems. The necessary optimality conditions are then established, and a new numerical algorithm is derived from them. The operating principle is demonstrated with two examples. In the first, simple example, a moving point is steered through an obstacle course to a desired end position in a 2D plane. The second example shows the applicability to more complex problems: the aim is to steer the fingertip of a human arm model with five degrees of freedom and 29 Hill muscle models to a desired end position.
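To make the policy structure described in the abstract more concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that the ODE replacing the network is discretized with forward Euler steps (so each step looks like a residual layer), that the activation is `tanh`, and that the policy outputs the mean and log standard deviation of a Gaussian. None of these choices is stated in the abstract, and the names `ode_policy`, `sample_control`, and `make_params` are hypothetical.

```python
import numpy as np

def ode_policy(state, weights, biases, out_w, out_b, step=0.1):
    """Hypothetical ODE-style policy (assumption, not the paper's model).

    Integrates dz/dt = tanh(W(t) z + b(t)) with forward Euler, starting
    from the current state, then maps z linearly to distribution parameters.
    """
    z = np.asarray(state, dtype=float)
    for W, b in zip(weights, biases):
        # One Euler step; structurally the same as one residual layer.
        z = z + step * np.tanh(W @ z + b)
    out = out_w @ z + out_b
    # Assumed Gaussian policy: first half is the mean, second half log std.
    mean, log_std = np.split(out, 2)
    return mean, log_std

def sample_control(state, params, rng):
    """Sample the next control from the assumed Gaussian policy."""
    mean, log_std = ode_policy(state, *params)
    return mean + np.exp(log_std) * rng.standard_normal(mean.shape)

def make_params(rng, state_dim=2, control_dim=2, n_layers=5):
    """Small random parameters for illustration only."""
    weights = [0.1 * rng.standard_normal((state_dim, state_dim))
               for _ in range(n_layers)]
    biases = [np.zeros(state_dim) for _ in range(n_layers)]
    out_w = 0.1 * rng.standard_normal((2 * control_dim, state_dim))
    out_b = np.zeros(2 * control_dim)
    return weights, biases, out_w, out_b

# Usage: sample one 2D control for a point at the origin.
rng = np.random.default_rng(0)
params = make_params(rng)
u = sample_control(np.array([0.0, 0.0]), params, rng)
```

In this sketch the number of Euler steps plays the role of network depth, and the time-dependent weights `W(t)`, `b(t)` are the quantities an optimal-control formulation would optimize over. The projected primal-dual gradient method itself is not shown here.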