Gottschalk, S.; Burger, M.; Gerdts, M. (2020): A projected primal-dual gradient optimal control method for deep reinforcement learning. Journal article.
Handle: https://publica.fraunhofer.de/handle/publica/261874
DOI: 10.1186/s13362-020-00075-3
Language: en
DDC: 003; 006; 519

Abstract: In this contribution, we start with a policy-based reinforcement learning ansatz using neural networks. The underlying Markov decision process consists of a transition probability representing the dynamical system and a policy realized by a neural network that maps the current state to the parameters of a distribution, from which the next control can be sampled. In this setting, the neural network is replaced by an ODE, based on a recently discussed continuous-depth interpretation of neural networks. The resulting infinite-dimensional optimization problem is transformed into an optimization problem similar to well-known optimal control problems. The necessary optimality conditions are then established, and from these a new numerical algorithm is derived. The operating principle is demonstrated with two examples. In the first, a moving point is steered through an obstacle course to a desired end position in a 2D plane. The second example shows the applicability to more complex problems: there, the aim is to steer the fingertip of a human arm model with five degrees of freedom and 29 Hill's muscle models to a desired end position.
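
As a purely illustrative aid (the record itself contains no code), the following is a minimal Python sketch of the policy construction the abstract describes: a small ODE, discretized here by explicit Euler, plays the role of the neural network and maps the current state to the mean of a Gaussian distribution from which the next control is sampled. All dimensions, the affine right-hand side, the step count, and the fixed standard deviation are assumptions for illustration, not details taken from the paper.

    # Illustrative sketch, not the authors' code. An ODE replaces the
    # policy network, in the spirit of the continuous-depth view of
    # neural networks mentioned in the abstract.
    import numpy as np

    rng = np.random.default_rng(0)

    STATE_DIM, CONTROL_DIM = 2, 2   # assumed: 2D point-steering example
    DEPTH, DT = 10, 0.1             # Euler steps and step size (assumed)

    # Parameters of the ODE right-hand side (assumed affine-in-state form).
    A = rng.normal(scale=0.1, size=(DEPTH, STATE_DIM, STATE_DIM))
    b = rng.normal(scale=0.1, size=(DEPTH, STATE_DIM))
    SIGMA = 0.05                    # fixed standard deviation (assumed)

    def policy_mean(state):
        """Integrate y' = tanh(A(t) y + b(t)), y(0) = state, by explicit
        Euler; the terminal value serves as the control-distribution mean."""
        y = np.asarray(state, dtype=float)
        for k in range(DEPTH):
            y = y + DT * np.tanh(A[k] @ y + b[k])
        return y

    def sample_control(state):
        """Sample the next control u ~ N(policy_mean(state), SIGMA^2 I)."""
        mu = policy_mean(state)
        return mu + SIGMA * rng.standard_normal(CONTROL_DIM)

    print("sampled control:", sample_control([0.5, -0.3]))

Equally hedged, the sketch below shows one generic projected primal-dual gradient step on a Lagrangian L(theta, lam), to indicate the kind of update the title refers to. The box projection, step sizes, plain dual ascent, and the toy quadratic example are assumptions; the paper's precise optimality conditions and projection set are not given in the abstract.

    def projected_pd_step(theta, lam, grad_theta, grad_lam,
                          alpha=0.1, beta=0.1, lo=-1.0, hi=1.0):
        """Primal descent projected onto a box, dual ascent (all assumed)."""
        theta_new = np.clip(theta - alpha * grad_theta(theta, lam), lo, hi)
        lam_new = lam + beta * grad_lam(theta, lam)
        return theta_new, lam_new

    # Toy saddle-point problem: L = 0.5*theta**2 + lam*(theta - 0.3).
    theta, lam = 0.0, 0.0
    for _ in range(500):
        theta, lam = projected_pd_step(
            theta, lam,
            grad_theta=lambda t, l: t + l,    # dL/dtheta
            grad_lam=lambda t, l: t - 0.3,    # dL/dlam (constraint residual)
        )
    print("theta ~", theta)  # approaches the constrained point 0.3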