Title: Policy learning using SPSA
Authors: Ramamurthy, Rajkumar; Bauckhage, Christian; Sifa, Rafet; Wrobel, Stefan
Year: 2018
Type: Conference paper
DOI: 10.1007/978-3-030-01424-7_1
Handle: https://publica.fraunhofer.de/handle/publica/402702
Date deposited: 2022-03-14
Language: en
DDC: 005; 006; 629

Abstract: We analyze the use of simultaneous perturbation stochastic approximation (SPSA), a stochastic optimization technique, for solving reinforcement learning problems. In particular, we consider settings of partial observability and leverage the short-term memory capabilities of echo state networks (ESNs) to learn parameterized control policies. Using SPSA, we propose three different variants to adapt the weight matrices of an ESN to the task at hand. Experimental results on classic control problems with both discrete and continuous action spaces reveal that ESNs trained using SPSA-based approaches outperform conventional ESNs trained using temporal difference and policy gradient methods.
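The abstract's core technique, SPSA, estimates a gradient from only two evaluations of the objective by perturbing all parameters simultaneously along a random ±1 direction. A minimal sketch of that update rule follows; the function names, gain schedules, and the toy quadratic objective are illustrative assumptions, not the paper's ESN policy-learning setup (where the objective would be the negated episodic return of the parameterized policy).

```python
import numpy as np

def spsa_minimize(loss, theta0, n_iter=200, a=0.1, c=0.1, seed=0):
    """Minimal SPSA sketch (hypothetical parameters, not the paper's setup).

    Each iteration draws a random +/-1 perturbation direction, estimates the
    gradient from two loss evaluations, and takes a descent step.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(1, n_iter + 1):
        ak = a / k ** 0.602   # commonly used SPSA gain-decay exponents
        ck = c / k ** 0.101
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        # Two-sided finite difference along the simultaneous perturbation:
        g_hat = (loss(theta + ck * delta) - loss(theta - ck * delta)) / (2.0 * ck * delta)
        theta -= ak * g_hat
    return theta

# Toy usage: minimize a separable quadratic with minimum at theta = 3.
theta_star = spsa_minimize(lambda th: float(np.sum((th - 3.0) ** 2)), np.zeros(4))
```

Because each step only needs two objective evaluations regardless of dimension, SPSA suits settings like the one described here, where the objective (an episodic return) is expensive and no analytic gradient is available.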