Policy learning using SPSA

Ramamurthy, Rajkumar; Bauckhage, Christian; Sifa, Rafet; Wrobel, Stefan

doi:10.1007/978-3-030-01424-7_1

2018

Conference Paper

Abstract

We analyze the use of simultaneous perturbation stochastic approximation (SPSA), a stochastic optimization technique, for solving reinforcement learning problems. In particular, we consider settings of partial observability and leverage the short-term memory capabilities of echo state networks (ESNs) to learn parameterized control policies. Using SPSA, we propose three different variants to adapt the weight matrices of an ESN to the task at hand. Experimental results on classic control problems with both discrete and continuous action spaces reveal that ESNs trained using SPSA approaches outperform conventional ESNs trained using temporal difference and policy gradient methods.
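For readers unfamiliar with SPSA, the following is a minimal, generic sketch of the basic update on a toy problem. It is not the authors' implementation and does not involve an ESN or a reinforcement learning environment. The function names (`spsa_minimize`, `quadratic_loss`), the toy quadratic objective, and the gain constants are assumptions chosen for illustration. The decay exponents 0.602 and 0.101 are values commonly suggested in the SPSA literature, not values taken from this paper.

```python
import random


def quadratic_loss(theta, target=(1.0, -2.0, 0.5)):
    """Toy objective (an assumption for illustration): squared distance to a target."""
    return sum((t - g) ** 2 for t, g in zip(theta, target))


def spsa_minimize(loss, theta, iterations=200, a=0.1, c=0.1,
                  alpha=0.602, gamma=0.101, stability=10.0, seed=0):
    """Minimize `loss` with basic SPSA, using two loss evaluations per step.

    All parameters are perturbed at once along a random +/-1 direction,
    so the cost per iteration does not grow with the number of parameters.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        # Decaying step size a_k and perturbation size c_k.
        a_k = a / (k + 1 + stability) ** alpha
        c_k = c / (k + 1) ** gamma

        # Simultaneous perturbation: one random sign per parameter.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        theta_plus = [t + c_k * d for t, d in zip(theta, delta)]
        theta_minus = [t - c_k * d for t, d in zip(theta, delta)]

        # Two-sided finite difference along the perturbation direction.
        diff = loss(theta_plus) - loss(theta_minus)

        # Per-parameter gradient estimate: diff / (2 * c_k * delta_i).
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta


start = [0.0, 0.0, 0.0]
result = spsa_minimize(quadratic_loss, start)
print("initial loss:", quadratic_loss(start))
print("final loss:  ", quadratic_loss(result))
```

In a policy learning setting, `theta` would stand for the policy parameters being adapted (in the paper, weight matrices of an ESN) and `loss` for a negated estimate of episode return; both mappings are assumptions about how this sketch would be adapted, not details taken from the abstract.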

Mainwork

Artificial Neural Networks and Machine Learning - ICANN 2018. Proceedings, Part III

Project(s)

ML2R

Funder

Bundesministerium für Bildung und Forschung BMBF (Deutschland)

Conference

International Conference on Artificial Neural Networks (ICANN) 2018
