2002
Conference Paper
Title
DIVA: A self organizing adaptive world model for reinforcement learning
Abstract
Reinforcement learning algorithms without an internal world model often take excessively long to converge. Typically, the agent has to succeed several hundred times before it learns how to behave in even simple environments. In this case, a world model can reduce the number of real-world trials by performing actions virtually in the model. This may help to propagate the reinforcement Q- or V-values much faster through the state (action) space and can be interpreted as a simple form of planning. In this investigation we introduce "DIVA" ("Discretization Improvement by Variance reduction"), a self-organizing deterministic world model with an adaptive discretization, which can speed up learning when combined with common methods such as Sutton's Dyna-Q. The "DIVA" model proposed in this article is implemented in a six-legged walking robot, which learns to walk in a minimum of time and with a minimum of real-world movement trials.
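The abstract does not specify how DIVA's adaptive discretization works, so the following sketch does not attempt to reproduce it. It only illustrates the general Dyna-Q idea the abstract refers to: each real transition is stored in a deterministic world model, and that model is then replayed for extra "virtual" Q-updates. The chain environment, the function names (`chain_step`, `dyna_q`) and all hyperparameters are assumptions chosen for illustration, not taken from the paper.

```python
import random
from collections import defaultdict


def chain_step(state, action, n=8):
    """Hypothetical toy environment: a chain of n states, goal at the right end."""
    nxt = min(max(state + action, 0), n - 1)
    done = nxt == n - 1
    return (1.0 if done else 0.0), nxt, done


def dyna_q(step, start, actions, episodes=30, planning_steps=20,
           alpha=0.5, gamma=0.9, epsilon=0.1, seed=0, max_steps=1000):
    """Tabular Dyna-Q with a deterministic model: (s, a) -> (r, s', done)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-values, default 0.0
    model = {}              # learned deterministic world model
    real_steps = 0          # number of real (non-virtual) environment steps

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup, used for real and virtual experience.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done, steps = start, False, 0
        while not done and steps < max_steps:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)       # real-world trial
            real_steps += 1
            update(s, a, r, s2, done)
            model[(s, a)] = (r, s2, done)  # remember the observed transition
            for _ in range(planning_steps):
                # Virtual trial: replay a remembered transition from the model.
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s, steps = s2, steps + 1
    return q, real_steps


if __name__ == "__main__":
    q, real_steps = dyna_q(chain_step, start=0, actions=(-1, 1))
    policy = [max((-1, 1), key=lambda a: q[(s, a)]) for s in range(7)]
    print("greedy policy:", policy, "real steps:", real_steps)
```

Setting `planning_steps=0` reduces the sketch to plain Q-learning, which gives a simple way to compare how many real steps each variant needs on the same task.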