Thursday, March 17, 2011
A single recurrent neural network (15 hidden neurons) evolved to balance seven double-pole carts simultaneously. No velocity information is available to the controller, which makes this a non-Markovian task. For each of the seven carts, the controller has access to the cart position and the angles of both poles, for a total of 21 inputs. The controller has seven outputs, one per cart, with which it applies a force to each cart at every time step. The force is continuous-valued, rather than "bang-bang".
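To make the input/output layout concrete, here is a minimal sketch of a controller with this shape. It is not the evolved network itself: the Elman-style recurrence, the tanh activations, the random weights, and the `MAX_FORCE` limit are all assumptions made for illustration.

```python
import math
import random

N_CARTS = 7
N_INPUTS = 3 * N_CARTS   # cart position + two pole angles per cart = 21
N_HIDDEN = 15
N_OUTPUTS = N_CARTS      # one force per cart
MAX_FORCE = 10.0         # assumed force limit; not stated in the post


def random_matrix(rows, cols, rng):
    """Hypothetical helper: weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class RecurrentController:
    """Assumed Elman-style RNN: the hidden layer feeds back into itself."""

    def __init__(self, rng):
        self.w_in = random_matrix(N_HIDDEN, N_INPUTS, rng)
        self.w_rec = random_matrix(N_HIDDEN, N_HIDDEN, rng)
        self.w_out = random_matrix(N_OUTPUTS, N_HIDDEN, rng)
        self.hidden = [0.0] * N_HIDDEN

    def step(self, observation):
        """Take 21 observations, update the hidden state, return 7 forces."""
        assert len(observation) == N_INPUTS
        new_hidden = []
        for i in range(N_HIDDEN):
            total = sum(w * x for w, x in zip(self.w_in[i], observation))
            total += sum(w * h for w, h in zip(self.w_rec[i], self.hidden))
            new_hidden.append(math.tanh(total))
        self.hidden = new_hidden
        # tanh keeps each output in [-1, 1]; scaling gives a continuous force.
        return [
            MAX_FORCE * math.tanh(sum(w * h for w, h in zip(row, self.hidden)))
            for row in self.w_out
        ]
```

Because the hidden state persists between calls to `step`, the network can in principle build up its own estimate of the velocities it is never shown.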
The controller keeps its balance for 5000 time steps, although it becomes unstable at the end. Evolving the controller took 13,373,736 fitness evaluations (roughly 20 hours of CPU time).
For the sake of time, this controller was evolved on only a single initial condition (with slightly off-center pole angles), so it is unknown how well it generalizes to other initial conditions. Even so, it proved to be a very challenging task.
Tuesday, March 08, 2011
Same as my previous post, except this time with three carts, and the following changes in experimental setup:
First, the forces applied to the carts have been switched to continuous rather than "bang-bang".
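The difference between the two force schemes can be sketched as follows. This assumes the network output lies in [-1, 1] and uses a hypothetical `MAX_FORCE` limit; neither detail is given in the post.

```python
MAX_FORCE = 10.0  # assumed force limit; not stated in the post


def bang_bang_force(output):
    """Bang-bang: always push with full force, only the sign is chosen."""
    return MAX_FORCE if output >= 0.0 else -MAX_FORCE


def continuous_force(output):
    """Continuous: force is proportional to the output, clipped to [-1, 1]."""
    clipped = max(-1.0, min(1.0, output))
    return MAX_FORCE * clipped
```

With a small output such as 0.25, the bang-bang scheme still applies the full force, while the continuous scheme applies a quarter of it.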
Second, the random number generator is now seeded so that every fitness evaluation is performed over the same set of 10 initial conditions (previously, each evaluation had been given an entirely new set of 10). The advantage is that this greatly reduces noise in the fitness function. The drawback is that it may harm generalization to some degree, although in this particular case I don't think it is really much of an issue.
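A minimal sketch of the seeding change, assuming Python's `random` module rather than whatever generator was actually used. The seed value, the sampling ranges, and the function names are hypothetical.

```python
import random

N_CONDITIONS = 10
FIXED_SEED = 12345  # arbitrary value chosen for illustration


def sample_initial_conditions(rng, n=N_CONDITIONS):
    """Draw n start states: (cart position, pole 1 angle, pole 2 angle).

    The ranges below are assumptions, not the values used in the experiment.
    """
    return [
        (rng.uniform(-1.0, 1.0), rng.uniform(-0.05, 0.05), rng.uniform(-0.05, 0.05))
        for _ in range(n)
    ]


def fixed_conditions():
    """New setup: reseeding gives the same 10 conditions on every evaluation."""
    return sample_initial_conditions(random.Random(FIXED_SEED))


def fresh_conditions():
    """Old setup: an unseeded generator gives a new set on every evaluation."""
    return sample_initial_conditions(random.Random())
```

With `fixed_conditions`, two controllers are always compared on identical starting states, so a difference in fitness reflects the controllers rather than the luck of the draw.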
Now to try four...
Sunday, March 06, 2011
Above is an animation depicting the results of having evolved a single recurrent neural network (with 10 hidden neurons) to act as a balancing controller for two carts simultaneously.
The RNN has no access to velocity information, which makes the task non-Markovian and hence significantly more difficult.
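As a rough illustration of why hiding velocities makes the task non-Markovian, the sketch below assumes a full state of position, two pole angles, and their velocities for one double-pole cart. The `CartState` and `observe` names are hypothetical.

```python
from typing import NamedTuple


class CartState(NamedTuple):
    """Assumed full physical state of one double-pole cart."""
    x: float           # cart position
    x_dot: float       # cart velocity (hidden from the controller)
    theta1: float      # angle of the first pole
    theta1_dot: float  # angular velocity of the first pole (hidden)
    theta2: float      # angle of the second pole
    theta2_dot: float  # angular velocity of the second pole (hidden)


def observe(state):
    """Return only what the controller sees: position and the two angles."""
    return (state.x, state.theta1, state.theta2)


# Same positions and angles, but the cart moves in opposite directions.
moving_right = CartState(0.0, 1.0, 0.01, 0.0, 0.0, 0.0)
moving_left = CartState(0.0, -1.0, 0.01, 0.0, 0.0, 0.0)
```

Here `observe(moving_right)` and `observe(moving_left)` are identical even though the right action differs, so a single observation cannot determine the best force. The controller has to use its memory of earlier observations instead.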
Cart-pole balancing is a fairly standard benchmark problem in the reinforcement learning literature, although to the best of my knowledge this is the first example of controlling multiple carts at the same time.