Sunday, March 06, 2011

Non-Markovian Double Pole Balancing with Multiple Carts



The animation above shows the result of evolving a single recurrent neural network (RNN) with 10 hidden neurons to act as a balancing controller for two carts simultaneously.

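The RNN has no access to velocity information, which makes the task non-Markovian and therefore significantly more difficult: the controller has to infer how fast things are moving from the history of what it has observed, using its recurrent state as memory.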
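
To make the setup concrete, here is a minimal sketch of a velocity-blind recurrent controller. It is an illustration only, not the code behind the animation. The state layout (position, velocity, and two pole angles with their angular velocities per cart), the tanh units, the weight ordering, and the names observe and RecurrentController are all assumptions made for the example. Only the 10 hidden neurons, the two carts, and the missing velocities come from the post.

```python
import math
import random

N_CARTS = 2                 # from the post: two carts, one network
N_HIDDEN = 10               # from the post: 10 hidden neurons
N_INPUTS = 3 * N_CARTS      # assumed: cart position + two pole angles per cart
N_OUTPUTS = N_CARTS         # assumed: one force command per cart
# Each hidden unit has a bias, input weights, and recurrent weights;
# each output unit has a bias and one weight per hidden unit.
N_WEIGHTS = N_HIDDEN * (1 + N_INPUTS + N_HIDDEN) + N_OUTPUTS * (1 + N_HIDDEN)


def observe(full_state):
    """Drop the velocities from a full simulator state.

    Assumed layout per cart: (x, x_dot, theta1, theta1_dot, theta2, theta2_dot),
    so the even-indexed entries are positions/angles and the odd ones velocities.
    """
    return full_state[0::2]


class RecurrentController:
    """Fully recurrent hidden layer driven by velocity-free observations."""

    def __init__(self, weights, activation=math.tanh):
        if len(weights) != N_WEIGHTS:
            raise ValueError(f"expected {N_WEIGHTS} weights, got {len(weights)}")
        self.weights = list(weights)
        self.activation = activation      # tanh is an assumption, not the post's choice
        self.hidden = [0.0] * N_HIDDEN    # memory carried between time steps

    def step(self, obs):
        if len(obs) != N_INPUTS:
            raise ValueError(f"expected {N_INPUTS} inputs, got {len(obs)}")
        w = iter(self.weights)
        new_hidden = []
        for _ in range(N_HIDDEN):
            total = next(w)                                   # bias
            total += sum(next(w) * x for x in obs)            # input weights
            total += sum(next(w) * h for h in self.hidden)    # recurrent weights
            new_hidden.append(self.activation(total))
        self.hidden = new_hidden
        outputs = []
        for _ in range(N_OUTPUTS):
            total = next(w)                                   # bias
            total += sum(next(w) * h for h in self.hidden)
            outputs.append(math.tanh(total))                  # force scaled to [-1, 1]
        return outputs


# Usage: random weights stand in for whatever an optimiser would evolve.
rng = random.Random(0)
controller = RecurrentController([rng.uniform(-1, 1) for _ in range(N_WEIGHTS)])
full_state = [0.0, 0.5, 0.02, -0.1, 0.01, 0.0] * N_CARTS
forces = controller.step(observe(full_state))
```

Two simulator states that differ only in their velocities produce identical observations here, which is exactly why a memoryless controller cannot solve the task and the hidden state has to carry the missing information. An evolutionary optimiser would treat the flat weight list as the genome and score each candidate by how long the poles stay up.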

Cart-pole balancing is a fairly standard benchmark problem in the Reinforcement Learning literature, although to the best of my knowledge this is the first example of controlling multiple carts at the same time.

3 comments:

Aldux said...

Very good work!! It looks impressive. I'm working on something similar, but the balancing pole is inverted and I'm using an ESN. I'm using MATLAB, though. Could I possibly get my hands on your code? Cheers!!

Unknown said...

Thanks! That was from ages ago, and the code has mutated quite a bit in the meantime, so I don't really have it in a form that reproduces that experiment. Sorry. But I can tell you that I used CMA-ES to evolve the recurrent neural nets. Also, I was likely using sinusoidal hidden neurons rather than sigmoids.

I'm curious, how are you training your system? I'm familiar with ESNs - but what are you using to generate the readout function? Reinforcement learning? Evolutionary algorithms?

Aldux said...

Let me tell you about my research...

I want to implement an ESN for swing-free (oscillation-damping) trajectory tracking of a quadcopter with a slung load under it. The load behaves like a pendulum: it changes the natural frequencies and, of course, puts additional strain on the controller...

So, I have my mathematical model for the slung load and the quadcopter. My first idea was to implement a controller based on inverse dynamics to get the data needed to train the ESN... but after reading lots of papers, I'm kind of lost now, hehe

Another idea is to specify the reference trajectory for the suspended load only, and then have the quadcopter learn its own trajectory so that the slung load tracks that reference. But then again, I need data... Then I saw your videos and projects on RNNs and ESNs, and they're really close to what I'm doing. I was wondering if you have any advice or recommendations I could follow?

Thanks a lot Tom!!

Best regards!