Friday, February 18, 2011

Animated Fitness Landscapes







Now using time to visualize a 3rd dimension in the high-dimensional space. See my previous posts for more details.

Tuesday, February 15, 2011

Fitness Landscapes

Here are some visualizations of fitness landscapes for recurrent neural networks. I have taken 2-dimensional "slices" out of an otherwise very high-dimensional parameter space. To create a "slice", two links from the RNN are randomly selected and then systematically varied and re-evaluated for fitness. So in order to generate one of these 800-by-800 pixel images, 640,000 fitness evaluations are required.













I had previously done some similar experiments, although I think the results above are a bit nicer, aesthetically.

Monday, January 31, 2011

High-Order Adaptive Mutation



This is a visualization of an experiment I'm doing involving high-order, self-adaptive mutation distributions. This idea has grown out of an ongoing conversation with the creator of floatworld, a really neat (and increasingly full-featured) Artificial Life simulator.

The basic idea is that over time, the environment an adaptive organism finds itself in will change, and it may do so at varying rates. Rather than allowing only a fixed size/rate of mutation, why not allow the parameters controlling mutation to themselves be folded into the evolutionary process?

The visualization above is the result of creating such a system, and additionally making use of a covariance matrix, so that the distribution can be stretched and rotated arbitrarily.

Thursday, December 09, 2010

Double Pole Balancing with Exploration-Estimation



I've been doing some further experiments with pole balancing (also called the inverted pendulum problem).

In my previous experiment, I had partitioned the four-dimensional state space into a set of discrete states (typically several hundred), and then used observations of transitions conditioned on actions to compute transition probabilities in order to produce a Markov decision process. I then had it learn an optimal policy over the MDP using dynamic programming. This approach seemed to be fairly effective; it was usually able to learn to balance a single pole indefinitely after roughly 10 to 15 balancing attempts (plus an additional 10 episodes at the beginning using a completely random policy in order to gather a bit of data to learn from). Between each balancing attempt, it would relearn the model from scratch.

So I next decided to try it out on a more challenging version of the problem - balancing with an additional pole. This newer version proved too much for it, and it failed to learn to solve the problem, even after being given several hundred balancing attempts in which to do so. My guess is that increasing the dimensionality of the state space from four to six dimensions was the key cause of its failure. The problem is that as the dimensionality of the state space increases, it is necessary to enlist more and more states in order to represent it properly, but the more states used, the more observations necessary in order to populate the probabilistic model of the transitions between those states. In other words, this particular approach simply will not scale to large problems.

So I decided to try a different approach. I borrowed an idea I had read about in Bio-Inspired Artificial Intelligence, described below:

Yet another method consists in the coevolution of a robot's controller and of the simulator parameters that are most likely to differ from the real world and that may affect the quality of the transfer (Bongard and Lipson 2005). This method, also known as the estimation-exploration algorithm, consists of coevolving three populations, one encoding possible simulator parameters, one encoding robot controllers used only to find a good simulator, and the third also encoding robot controllers, but these controllers are meant to allow the robot to fulfill its task, such as forward locomotion. Coevolution happens in several passes through a three-stage process. In stage 1, a randomly generated population of controllers is evolved in the default simulator and the best individual is executed on the real robot while the time series of sensory values is recorded. In stage 2, the population of simulators is evolved for reducing the difference between the time series recorded on the real robot and the time series obtained by running the same controller within the simulator. The best evolved simulator is then used for stage 1 where a new randomly generated population of controllers is evolved and the best individual is tested on the real robot to generate the time series for another pass of simulator evolution. This two-stage coevolution is repeated several times until the error between simulated and real robot behavior is as small as possible. Then, the best simulator found so far is used by a third population (the third stage), in which robot controllers are again evolved, this time to elicit some desired behavior. The best controller from this population is then executed on the physical robot, and, if the evolved simulator was accurate, the real robot will exhibit the same behavior as that seen in the evolved simulator.

It was shown that 16 passes of the three-stage process were sufficient to evolve a good control system that could be transferred to an articulated robot. In this case, the real robot was used to test only 17 controllers: the 16 controllers used to find an accurate simulator, and the 17th evolved to enable the robot to perform some desired task (Bongard et al. 2006). The authors also showed that this approach could be used to adapt the control system to cope with a damaged robot (for example, loss of a leg, malfunctioning of a sensor, etc.). (pg 510)

Here is a link to the paper An exploration-estimation algorithm for synthesis and analysis of engineering systems using minimal physical testing (2004), describing the concept in much greater detail.

This is essentially what I did, and the results turned out to be far more effective than my attempts with the dynamic programming approach from before.

It is my strong conviction that it is critical to focus on learning mechanisms that can work effectively using a minimal number of actual interactions with the target environment ("physical testing".) For the double pole balance task, I could have very easily performed neuroevolution directly within the environment itself (as I did here), without all the bother of having it first inductively learn a simulation of it. However, there are many real-world situations in which interactions are expensive, sometimes in terms of money, but more often in terms of time.

There may be an analogy to be made with mental rehearsal, or daydreaming.

Saturday, November 20, 2010

Balancing with Dynamic Programming



I used dynamic programming to balance a simulated cart-pole. In this case, the task is learned after just 5 balancing attempts, although it usually does not learn the task quite so quickly.

The blue box on the right displays the internal states of the system; its "brain", if you will.

For this particular experiment I didn't use any form of evolutionary algorithm, nor neural network (perhaps contrary to the title of my blog). Instead, I partitioned the state vectors into several hundred discrete states (745 in this case), and used the observed transitions between these states to produce a Markov Decision Process. I then applied Dynamic Programming to that MDP to generate the control policy.

The video shows one of the runs that did a bit better than average. However, so far I have found that the problem will usually be solved in an average of about 10 balancing attempts (the very fastest I have observed so far is 2) so I think it is still representative of the resulting behavior.

As an example, here is a run that takes a bit longer than the first:



Note that this is the fully-observable version of the cart-pole task; in other words, it has access not only to the cart position and pole angle, but also the cart velocity and pole angular velocity. This makes it significantly easier to solve than the non-Markov version of the problem. I plan to attempt the non-Markov version in future experiments.

Here are some additional videos:
Balancing with Dynamic Programming 3 (6 trials)
Balancing with Dynamic Programming 4 (11 trials)
Balancing with Dynamic Programming 5 (4 trials)
Balancing with Dynamic Programming 6 (7 trials)
Balancing with Dynamic Programming 7 (2 trials)

Friday, July 02, 2010

Discretized RNN state-space, Part 2


Graph created using same setup as described in my previous post, but this time done with a larger number of hidden-state vectors (2500).

Thursday, June 24, 2010

Discretized RNN state-space



Here is a graph of the discretized state space of a recurrent neural network I trained on the embedded Reber grammar. Notice that the state space splits at the beginning in order to preserve information about the long-range transitions.