Tuesday, January 17, 2006

Non-Markov Tic-Tac-Toe Experiment



EXPERIMENTAL SETUP

1) The Neural Network (NN) always plays X; O is played by a hand-crafted computer opponent.
2) The computer opponent is of intermediate skill, it is not perfect, but can only be beaten if it is "forked" by X.
3) The NN is recurrent, and has multiple time steps to decide on a given move.
4) The NN controls a "focus of attention" (FOA) that can be moved around the TTT grid by using the following actions: turn right, turn left, move forward.
5) The FOA allows the NN to "see" only a single cell at a time. The input to the NN is 0 for an empty space, 1 for X, and -1 for O.
6) At the beginning of each turn the FOA of the NN is placed in the center of the grid facing a randomly selected direction (north, east, south, or west).
7) The NN is given no hint as to its current location or the direction it is facing. It must build up an internal representation based on what it has already seen through the FOA.
8) If the NN inadvertantly attempts to move the FOA out of the boundaries of the grid, it automatically forfits the game.
9) The NN selects its move by moving the FOA to a given cell and then performing a "halt" action. This "halt" action must be taken within the specified time-limit (50) or else it automatically forfits the game.
10) The move selected by the NN after halting must be legal (i.e. must be on an empty cell). If the cell is already occupied then it automatically forfits the game.
11) Feedback is given after playing a set of games (1000). Fitness is computed as the sum of the number of games won in the trial set. No credit is given for draws, losses, or premature forfits.


As you can see, these are rather Draconian measures. To summarize: in order to be successful, the NN is expected to learn to move around its "focus of attention" that only allows it to see a single location at a time while being careful not to accidentally move outside of boundaries it has not been informed of, create an internal representation of what it has already seen, use that information to navigate successfully to an empty spot, announce that it is halting, and do all of this repeatedly for each and every one of its turns in such a way that it manages to beat a pre-programmed opponent that can never be beaten without being forked! (Remember, it receives absolutely no credit for a draw).

RESULTS

So far, after a few hours of CPU time, it has managed to discover a solution that can win 64% of the games it plays, even under these drastically difficult conditions!!

No comments: