Monday, December 31, 2012

Learning to Generate Mona Lisa, Animated

This is a follow-up to my previous post. It shows the learning process over the course of 500 epochs. The neural network has 4 hidden layers of 300 rectified linear units each.

EDIT: I just posted this to reddit, and as pointed out by quotemycode, it wasn't very clear what was going on, so I've added some clarification:

The point of this is not to improve on the original image. That would be a nice goal, but it is not at all what I'm doing here. The point is to demonstrate that deep, as opposed to shallow, neural networks are necessary to get reasonably good results on this task. I did this because, while experimenting with MNIST (digit recognition), a standard benchmark for deep NNs, I found it could be solved to almost state-of-the-art levels with a very large single hidden layer.

The other standard benchmark tasks are much larger in scale and require extensive use of GPUs to get good results. I, however, do not have access to that level of hardware. The advantage of something like this is that I can get feedback in a matter of minutes using a relatively low-end computer. For example, MNIST is 784 dimensions of input times 60,000 training examples. The image I used here is 300x429. This means that I can run through an epoch of training about 182 times faster than MNIST. This fast feedback loop is really useful when experimenting with different deep architectures.

So that is why I'm doing this: it isn't about generating better images, it's about having a faster way to experiment with deep architectures. It also happens to generate some neat-looking animations in the process, which is why I posted it.
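
For a rough check of the 182x figure, the sketch below counts input values per epoch. It assumes each pixel is one training example with two inputs (its x and y position); that assumption is not stated above, but it reproduces the number.

```python
# Rough per-epoch size comparison: (input values per example) x (examples).
MNIST_INPUT_DIMS = 784
MNIST_EXAMPLES = 60_000

IMAGE_WIDTH, IMAGE_HEIGHT = 300, 429
INPUTS_PER_PIXEL = 2  # assumption: one example per pixel, inputs are (x, y)

mnist_values = MNIST_INPUT_DIMS * MNIST_EXAMPLES
image_values = INPUTS_PER_PIXEL * IMAGE_WIDTH * IMAGE_HEIGHT
speedup = mnist_values / image_values

print(f"MNIST: {mnist_values:,} input values per epoch")
print(f"Image: {image_values:,} input values per epoch")
print(f"Ratio: about {int(speedup)}x")
```

This only compares the amount of input data per epoch; actual training time also depends on the network size and the hardware.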

EDIT 2: Just added source code for this on GitHub:


Edwin said...

Nice stuff. It inspired me to try the same, but I'm having a hard time reproducing your results.

I am currently trying to implement this in Java using Encog 3. For this I am using the ResilientPropagation trainer. My network layout is as follows: three hidden layers of 300 neurons each, using my own implementation of the rectified linear activation function max(0,x). In the output layer I use the sigmoid activation function. Hidden and output layers use biases.

Input and target values are all within [0,1] range.

Training with 300 neurons in each hidden layer takes forever on my laptop. The network produces saturated colors and does not really seem to converge to the image. Even when I reduce the number of neurons to around 30 in each layer, I still get mostly saturated colors.

Any hints on what I am doing wrong?

Thomas said...

Hi Edwin,

Here are a few suggestions I can think of:

(1) My implementation of the rectified linear neuron allows for a slight slope, so that if x >= 0 then f(x) = x, else f(x) = 0.01*x. This makes it so the derivative never goes completely to zero. Thus the derivative is: if x >= 0 then f'(x) = 1, else f'(x) = 0.01.

(2) I found it very important to experiment with initial random link weights. I'm generating them using Gaussian with std dev of 0.3/sqrt(INSTAR_SIZE). So if a given neuron has 100 inputs feeding into it, the std dev would be 0.03.

(3) I'm using a fixed learning rate param of 0.1. Vanilla backprop, no momentum or anything fancy.

(4) Very important to randomize the order in which the pixels are trained. I initially did them in scan order but got very bad results.

(5) I'm using linear outputs, not sigmoid.

(6) Experiment on smaller images first to see if you are getting good results.
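
To make suggestions (1), (2) and (4) concrete, here is a minimal Python sketch. It is not the code from the post: the function names, the zero mean for the Gaussian, and the fixed random seed are all assumptions made for illustration.

```python
import math
import random

LEAK = 0.01  # slope for negative inputs, from suggestion (1)


def leaky_relu(x):
    """f(x) = x for x >= 0, else 0.01 * x."""
    return x if x >= 0 else LEAK * x


def leaky_relu_derivative(x):
    """f'(x) = 1 for x >= 0, else 0.01, so it never reaches zero."""
    return 1.0 if x >= 0 else LEAK


def init_weights(instar_size, rng):
    """Incoming weights for one neuron, std dev 0.3 / sqrt(instar_size).

    The zero mean is an assumption; suggestion (2) only gives the std dev.
    """
    std_dev = 0.3 / math.sqrt(instar_size)
    return [rng.gauss(0.0, std_dev) for _ in range(instar_size)]


def shuffled_pixel_order(width, height, rng):
    """All (x, y) pixel positions in random order, per suggestion (4)."""
    coords = [(x, y) for y in range(height) for x in range(width)]
    rng.shuffle(coords)
    return coords


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
weights = init_weights(100, rng)         # 100 inputs gives std dev 0.03
order = shuffled_pixel_order(4, 3, rng)  # tiny image, as in suggestion (6)
```

In a training loop, the pixel order would presumably be reshuffled at the start of every epoch rather than once up front.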

Hope that helps! I'm also going to be posting my code for this in the next week or two. My github profile is:

Thomas said...

Just posted the source code on GitHub: