Monday, December 31, 2012

Learning to Generate Mona Lisa, Animated

This is a follow-up to my previous post. It shows the learning process over the course of 500 epochs. The neural network being used has 4 hidden layers of 300 rectified linear units.

EDIT: I just posted this to reddit, and as pointed out by quotemycode, it wasn't very clear what was going on, so I've added some clarification:

The point of this is not to improve on the original image. That would be a nice goal, but it is not at all what I'm doing here. The point is to demonstrate that deep, as opposed to shallow, neural networks are necessary to get reasonably good results on this task.

I tried this because, while experimenting with MNIST (digit recognition), a standard benchmark for deep NNs, I found it could be solved to almost state-of-the-art levels with a very large single hidden layer. The other standard benchmark tasks are much larger in scale and require extensive use of GPUs to get good results, and I do not have access to that level of hardware.

The advantage of something like this is that I can get feedback in a matter of minutes on a relatively low-end computer. For example, MNIST is 784 dimensions of input times 60,000 training examples, or about 47 million input values per epoch. The image I used here is 300x429 pixels with two input coordinates each, or about 257,000 input values. This means that I can run through an epoch of training about 182 times faster than on MNIST. This fast feedback loop is really useful when experimenting with different deep architectures.

So that is why I'm doing this: it isn't about generating better images, it's about having a faster way to experiment with deep architectures. It also happens to generate some neat-looking animations in the process, which is why I posted it.
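For anyone who wants to check the speed-up figure, here is a quick sketch of the arithmetic. It assumes the comparison counts only input values per epoch (784 per MNIST digit, 2 per pixel location); the real training cost also depends on the size of the network, so treat the ratio as a rough guide.

```python
# Rough comparison of input values processed per epoch.
# Assumption: only input values are counted, not network size or outputs.
MNIST_INPUT_DIMS = 784       # 28 x 28 pixels per digit
MNIST_EXAMPLES = 60_000      # size of the MNIST training set

IMAGE_WIDTH, IMAGE_HEIGHT = 300, 429
INPUTS_PER_PIXEL = 2         # one (x, y) location per training example

mnist_values = MNIST_INPUT_DIMS * MNIST_EXAMPLES
image_values = IMAGE_WIDTH * IMAGE_HEIGHT * INPUTS_PER_PIXEL
ratio = mnist_values / image_values

print(mnist_values, image_values, round(ratio, 2))
```

The ratio comes out between 182 and 183, in line with the "about 182 times" figure above.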

EDIT 2: Just added source code for this on GitHub:

Generating Mona Lisa Pixel By Pixel

I've been experimenting with MNIST for a while now, and have recently come to the conclusion that it is not a very good problem for highlighting the strengths of deep neural networks. I was able to get near state-of-the-art results (for neural nets, not RBMs) using just a single (albeit large) hidden layer of 6000 rectified linear neurons.

Inspired by a blog post I saw some time ago about evolving an image of the Mona Lisa, I had an idea for a task that might better highlight the benefits of deep neural networks.

The idea is to use supervised regression to learn a mapping between pixel locations and RGB colors, with the goal of generating an image one pixel at a time. This means that if the dimensions of the target image are X-by-Y, then it is necessary to run the network X times Y times, once per pixel. If the image is 100-by-100, there will be 10,000 training examples.

All of the locations and RGB values are first normalized to fall into the range [0, 1].

So, as a concrete example, if the pixel in the very middle of an image is beige, then the training example associated with that pixel could look like:

(0.5, 0.5) → (0.96, 0.96, 0.86)

Again, the two values on the left-hand side represent the location, and the three values on the right-hand side represent red, green, and blue, respectively.
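The encoding above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name `build_training_set` is made up, the image is given as nested lists of 0-255 RGB tuples, and coordinates are scaled so that the first and last pixel in each direction land on 0 and 1. The actual preprocessing may differ in these details.

```python
def build_training_set(image):
    """Turn an image into ((x, y), (r, g, b)) pairs, all scaled to [0, 1].

    `image` is a list of rows; each row is a list of (r, g, b) tuples
    holding integers in 0..255 (an assumed input format).
    """
    height = len(image)
    width = len(image[0])
    examples = []
    for row in range(height):
        for col in range(width):
            # Scale pixel indices so the edges map to exactly 0.0 and 1.0.
            x = col / (width - 1) if width > 1 else 0.0
            y = row / (height - 1) if height > 1 else 0.0
            r, g, b = image[row][col]
            examples.append(((x, y), (r / 255, g / 255, b / 255)))
    return examples


# A tiny 3-by-3 test image: white everywhere, beige in the very middle.
WHITE = (255, 255, 255)
BEIGE = (245, 245, 220)
tiny_image = [
    [WHITE, WHITE, WHITE],
    [WHITE, BEIGE, WHITE],
    [WHITE, WHITE, WHITE],
]

examples = build_training_set(tiny_image)
location, color = examples[4]  # the middle pixel
print(location, tuple(round(c, 2) for c in color))
```

For the middle pixel this gives the location (0.5, 0.5) and, after rounding, the color (0.96, 0.96, 0.86), matching the example above. A 3-by-3 image produces 9 training examples, one per pixel.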

I've begun doing some preliminary testing of this idea. I'm training neural networks to generate the Mona Lisa, pixel by pixel.

The first picture (above) is the result of training a neural network with a similar architecture to the one I used successfully on the MNIST task. It only uses a single hidden layer.

Now, what happens if we instead use three hidden layers?

The result of using multiple hidden layers is quite perceptibly better.
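To make the shallow-versus-deep comparison concrete, here is a minimal sketch of the two kinds of network as plain Python. Several things in it are assumptions for illustration only: the hidden width of 300 (borrowed from the animated follow-up), the Gaussian weight initialization, the linear output layer, and the helper names `init_network`, `forward`, and `count_parameters`. It shows only the forward pass, not training.

```python
import random


def init_network(layer_sizes, seed=0):
    """Create (weights, biases) for each layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Map an (x, y) location to three outputs (red, green, blue)."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        # Hidden layers use rectified linear units; the output layer is
        # left linear here (an assumption).
        activations = pre if is_output else [max(0.0, v) for v in pre]
    return activations


def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


shallow = init_network([2, 300, 3])           # one hidden layer
deep = init_network([2, 300, 300, 300, 3])    # three hidden layers

print(count_parameters(shallow), count_parameters(deep))
print(len(forward(deep, (0.5, 0.5))))
```

Both networks take the same two inputs and produce the same three outputs; the only difference is how many rectified linear layers sit in between.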

One drawback is that there is no straightforward way to measure anything like generalization (there are no test or validation sets). I suppose it might be possible to hide certain small parts of the picture during training to see how well the neural network can fill in the gaps, but this idea seems fraught with difficulties.
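As a rough illustration of the "hide small parts of the picture" idea, the training examples could be split by location. This is a hypothetical sketch, not something I have tested on the real task: the function name `split_by_patch` and the rectangular patch format are made up for the example.

```python
def split_by_patch(examples, patch):
    """Split ((x, y), rgb) examples into training and held-out lists.

    `patch` is (x_min, y_min, x_max, y_max) in normalized coordinates;
    every pixel inside the rectangle is hidden from training.
    """
    x_min, y_min, x_max, y_max = patch
    train, held_out = [], []
    for (x, y), rgb in examples:
        inside = x_min <= x <= x_max and y_min <= y <= y_max
        (held_out if inside else train).append(((x, y), rgb))
    return train, held_out


# A 5-by-5 grid of gray pixels, hiding a small patch around the center.
grid = [((col / 4, row / 4), (0.5, 0.5, 0.5))
        for row in range(5) for col in range(5)]
train, held_out = split_by_patch(grid, (0.4, 0.4, 0.6, 0.6))
print(len(train), len(held_out))
```

The error on the held-out pixels would then act as a crude stand-in for a test set, with all the caveats mentioned above.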

However, I'm not overly concerned. Just learning to generate the picture appears to be a very difficult learning challenge, one that already seems to clearly highlight the limitations of shallow neural architectures. So although this task lacks some of the characteristics normally found in machine learning tasks, I think there is still much to be gained from further exploration.

EDIT: Just added source code for this on GitHub:

Monday, December 17, 2012

MNIST Features

These features are not very pretty, but oddly enough work very well on MNIST.

Sunday, December 16, 2012

Learning MNIST with shallow neural networks

EDIT: Source code for this experiment:

I've recently been experimenting with the MNIST task using shallow (only a single hidden layer) neural networks. Interestingly enough, when borrowing some of the techniques used in deep neural nets, such as rectified linear neurons, and using a large number of hidden units (6000 in this case), the results are fairly good. It gets less than 1.2% test error after about 30 passes through the training set.
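To give a sense of scale, here is a small sketch of the size of such a network and what the error rate means in absolute terms. It assumes a 10-unit output layer (one per digit) and the standard 10,000-image MNIST test set; neither number is stated above.

```python
# Size of a 784 -> 6000 -> 10 fully connected network.
# Assumption: 10 output units, one per digit class.
N_INPUTS = 28 * 28     # 784 pixel inputs
N_HIDDEN = 6000        # rectified linear hidden units
N_OUTPUTS = 10

hidden_params = N_INPUTS * N_HIDDEN + N_HIDDEN      # weights + biases
output_params = N_HIDDEN * N_OUTPUTS + N_OUTPUTS
total_params = hidden_params + output_params

# Assumption: the standard MNIST test set of 10,000 images.
TEST_SET_SIZE = 10_000
max_errors = round(0.012 * TEST_SET_SIZE)

print(total_params, max_errors)
```

So the network has roughly 4.8 million parameters, and a test error under 1.2% means fewer than about 120 misclassified test digits.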

Here are some visual examples of the strength of the link weights connecting the 28-by-28 pixel inputs to the hidden neurons: