Inspired by a blog post I saw some time ago about evolving an image of the Mona Lisa, I had an idea for a task that might better highlight the benefits of deep neural networks.
The idea is to use supervised regression to learn a mapping between pixel locations and RGB colors, with the goal of generating an image one pixel at a time. This means that if the dimensions of the target image are X-by-Y, then it is necessary to run the network XY times. If the image is 100-by-100, there will be 10,000 training examples.
All of the locations and RGB values are first normalized to fall into the range [0, 1].
So, as a concrete example, if the pixel in the very middle of an image is beige, then the training example associated with that pixel could look like:
(0.5, 0.5) → (0.96, 0.96, 0.86)
Again, the two values on the left-hand side represent the pixel's location, and the three values on the right-hand side represent red, green, and blue, respectively.
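The construction of these training examples can be sketched as follows. This is just an illustrative version of the setup described above, not the actual code from my experiments; the function name and the row-major pixel layout are assumptions.

```python
def build_dataset(pixels, width, height):
    """Turn an image into (location, color) training pairs.

    `pixels` is a list of rows, each row a list of 8-bit (r, g, b) tuples.
    Both locations and colors are normalized into [0, 1].
    """
    examples = []
    for y in range(height):
        for x in range(width):
            r, g, b = pixels[y][x]
            location = (x / (width - 1), y / (height - 1))
            color = (r / 255.0, g / 255.0, b / 255.0)
            examples.append((location, color))
    return examples
```

For a 100-by-100 image this yields exactly the 10,000 examples mentioned above, and a beige pixel such as (245, 245, 219) at the image center maps to roughly (0.96, 0.96, 0.86).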
I've begun doing some preliminary testing of this idea. I'm training neural networks to generate the Mona Lisa, pixel by pixel.
The first picture (above) is the result of training a neural network with an architecture similar to the one I used successfully on the MNIST task. It has only a single hidden layer.
Now, what happens if we instead use three hidden layers?
The result of using multiple hidden layers is quite perceptibly better.
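To make the comparison concrete, here is a minimal sketch of the deeper variant: a fully connected network mapping (x, y) to (r, g, b) with three hidden layers. The layer widths, tanh hidden activations, and sigmoid output (to keep predictions in [0, 1]) are my assumptions for illustration, not necessarily the configuration I actually trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Initialize weights and biases for a fully connected network."""
    return [(rng.normal(0.0, 0.5, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, xy):
    """Forward pass: tanh hidden layers, sigmoid output in [0, 1]."""
    a = np.asarray(xy, dtype=float)
    for W, b in params[:-1]:
        a = np.tanh(a @ W + b)
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(a @ W + b)))  # predicted (r, g, b)

# Three hidden layers of 64 units: 2 -> 64 -> 64 -> 64 -> 3.
params = init_mlp([2, 64, 64, 64, 3])
pred = forward(params, [0.5, 0.5])  # color predicted at the image center
```

The shallow baseline is the same sketch with a single hidden layer, e.g. `init_mlp([2, 64, 3])`; only the depth changes between the two pictures.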
One drawback is that there is no straightforward way to measure anything like generalization (there are no test or validation sets). I suppose it might be possible to hide certain small parts of the picture during training to see how well the neural network can fill in the gaps, but this idea seems fraught with difficulties.
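If one did want to try the hide-some-pixels idea, the mechanics are simple enough to sketch: withhold a random subset of the (location, color) pairs during training and measure reconstruction error on them afterwards. The function name and the 10% hold-out fraction below are arbitrary choices of mine.

```python
import random

def split_pixels(examples, holdout_fraction=0.1, seed=0):
    """Split (location, color) pairs into training and held-out sets.

    The held-out pixels are hidden during training and used afterwards
    to check how well the network fills in the gaps.
    """
    rnd = random.Random(seed)
    shuffled = examples[:]
    rnd.shuffle(shuffled)
    k = int(len(shuffled) * holdout_fraction)
    return shuffled[k:], shuffled[:k]  # (train, held_out)
```

The difficulty, of course, is interpreting the result: neighboring pixels are highly correlated, so filling in isolated held-out pixels is much easier than true generalization.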
However, I'm not overly concerned. Just learning to generate the picture appears to be a very difficult learning challenge, one that already seems to clearly highlight the limitations of shallow neural architectures. So although this task lacks some of the characteristics of those normally found in machine learning, I think there is still much to be gained from further exploration.
EDIT: Just added source code for this on GitHub: https://github.com/evolvingstuff/MonaLisa