Friday, March 06, 2015

My favorite machine learning papers from the last 6 months

Compete to Compute
Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
Invariant backpropagation: how to train a transformation-invariant neural network
Memory Networks
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
DRAW: A Recurrent Neural Network For Image Generation
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
A New Framework to Probe and Learn Neural Networks
Text Understanding from Scratch
Do Deep Nets Really Need to be Deep?
Effective Use of Word Order for Text Categorization with Convolutional Neural Networks
Why does Deep Learning work? - A perspective from Group Theory
FitNets: Hints for Thin Deep Nets
Exploring Invariances in Deep Convolutional Neural Networks Using Synthetic Images
Deep Fried Convnets
Deep Speech: Scaling up end-to-end speech recognition
Recurrent Models of Visual Attention
Teaching Deep Convolutional Neural Networks to Play Go
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
An exact mapping between the Variational Renormalization Group and Deep Learning
Advances in Optimizing Recurrent Networks
Very Deep Convolutional Networks for Large-Scale Visual Recognition
k-Sparse Autoencoders
A Clockwork RNN
Random feedback weights support learning in deep neural networks
Learning to Execute
Neural Turing Machines

Wednesday, August 14, 2013

Visualizing Individual Neuron Activations in Deep Neural Networks

Each of the pictures above shows the activations of a single neuron chosen from within the hidden layers of a deep neural network trained to generate an image of the Mona Lisa (a task I've previously described here and here).

The global inputs to the neural network are two-dimensional (x, y) coordinate pairs, and this arrangement allows the behavior of single neurons to be visualized completely. Each pixel location corresponds to a different (x, y) input, and the pixel intensity at that location represents the scalar activation of the neuron when the neural network as a whole is given that particular coordinate pair.
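A minimal NumPy sketch of this kind of visualization, under stated assumptions: a fully connected ReLU network with weights stored as a list of (in, out) matrices, coordinates scaled to [0, 1], and random weights standing in for trained ones. The function name `neuron_activation_map` and its parameters are hypothetical, not taken from the author's code. All coordinate pairs are pushed through the network as a single batch, so the "one run per pixel" becomes one row per pixel.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def neuron_activation_map(weights, biases, layer, unit, width, height):
    """Return a (height, width) array holding one hidden unit's activation
    for every normalized (x, y) coordinate of the image grid.

    Hypothetical helper: `layer` and `unit` are 0-based indices into the
    hidden layers and into that layer's neurons.
    """
    xs = np.linspace(0.0, 1.0, width)
    ys = np.linspace(0.0, 1.0, height)
    gx, gy = np.meshgrid(xs, ys)                     # both have shape (height, width)
    a = np.stack([gx.ravel(), gy.ravel()], axis=1)   # one (x, y) row per pixel
    for i in range(layer + 1):                       # run the net up to the chosen layer
        a = relu(a @ weights[i] + biases[i])
    return a[:, unit].reshape(height, width)

# Assumed architecture: 4 hidden layers of 300 ReLUs; random weights for illustration.
rng = np.random.default_rng(0)
sizes = [2, 300, 300, 300, 300]
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
img = neuron_activation_map(weights, biases, layer=2, unit=7, width=300, height=429)
```

The result can be passed to any grayscale image viewer (for example after rescaling to [0, 255]); the width of 300 and height of 429 assume a portrait orientation.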

To generate each of these images, which are 429-by-300 pixels at full size, the neural network must be run 128,700 times, once per pixel.

Here is the end result:

Monday, December 31, 2012

Learning to Generate Mona Lisa, Animated

This is a follow-up to my previous post. It shows the learning process over the course of 500 epochs. The neural network being used has 4 hidden layers of 300 rectified linear units.
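For a sense of scale, here is a small sketch counting the parameters of that network. It assumes fully connected layers with biases, a 2-dimensional (x, y) input and a 3-dimensional RGB output (as in the task description in the next post); the helper names are hypothetical.

```python
def layer_sizes(n_in=2, hidden=300, depth=4, n_out=3):
    """Widths of every layer: (x, y) input, `depth` hidden layers, RGB output."""
    return [n_in] + [hidden] * depth + [n_out]

def count_parameters(sizes):
    """Weights plus biases of a fully connected network with the given widths."""
    return sum(m * n + n for m, n in zip(sizes[:-1], sizes[1:]))

sizes = layer_sizes()
print(sizes)                    # [2, 300, 300, 300, 300, 3]
print(count_parameters(sizes))  # 272703
```

Almost all of those parameters sit in the three 300-by-300 hidden-to-hidden connections.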

EDIT: I just posted this to Reddit, and as quotemycode pointed out, it wasn't very clear what was going on, so I've added some clarification:

The point of this is not to improve on the original image. That would be a nice goal, but it is not at all what I'm doing here. The point is to demonstrate the necessity of using deep, as opposed to shallow, neural networks to get reasonably good results on this task. I chose it because, while experimenting with a standard benchmark for deep NNs, MNIST (digit recognition), I found it could be solved to nearly state-of-the-art levels with a single, very large hidden layer. The other standard benchmark tasks are much larger in scale and require extensive use of GPUs to get good results, and I don't have access to that level of hardware.

The advantage of a task like this is that I can get feedback in a matter of minutes on a relatively low-end computer. For example, MNIST has 60,000 training examples with 784 input dimensions each. The image I used here is 300x429 pixels, which gives 128,700 training examples with only 2 input dimensions each (the x and y coordinates). Counting input values, that means I can run through an epoch of training about 182 times faster than on MNIST. This fast feedback loop is really useful when experimenting with different deep architectures.

So that is why I'm doing this: it isn't about generating better images, it's about having a faster way to experiment with deep architectures. It also happens to generate some neat-looking animations in the process, which is why I posted it.
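The "about 182 times" figure works out if you compare the number of input values per epoch, with each pixel contributing a 2-dimensional (x, y) input. The arithmetic, as a quick check:

```python
mnist_values = 784 * 60_000          # 784 pixel inputs per MNIST example
mona_examples = 300 * 429            # one training example per pixel of the image
mona_values = 2 * mona_examples      # each example is a single (x, y) pair
ratio = mnist_values / mona_values
print(mnist_values, mona_values, round(ratio, 1))  # 47040000 257400 182.8
```

Note that this ratio only counts input values, which roughly tracks the cost of the first layer. The Mona Lisa task actually has more examples per epoch than MNIST (128,700 vs 60,000), so how much faster an epoch really is also depends on the size of the layers above the input.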

EDIT 2: Just added source code for this on GitHub:

Generating Mona Lisa Pixel By Pixel

I've been experimenting with MNIST for a while now, and have recently come to the conclusion that it is not a very good problem for highlighting the strengths of deep neural networks. I was able to get near state-of-the-art results (for neural nets; not RBMs) using just a single (albeit large) hidden layer of 6000 rectified linear neurons.

Inspired by a blog post I saw some time ago about evolving an image of the Mona Lisa, I had an idea for a task that might better highlight the benefits of deep neural networks.

The idea is to use supervised regression to learn a mapping from pixel locations to RGB colors, with the goal of generating an image one pixel at a time. This means that if the target image is X-by-Y pixels, the network has to be run XY times, once per pixel. If the image is 100-by-100, there will be 10,000 training examples.

All of the locations and RGB values are first normalized to fall into the range [0, 1].

So, as a concrete example, if the pixel in the very middle of an image is beige, then the training example associated with that pixel could look like:

(0.5, 0.5) → (0.96, 0.96, 0.86)

Again, the two values on the left-hand side represent the location, and the three values on the right-hand side represent red, green, and blue, respectively.
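A minimal sketch of how such a training set could be built from an RGB image stored as an (H, W, 3) uint8 NumPy array. The post doesn't say exactly how coordinates are scaled; this sketch assumes dividing by (width - 1) and (height - 1), so the edges land on 0 and 1 and the middle pixel of an odd-sized image lands on 0.5. The function name is hypothetical, and beige is taken to be (245, 245, 220), the CSS "beige" color.

```python
import numpy as np

def make_training_set(image):
    """Turn an (H, W, 3) uint8 RGB image into per-pixel regression pairs.

    Inputs are (x, y) scaled to [0, 1]; targets are RGB scaled to [0, 1].
    Assumes H, W >= 2 so the divisions below are well defined.
    """
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]                     # row and column index of every pixel
    inputs = np.stack([xs.ravel() / (w - 1),        # x: 0 at the left edge, 1 at the right
                       ys.ravel() / (h - 1)],       # y: 0 at the top edge, 1 at the bottom
                      axis=1)
    targets = image.reshape(-1, 3) / 255.0          # same row-major pixel order as inputs
    return inputs, targets

# A 3-by-3 image whose middle pixel is beige.
img = np.zeros((3, 3, 3), dtype=np.uint8)
img[1, 1] = (245, 245, 220)
X, Y = make_training_set(img)
print(X[4], Y[4].round(2))  # [0.5 0.5] [0.96 0.96 0.86]
```

A 100-by-100 image would produce 10,000 rows in both `X` and `Y`, one per training example.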

I've begun doing some preliminary testing of this idea. I'm training neural networks to generate the Mona Lisa, pixel by pixel.

The first picture (above) is the result of training a neural network with a similar architecture to the one I used successfully on the MNIST task. It only uses a single hidden layer.

Now, what happens if we instead use three hidden layers?

The result of using multiple hidden layers is quite perceptibly better.
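The post doesn't describe the training procedure beyond the architectures. As an illustration only, here is a minimal NumPy sketch that trains ReLU networks of different depths on this kind of coordinate-to-color regression, assuming plain full-batch gradient descent on squared error and He-style initialization (the author's code may use minibatches, momentum, or a different loss). A small synthetic color pattern stands in for the Mona Lisa; all function names are hypothetical.

```python
import numpy as np

def init_mlp(sizes, rng):
    """He-style initialization for a ReLU MLP with the given layer widths."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the activations of every layer: ReLU on hidden layers, linear output."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        acts.append(z if i == len(params) - 1 else np.maximum(z, 0.0))
    return acts

def train_step(params, x, y, lr):
    """One full-batch gradient descent step on squared error; returns the loss."""
    acts = forward(params, x)
    err = acts[-1] - y
    loss = np.mean(np.sum(err ** 2, axis=1))
    grad = 2.0 * err / len(x)                  # dLoss/dOutput
    for i in reversed(range(len(params))):
        w, b = params[i]
        gw = acts[i].T @ grad                  # gradient w.r.t. this layer's weights
        gb = grad.sum(axis=0)
        if i > 0:                              # backpropagate through the previous ReLU
            grad = (grad @ w.T) * (acts[i] > 0)
        params[i] = (w - lr * gw, b - lr * gb)
    return loss

rng = np.random.default_rng(0)
# Synthetic 16x16 "image": (x, y) in [0, 1] mapped to a smooth RGB pattern.
g = np.linspace(0.0, 1.0, 16)
gx, gy = np.meshgrid(g, g)
x = np.stack([gx.ravel(), gy.ravel()], axis=1)
y = np.stack([gx.ravel(), gy.ravel(), (gx * gy).ravel()], axis=1)
for depth in (1, 3):                           # one vs. three hidden layers
    params = init_mlp([2] + [64] * depth + [3], rng)
    losses = [train_step(params, x, y, lr=0.01) for _ in range(500)]
    print(depth, losses[0], losses[-1])
```

On a toy target this smooth, the comparison between depths says little about the real image; the sketch only shows the mechanics of fitting pixels from coordinates.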

One drawback is that there is no straightforward way to measure anything like generalization (there are no test or validation sets). I suppose it might be possible to hide certain small parts of the picture during training to see how well the neural network can fill in the gaps, but this idea seems fraught with difficulties.

However, I'm not overly concerned. Just learning to generate the picture appears to be a very difficult learning challenge, one that already seems to clearly highlight the limitations of shallow neural architectures. So although this task lacks some of the characteristics of those normally found in machine learning, I think there is still much to be gained from further exploration.
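For anyone who wants to try the hold-out idea mentioned above, a minimal sketch: mark a square patch of pixels as validation data and train only on the rest. It assumes the training pairs are listed row by row (row-major order), and the helper names and patch position are hypothetical.

```python
import numpy as np

def holdout_patch_mask(height, width, top, left, size):
    """Boolean mask over the row-major pixel list, True inside a square patch.

    Pixels where the mask is True are kept out of training and used only to
    check how well the network fills in the gap.
    """
    mask = np.zeros((height, width), dtype=bool)
    mask[top:top + size, left:left + size] = True
    return mask.ravel()

def split_by_mask(inputs, targets, mask):
    """Training pairs are the pixels outside the patch; validation pairs are inside it."""
    return (inputs[~mask], targets[~mask]), (inputs[mask], targets[mask])

mask = holdout_patch_mask(429, 300, top=200, left=100, size=20)
inputs = np.zeros((429 * 300, 2))   # placeholders for the (x, y) inputs
targets = np.zeros((429 * 300, 3))  # placeholders for the RGB targets
(train_x, train_y), (val_x, val_y) = split_by_mask(inputs, targets, mask)
```

One of the difficulties is visible right away: the held-out pixels all come from a single contiguous region, so the validation error measures interpolation into that particular patch rather than generalization in any broader sense.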

EDIT: Just added source code for this on GitHub:

Monday, December 17, 2012

MNIST Features

These features are not very pretty, but, oddly enough, they work very well on MNIST.

Sunday, December 16, 2012

Learning MNIST with shallow neural networks

EDIT: Source code for this experiment:

I've recently been experimenting with the MNIST task using shallow (only a single hidden layer) neural networks. Interestingly enough, when borrowing some of the techniques used in deep neural nets, such as rectified linear neurons, and using a large number of hidden units (6000 in this case), the results are fairly good. It gets less than 1.2% test error after about 30 passes through the training set.
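A minimal NumPy sketch of a network of this shape (784 inputs, 6000 rectified linear hidden units, 10 outputs). The post only specifies the hidden layer; the softmax output, cross-entropy loss, He-style initialization and plain minibatch SGD are assumptions, and the function names are hypothetical.

```python
import numpy as np

def init_shallow(n_in=784, n_hidden=6000, n_out=10, rng=None):
    """Single-hidden-layer ReLU network, returned as [w1, b1, w2, b2]."""
    if rng is None:
        rng = np.random.default_rng(0)
    w1 = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden))
    w2 = rng.normal(0.0, np.sqrt(2.0 / n_hidden), (n_hidden, n_out))
    return [w1, np.zeros(n_hidden), w2, np.zeros(n_out)]

def sgd_step(params, x, labels, lr=0.01):
    """One minibatch SGD step on softmax cross-entropy; returns the batch loss."""
    w1, b1, w2, b2 = params
    h = np.maximum(x @ w1 + b1, 0.0)                 # rectified linear hidden layer
    logits = h @ w2 + b2
    logits -= logits.max(axis=1, keepdims=True)      # for numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)                # softmax probabilities
    n = len(x)
    loss = -np.log(p[np.arange(n), labels]).mean()
    d = p.copy()
    d[np.arange(n), labels] -= 1.0
    d /= n                                           # dLoss/dlogits
    gw2, gb2 = h.T @ d, d.sum(axis=0)
    dh = (d @ w2.T) * (h > 0)                        # backpropagate through the ReLU
    gw1, gb1 = x.T @ dh, dh.sum(axis=0)
    for param, g in zip(params, (gw1, gb1, gw2, gb2)):
        param -= lr * g                              # in-place update of each array
    return loss

params = init_shallow()
print(params[0].size)  # 4704000 input-to-hidden weights
rng = np.random.default_rng(1)
batch = rng.random((128, 784))          # random stand-in for 128 MNIST images in [0, 1]
labels = rng.integers(0, 10, size=128)  # random stand-in labels
print(sgd_step(params, batch, labels))
```

Loading the real MNIST data and looping over it for about 30 epochs is left out; the point is how little machinery a single wide ReLU layer needs.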

Here are some visual examples of the strength of the connection weights between the 28-by-28 pixel inputs and the hidden neurons:
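A minimal sketch of how pictures like these can be produced, assuming the input-to-hidden weights are stored as a (784, n_hidden) NumPy array whose rows follow MNIST's row-major pixel order. Each hidden unit's 784 incoming weights are reshaped to 28x28 and rescaled to [0, 1] for display; the helper names and the 1-pixel gap between tiles are illustrative choices.

```python
import numpy as np

def weight_image(w1, unit):
    """Reshape one hidden unit's 784 incoming weights into a 28x28 image in [0, 1]."""
    w = w1[:, unit].reshape(28, 28)
    lo, hi = w.min(), w.max()
    if hi == lo:                        # constant weights: nothing to show
        return np.zeros((28, 28))
    return (w - lo) / (hi - lo)

def tile_weight_images(w1, units, cols):
    """Arrange several units' weight images in a grid with 1-pixel gaps."""
    rows = -(-len(units) // cols)       # ceiling division
    grid = np.zeros((rows * 29 - 1, cols * 29 - 1))
    for k, u in enumerate(units):
        r, c = divmod(k, cols)
        grid[r * 29:r * 29 + 28, c * 29:c * 29 + 28] = weight_image(w1, u)
    return grid

# Random stand-in weights for a small network with 6 hidden units.
w1 = np.random.default_rng(0).normal(size=(784, 6))
grid = tile_weight_images(w1, range(6), cols=3)
```

Per-unit rescaling makes each tile use the full gray range, which helps to see the pattern but hides differences in weight magnitude between units.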