Sunday, December 16, 2012

Learning MNIST with shallow neural networks

EDIT: Source code for this experiment: https://github.com/evolvingstuff/MNIST

I've recently been experimenting with the MNIST task using shallow (only a single hidden layer) neural networks. Interestingly, when borrowing some of the techniques used in deep neural nets, such as rectified linear neurons, and using a large number of hidden units (6000 in this case), the results are fairly good: less than 1.2% test error after about 30 passes through the training set.
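For readers who want a concrete picture, here is a minimal sketch of a forward pass through one rectified-linear hidden layer. The class and method names are hypothetical, not from the actual repository, and the sizes are tiny for illustration (the real network uses 784 inputs and 6000 hidden units).

```java
// Sketch of a forward pass through a single rectified-linear hidden layer.
// Names and sizes are illustrative only.
public class ShallowNetSketch {
    public static double[] hiddenActivations(double[] x, double[][] wHidden) {
        double[] h = new double[wHidden.length];
        for (int j = 0; j < wHidden.length; j++) {
            double sum = 0.0;
            for (int i = 0; i < x.length; i++) {
                sum += wHidden[j][i] * x[i]; // weighted sum of pixel inputs
            }
            h[j] = Math.max(0.0, sum); // rectified linear unit: max(0, x)
        }
        return h;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        double[][] w = {{1.0, -1.0}, {-1.0, 1.0}};
        double[] h = hiddenActivations(x, w);
        System.out.println(h[0] + " " + h[1]); // 0.0 1.0
    }
}
```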

Here are some visual examples of the strength of the link weights connecting the 28-by-28 pixel inputs to the hidden neurons:





23 comments:

Anonymous said...

Do you publish your code anywhere?

Thomas said...

I plan to within the next few days (need to clean it up a bit first.) Here is my github page: https://github.com/evolvingstuff

goodfellow.ian said...

What training algorithm did you use? SGD? Are you using any regularization?

Thomas said...

SGD, no regularization, fixed learning rate of 0.1. Very simple.
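For context, per-example SGD with a fixed rate just nudges each weight against its gradient after every training example. A minimal sketch, with illustrative names (only the 0.1 learning rate comes from the comment above):

```java
// Sketch of a per-example SGD update with a fixed learning rate.
// Variable names are illustrative, not from the actual code.
public class SgdUpdateSketch {
    static final double LEARNING_RATE = 0.1; // fixed, as described in the post

    public static void update(double[] weights, double[] grad) {
        for (int i = 0; i < weights.length; i++) {
            weights[i] -= LEARNING_RATE * grad[i]; // step against the gradient
        }
    }

    public static void main(String[] args) {
        double[] w = {1.0, -0.5};
        double[] g = {0.2, -0.4};
        update(w, g);
        System.out.println(w[0] + " " + w[1]); // 0.98 -0.46
    }
}
```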

goodfellow.ian said...

What kind of output units and cost function are you using? What size of minibatch do you use? If the minibatch is > 1 example, do you use the mean or the sum of the costs for each individual example?

I tried using minibatch size = 100, output units = softmax, cost = mean across minibatch of negative log likelihood and got a test set error of 2.1%.

goodfellow.ian said...
This comment has been removed by the author.
goodfellow.ian said...

Deleted a comment I made a few minutes ago. I thought I'd gotten closer to reproducing your result, but I was looking at the wrong terminal window and quoting the test error for a different model.

Thomas said...

I'm using linear outputs. I tried softmax at one point with cross entropy, but got better results with just linear. The chosen answer is whichever component of the output vector has the highest activation. Not using minibatches; doing SGD.

One other thing I'm doing that I had failed to mention before - in addition to the connections to and from the hidden layer, there is also connectivity directly between the input and output layers.
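The setup described above (linear outputs, argmax prediction, and direct input-to-output "skip" connections) can be sketched roughly as follows; the class, method, and sizes here are hypothetical, not taken from the repository.

```java
// Sketch of prediction with linear outputs: each class score combines the
// hidden activations with direct input->output connections, and the
// predicted class is the argmax. Illustrative only.
public class SkipConnectionSketch {
    public static int predict(double[] x, double[] h,
                              double[][] wOut, double[][] wDirect) {
        int best = 0;
        double bestVal = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < wOut.length; k++) {
            double y = 0.0;
            for (int j = 0; j < h.length; j++) y += wOut[k][j] * h[j];
            for (int i = 0; i < x.length; i++) y += wDirect[k][i] * x[i]; // skip connections
            if (y > bestVal) { bestVal = y; best = k; } // track argmax
        }
        return best;
    }

    public static void main(String[] args) {
        double[] x = {1.0};
        double[] h = {1.0};
        double[][] wOut = {{0.5}, {0.1}};
        double[][] wDirect = {{0.0}, {1.0}};
        System.out.println(predict(x, h, wOut, wDirect)); // 1 (0.1+1.0 > 0.5)
    }
}
```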

goodfellow.ian said...

What loss function do you apply to the linear outputs?

Thomas said...

Sum of squared errors. I realize this is not the preferred method for multinomial classification, as you can't interpret the outputs as probabilities, but it seemed to get me good results.
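A minimal sketch of that loss against a one-hot target. One nice property (and presumably part of why it still works well here): the gradient with respect to a linear output unit is simply proportional to output minus target. The class name is hypothetical.

```java
// Sketch of sum-of-squared-errors loss over the output vector with a
// one-hot target. Gradient w.r.t. a linear output y[k] is 2*(y[k] - t[k]).
public class SseLossSketch {
    public static double loss(double[] y, double[] target) {
        double sum = 0.0;
        for (int k = 0; k < y.length; k++) {
            double d = y[k] - target[k];
            sum += d * d; // squared error per output component
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] y = {0.9, 0.2};
        double[] t = {1.0, 0.0}; // one-hot target for class 0
        System.out.println(loss(y, t)); // 0.01 + 0.04 = 0.05
    }
}
```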

goodfellow.ian said...

Any chance you could post your code? I'm still not getting the same result after changing the output layer to linear + mean squared error and adding the direct connections between the input and the output.

Thomas said...

Yup, I plan to post the code soon (once I've made it more fit for public consumption.) Curious, what results did you manage to get, and how many epochs did you let it run for?

goodfellow.ian said...

The best I've seen so far with this method is about 1.7% test error. I'm using the last 10k examples as a validation set and reporting test set error for the epoch that got the lowest validation set error. It quits running if it goes 5 epochs without improving the validation set error. Usually that's around 30 epochs.

Thomas said...

Just put the source code up on github: https://github.com/evolvingstuff/MNIST

(Also edited post with a link to it as well)

goodfellow.ian said...

How should I run the training experiment?

src/com/evolvingstuff/App.java looks like it's hardcoded to load a previously saved model from "data"... which looks like a broken path. I assume that was meant to be "data/already_trained," but how do I run the training itself instead of loading a model?

Thomas said...

Oops. doLoad should have been false. I've updated the repository (you can also just set it to 'false' in your local copy.) When it saves, it saves into the data/ directory. If you wanted to load the saved data, you'd need to copy it from data/already_trained/ into data/. I just didn't put it in there to begin with because it will get overwritten every time it saves a new improvement. Sorry about the confusion.

Thomas said...

To expand/clarify my previous comment, the files in data/already_trained/ correspond to the results of running the program for 27 epochs (111 errors on test set.) That pre-trained data was not meant to be used by default - I just included it so that if someone wants to experiment with an already trained network, the data is there to do so.

I'll need to add a README to the repository soon that explains all of this.

goodfellow.ian said...

Thanks for the clarifications. I started running it again this morning, and it's down to 1.35% test error, pretty impressive.

One comment, which I'm not saying to criticize you, but to hopefully make your life easier: your code seems really slow for the task you're running. I saw from your other posts that you were looking for a task that's faster to evaluate. I think you would probably be better off speeding up your code rather than simplifying your task. From a quick glance through your repository it looks like you've written everything in terms of lists of Java doubles. If you switched to some kind of matrix/vector math library it would all run far faster. I don't know what your options are for matrix/vector libraries in Java. Some popular options for writing fast neural network code include Matlab; Lua with a library called Torch7, which provides a Matlab-like environment; or Python with numpy/scipy or Theano.

Thomas said...

Glad it is finally working! (albeit very slowly)

I very much appreciate the suggestions. I don't take them as criticism; I'm aware that it is painfully slow at the moment. (However, you'll notice that for the matrix multiplies I am using 2D arrays - not as slow as List generics - but of course, it is not anywhere near as fast as a matrix library would be.) The main reason I haven't switched to a faster implementation (yet) is that I have been experimenting with a lot of different approaches. My current impression (perhaps incorrect) is that I have more flexibility to radically change what I'm doing by remaining with a simple, single-threaded paradigm until settling on an architecture I'm satisfied with. For example, many of the things I've tried against MNIST don't even involve neural networks at all.

That being said, I do still believe there is great value in finding tasks against which to test deep neural networks (or other architectures) that are much faster to evaluate than MNIST. Anything that decreases the delay in the feedback loop is, in my opinion, tremendously useful.

Thomas said...

This looks interesting: http://jblas.org/

Roberto said...

Hi, just a brief comment on speed-up. I suggest you to use Eigen. You can move to C++ or use a Java wrapper: https://github.com/hughperkins/jeigen

YANG Xuewen said...

Hello, I am a beginner with neural networks. Could you answer some questions for me? Thank you.
1. weights[k][i] = r.nextGaussian() * init_weight_range * fan_in_factor; Why do you multiply by fan_in_factor when initializing the weights? Doesn't that make the weights too small?
2. Is the ReLU you used a leaky ReLU? A ReLU is max(0, x), so why do you use slope*x for x < 0?
3. How do I use the two matrices produced by your code, hidden.mtrx and readout.mtrx, on new images, for example a random image from MNIST?
4. Could you suggest some of the papers you read while doing this? I saw you posted some wonderful papers you read recently.
Thank you.
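Regarding question 2: an activation that uses slope*x for negative inputs is commonly called a leaky ReLU, as opposed to the plain ReLU max(0, x). A small sketch of the difference, with a hypothetical class name:

```java
// Sketch of a leaky ReLU: instead of clamping negative inputs to zero,
// it keeps a small slope there, so units still get a gradient (and can
// recover) even when their pre-activation is negative.
public class LeakyReluSketch {
    public static double leakyRelu(double x, double slope) {
        return x >= 0 ? x : slope * x;
    }

    public static void main(String[] args) {
        System.out.println(leakyRelu(2.0, 0.01));  // 2.0
        System.out.println(leakyRelu(-2.0, 0.01)); // -0.02
    }
}
```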