Recently I started testing my neural network software against some real-world datasets. The first one I tried was the Pima Indians Diabetes dataset. My system achieved a fairly good training error (19%), but it did not generalize as well to the test set.
Many neuro-evolutionary algorithms (such as Covenet) use regularization terms that penalize the structural resources of the network in order to improve generalization. Presumably, the fewer nodes and/or link weights a network has, the lower its effective VC dimension.
However, introducing dynamic network topology adds a host of programmatic complexities and design decisions that I would like to avoid. So I thought of an alternative way to "stress" the resources of the network: random noise.
My hypothesis is that injecting random noise into the operation of the neural network will force it to form a more distributed and robust internal representation, because it cannot depend too heavily on the activation of any single element. I also suspect that, for the same reason, the network will have more flexibility to alter its functionality steadily, bit by bit; hence it may be more continuous and "evolvable".
My initial experiments with this approach have met with success: generalization usually improves substantially when noise is injected into the training process.
Additionally, if the noise is removed from the network's processing after training, its performance improves further. Doing this, the network has reached levels of generalization that I could not attain without the addition of noise.
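To make the idea concrete, here is a minimal sketch of what noise injection with a switch to turn it off could look like. It is not my actual implementation. It assumes a fixed one-hidden-layer network, zero-mean Gaussian noise added to each hidden activation, and a noise level of 0.1; the function name `forward`, the parameter `noise_std`, the layer sizes, and the random weights are all made up for illustration.

```python
import math
import random

def forward(inputs, w_hidden, w_out, noise_std=0.0, rng=random):
    """One-hidden-layer network; optionally perturbs each hidden activation."""
    hidden = []
    for weights in w_hidden:                      # one weight vector per hidden unit
        act = math.tanh(sum(w * x for w, x in zip(weights, inputs)))
        if noise_std > 0.0:
            # Assumed noise model: zero-mean Gaussian added to the activation.
            act += rng.gauss(0.0, noise_std)
        hidden.append(act)
    z = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-z))             # sigmoid output in (0, 1)

rng = random.Random(0)                            # seeded for repeatability
n_inputs, n_hidden = 8, 5                         # illustrative sizes only
w_hidden = [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
w_out = [rng.gauss(0, 1) for _ in range(n_hidden)]
sample = [rng.gauss(0, 1) for _ in range(n_inputs)]   # stand-in for one data row

# While training, fitness would be evaluated with noise switched on.
noisy_output = forward(sample, w_hidden, w_out, noise_std=0.1, rng=rng)

# After training, the same weights are evaluated with noise switched off.
clean_output = forward(sample, w_hidden, w_out)
```

With `noise_std` left at its default of zero the network is fully deterministic, so the trained weights can be reused unchanged for testing. Where the noise enters (inputs, weights, or activations) and how it is distributed are design choices that this sketch fixes arbitrarily.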