Visualizations of Deep Learning, Recurrent Neural Networks, and Evolutionary Algorithms
Thursday, July 06, 2006
Train of Thought
I've been thinking about a GUT (Grand Unifying Theory) of AI, about what form it might take if such a thing were to be found, and I've gone back over some ideas I've had in the past. Really all I've done so far is come up with arguments against certain approaches. For example, I argued that although one could potentially analyze a solution based upon its structure, its behavior, or its fitness, both structural and behavioral examination run into seemingly unresolvable problems with only a little thought. The main argument is that examining a system at too fine-grained a level is bound to destroy any useful 'meaning'. It would be like comparing two stories at the level of letters and punctuation: the important meaning is lost. In other words, by doing this we are in no way analyzing the stories in a way related to how they will eventually be analyzed when they are assigned a fitness. That leaves us with just fitness to analyze. I also made the (obvious) observation that two solutions of equal fitness cannot be guaranteed to exhibit the same behavior, and their tendency to share the same structure is weaker still.

There are a number of other metrics we could look at as well, for example the 'complexity' of the behavior. But how are we to recognize 'complexity' in the first place? And is it necessarily a good thing? Occam's razor would suggest that we try simple models first, minimizing complexity and then allowing it to gradually increase. However, models of the brain (Neural Darwinism) suggest that the living brain takes the opposite approach: it starts with more connections and neurons than necessary and then prunes them away to arrive at the correct behavior. That, though, implies an architecture so rich that the correct behavior was already there, just waiting to be discovered. Neural network algorithms using both growth and pruning have been shown (or at least claimed) to be more effective than static architectures. There are other ways of getting at the simplest model as well. One is to add noise (as I have done); another is to combine multiple networks to cancel out noise - these are known as 'ensemble' methods, including mixture-of-experts, boosting, and bagging. Other algorithms allow direct evolution of the architecture, such as NEAT and Genetic Programming, although both of these appear to suffer from 'bloat'. But, given the 'junk DNA' found in nature, this may be an unavoidable aspect of complexity. Rather than increasing the size of a given model, some suggest building directly on top of it with fresh material, such as Rodney Brooks' subsumption architecture, the Cascade Correlation algorithm, and Schmidhuber's Probabilistic Incremental Program Evolution with Filters. Unfortunately the last few mentioned require a model, i.e. they are 'supervised' learning methods, and are therefore limited to a vastly smaller domain of possible problems.

So, back to the issue of complexity: what method gives us an 'optimal' level of complexity, and does such a thing even exist? We can look at Wolfram and Langton's characterization of complexity in cellular automata - the 'edge of chaos' - which appears to be a predominant theme in many areas of A-Life. So one might think it would be useful to hold a system near the edge of chaos in order to encourage complex things to emerge from it.
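To make the edge-of-chaos idea a bit more concrete, here is a rough little sketch (Python, nothing rigorous) of elementary binary cellular automata. For this restricted class, Langton's lambda parameter reduces to just the fraction of rule-table entries that map to the non-quiescent state, so it is trivial to compute; Rule 110 is the usual 'complex' (class 4) example and Rule 30 the usual 'chaotic' (class 3) one.

# Rough sketch: elementary (binary, radius-1) cellular automata and
# Langton's lambda, which in this simple case is just the fraction of
# rule-table entries that map to the non-quiescent state (1).
import numpy as np

def rule_table(rule_number):
    # 8-entry lookup table, indexed by the neighborhood value 0..7
    return np.array([(rule_number >> i) & 1 for i in range(8)], dtype=np.uint8)

def langton_lambda(table):
    return table.sum() / len(table)

def run_ca(rule_number, width=64, steps=32):
    table = rule_table(rule_number)
    state = np.zeros(width, dtype=np.uint8)
    state[width // 2] = 1                        # single seed cell
    history = [state.copy()]
    for _ in range(steps):
        left = np.roll(state, 1)
        right = np.roll(state, -1)
        idx = (left << 2) | (state << 1) | right  # neighborhood as 3-bit index
        state = table[idx]
        history.append(state.copy())
    return np.array(history)

if __name__ == "__main__":
    for rule in (110, 30, 250):                  # complex, chaotic, ordered (roughly)
        print(f"rule {rule}: lambda = {langton_lambda(rule_table(rule)):.3f}")
        for row in run_ca(rule)[:16]:
            print("".join("#" if c else "." for c in row))
        print()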
But then again this might not be necessary, as many have suggested that evolution pushes systems into that region of processing anyway, without any special encouragement. Also, how would we encourage 'complexity' even if we wanted to? It is a difficult concept to define and compute, although Langton gave a fairly good working definition: maximize the information processing of the system. Highly ordered systems are not very sensitive to their inputs, whereas chaotic systems are too sensitive to them and therefore behave essentially randomly; complex systems sit somewhere in between. So we can analyze systems by looking within populations for uniqueness, or within a single program at the behavior of its elements.

Another method that has seemingly been successful at pushing systems toward the edge of chaos is competition - predation and parasitism - as has been shown elegantly by the co-evolutionary sorting experiments of Danny Hillis. But the problem here is that not every problem has an obvious way of being converted into a competition; some tasks are inherently solitary ventures.

I guess the point is that for nearly every point of view I take, I can come up with a very valid exception, or something of importance that is missing from the model being considered. It makes me think that perhaps there is no simple "Theory of Everything" in the realm of machine learning, or that the only situations in which one exists are by necessity so impoverished as to be uninteresting. An example: the game-theoretic strategy Tit-For-Tat in the iterated Prisoner's Dilemma (a tiny sketch of it is appended at the end of this post). It is simple and elegant, but only because the range of possible behaviors is so small (binary), there is no noise, and the system only has access to the memory of the previous timestep, nothing more. Anyway, I feel no closer than before, and so that will be all for now...
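Appendix, as promised: a minimal sketch of Tit-For-Tat in the iterated Prisoner's Dilemma, using the usual Axelrod payoff values (T=5, R=3, P=1, S=0). The whole strategy fits in one line precisely because of the restrictions mentioned above: binary moves, no noise, and memory of only the opponent's previous move.

# Minimal sketch: Tit-For-Tat in the iterated Prisoner's Dilemma.
# The strategy needs only one bit of memory: the opponent's previous move.
COOPERATE, DEFECT = "C", "D"
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_last):
    # Cooperate on the first move, then copy whatever the opponent did last.
    return COOPERATE if opponent_last is None else opponent_last

def always_defect(opponent_last):
    return DEFECT

def play(strategy_a, strategy_b, rounds=10):
    last_a = last_b = None
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(last_b)   # each player sees only the other's previous move
        move_b = strategy_b(last_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        last_a, last_b = move_a, move_b
    return score_a, score_b

if __name__ == "__main__":
    print("TFT vs TFT:          ", play(tit_for_tat, tit_for_tat))
    print("TFT vs AlwaysDefect: ", play(tit_for_tat, always_defect))

Playing it against an always-defect strategy makes the point: it cooperates once, gets burned, and then retaliates for the rest of the game.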