User:Super Quantum immortal/Neural networks

From Wikiversity

(fix the formatting later)

3.1 Neurons

A neuron has small root-like projections (dendrites, which increase the cell's surface) and a long axon. Its membrane contains small ion pumps (proteins) that work all the time, polarizing the membrane as much as they can. There are 3 types of pores (proteins). The first type reacts to certain chemicals coming from outside (typically from about 10,000 synapses): the chemicals change the pore's conformation, the pore opens, and the membrane depolarizes locally. If the membrane is depolarized beyond a certain threshold, a second, field-sensitive type of pore changes conformation and opens temporarily, and the membrane depolarizes further. The depolarization spreads to the neighbouring region and makes the field-sensitive pores there open too, while at the original spot the field-sensitive pores have started to close (and stay blocked for some time) and the pumps restore the polarization. The newly depolarized region opens the field-sensitive pores around it in turn, while the original situation is regenerated behind it. This continues as a chain reaction that travels along the axon, which typically forks into about 1,000 synapses. At the end of the axon there are vesicles packed with certain molecules; the arriving signal makes them release these molecules outside, where they cross a small gap and interact with the chemical-sensitive pores of the target neuron. If the synapses targeting a particular spot on the target neuron manage to depolarize it enough, the whole process starts again.
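The threshold-then-fire behaviour, and the temporary blocking after firing, can be sketched with a leaky integrate-and-fire model. This is a standard textbook simplification swapped in for illustration, not the author's model: it ignores the spread along the axon and inhibition, and the function name, parameters and units are all hypothetical.

```python
def simulate_lif(inputs, threshold=1.0, leak=0.1, refractory_steps=3):
    """Leaky integrate-and-fire sketch: return the time steps at which the neuron fires.

    inputs: depolarizing input arriving at each time step (arbitrary units)
    threshold: depolarization at which the neuron fires
    leak: fraction of the current depolarization the pumps undo at each step
    refractory_steps: steps during which the neuron stays blocked after firing
    """
    v = 0.0       # depolarization relative to rest
    blocked = 0   # remaining refractory steps
    spikes = []
    for t, current in enumerate(inputs):
        if blocked > 0:
            # Blocked pores: input is ignored and the pumps hold the membrane at rest.
            blocked -= 1
            v = 0.0
            continue
        v += current - leak * v  # add the input, minus the pumps' restoring leak
        if v >= threshold:
            spikes.append(t)     # threshold crossed: the neuron fires
            v = 0.0
            blocked = refractory_steps
    return spikes


print(simulate_lif([0.5] * 10))    # [2, 8]: strong input, repeated firing
print(simulate_lif([0.05] * 100))  # []: weak input never outruns the leak
```

With a weak input the depolarization settles at `current / leak` (0.5 here), below the threshold, which is why many synaptic inputs have to add up before anything fires.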

Lastly, there are also inhibitory pores: when active, they open and selectively let through a charged particle (one with a favorable electrochemical potential) that increases the polarization of the membrane, so activation gets harder. Alcohol, among other things, activates some of these inhibitory pores and deactivates some of the excitatory ones; in effect, the intoxicated person has a temporary chemical lobotomy in certain parts of the brain.

3.2 Methodology-global

the blind men and the elephant

When you want to study a simple system, studying every part of it in detail is feasible. Real-life problems, however, typically have a number of variables far beyond what our computing capabilities can handle. A typical exact investigation would take more time than the age of the universe, need more memory than there are particles in the universe, and, given the minimum theoretical energy needed to erase one bit (Landauer's limit, E = k*T*ln(2)), consume more energy than the universe contains.
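To put a number on the per-bit bound, here is a small sketch. Strictly, Landauer's limit applies to erasing a bit; the temperature of 300 K and the function name are assumptions for illustration. The per-bit cost is tiny: what makes brute force hopeless is the number of operations, not the cost of each one.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact by definition in the SI since 2019


def landauer_energy(temperature_k, n_bits=1):
    """Minimum energy (J) to erase n_bits at temperature_k: n * k * T * ln(2)."""
    return n_bits * BOLTZMANN * temperature_k * math.log(2)


print(f"{landauer_energy(300):.3e} J per bit at 300 K")   # 2.871e-21 J
print(f"{landauer_energy(300, 10**15):.3e} J for 10**15 bits")
```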

caenorhabditis elegans

Caenorhabditis elegans (a nematode worm) has exactly 302 neurons (yes, 3*100+2 units), so it is feasible to study every one of them in extreme detail. But a typical human brain has around 100 billion neurons with an average of 10,000 synapses each; that's 1 quadrillion (10^15) synapses, and the number of possible combinations is greater than the number of atoms contained... in the known universe. The main reason AIs are stupid is that their hardware is no good, and with the study of the human brain the problem is orders of magnitude worse. Clearly, studying the human brain the brute-force way, neuron by neuron, is not going to work; and even if we could, the result would be a huge, meaningless, opaque list of numbers.
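The arithmetic behind these figures can be sketched as follows, assuming (as a deliberate simplification) that each synapse is only "present" or "absent". The ~10^80 atom count is a commonly quoted order-of-magnitude estimate for the observable universe, not a figure from the original text.

```python
import math

neurons = 100e9             # "around 100 billion neurons"
synapses_per_neuron = 1e4   # "an average of 10000 synapses"
synapses = neurons * synapses_per_neuron
print(f"{synapses:.0e} synapses")  # 1e+15

# With on/off synapses there are 2**synapses configurations; work with its log10.
log10_configurations = synapses * math.log10(2)
print(f"2**synapses is about 10**{log10_configurations:.3g}")

ATOMS_LOG10 = 80  # commonly quoted estimate: ~10**80 atoms in the observable universe
print(log10_configurations > ATOMS_LOG10)  # True, by a ridiculous margin
```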

But we can draw useful conclusions by studying the bulk properties of large numbers of neurons in an abstract way, as a kind of average. Physics does the same thing: in a system with 10^23 particles, properties like temperature, pressure, volume and electrical resistance are what matter. Here we have 10^15 synapses, so if you don't believe in this methodology, you should stop believing in the validity of temperature, volume, pressure and many, many others.
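Why averages work can be sketched with a toy population in which each neuron fires independently with probability 0.5 (a crude assumption made only for this illustration; the function name is hypothetical). Individual neurons are unpredictable, but the fraction that fires fluctuates less and less as the population grows, roughly like 1/sqrt(N).

```python
import random
import statistics


def fluctuation_of_average(n_neurons, trials=200, p_fire=0.5, seed=0):
    """Standard deviation, across trials, of the fraction of n_neurons that fire."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(trials):
        firing = sum(rng.random() < p_fire for _ in range(n_neurons))
        fractions.append(firing / n_neurons)
    return statistics.pstdev(fractions)


for n in (10, 1_000, 10_000):
    print(n, round(fluctuation_of_average(n), 4))  # shrinks roughly like 1/sqrt(n)
```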

Actually, this is the only way to get a grasp of our brain: our brain can only understand objects less complicated than itself, so it can only understand a simplified model of itself.

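   Maxwell's equations:
   $$\frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z} = \frac{\rho}{\varepsilon_0}$$
   $$\frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y} + \frac{\partial B_z}{\partial z} = 0$$
   $$\left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right)\hat{x} + \left(\frac{\partial E_x}{\partial z} - \frac{\partial E_z}{\partial x}\right)\hat{y} + \left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right)\hat{z} = -\frac{\partial \mathbf{B}}{\partial t}$$
   $$\left(\frac{\partial B_z}{\partial y} - \frac{\partial B_y}{\partial z}\right)\hat{x} + \left(\frac{\partial B_x}{\partial z} - \frac{\partial B_z}{\partial x}\right)\hat{y} + \left(\frac{\partial B_y}{\partial x} - \frac{\partial B_x}{\partial y}\right)\hat{z} = \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}$$
   Pouah! What are all those curly things?!
   Ohm's law:
   $$V = RI$$
   That's much better.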

3.3 Network

real neural network

Collectively, neurons perform automated machine learning. Basically, learning is achieved by repetition: when 2 neurons are excited simultaneously, a metabolic change reinforces their mutual synapses (more vesicles, more synapses, ...). This way, when only one of them is active, its firings have a greater probability of activating the other. After some time of disuse, a synapse degrades and detaches, and neurons actively search with their axons, like little tentacles, for opportunities to make new synapses with fellow neurons.

Let's take a flat grid of neurons, all interconnected with all other neurons at random. We repeatedly activate some neurons in the shape of a pattern, say "A". All the neurons that are active have their mutual synapses reinforced, in an "A"-shaped area of the network. The outputs of the neurons are automatically the inputs for a new round of integration; if the synapses are strong enough, the network stabilizes dynamically in the shape of the pattern.

Now we clear the network and present a new pattern: a corrupted "A", with pixels missing and specks of dust all over the place. Neurons on pixels that are black now but were white in the learning pattern aren't strongly connected to the others, so their firings contribute little to the activation of other neurons. Neurons on pixels that are black now and were also black during learning are strongly interconnected with each other and, most importantly, with the neurons that were black during learning but are white now, all of them in the shape of the learning pattern. The firings of the now-black/then-black neurons are enough to activate the now-white/then-black neurons. The process ignores the corruption and reconstructs the correct pattern "A" from less than perfect data, as long as the data resembles the already learned pattern (Hebb 1949).

A network can store more than one pattern. Neurons on black pixels shared by several patterns try to excite all their buddies from all those patterns, but the buddies that are not yet excited need to receive impulses above a certain threshold in order to activate. So normally only the more numerous buddies of the one learned pattern that was directly activated manage to excite the remaining buddies of that pattern. Normally a single neuron can't activate another neuron on its own.

Temporal integration is also possible if the signals between neurons are not instantaneous, as they were assumed to be so far. During learning we present a first pattern; the first neurons are activated and fire. Their connections to the various neurons have different lengths but the signals travel at the same speed, so each of their output synapses fires with a different time lag. While the signals are still travelling we present a second pattern, and the synapses from the first pattern that fire at that moment onto the neurons activated by the second pattern are reinforced. Then the remaining signals from the first pattern plus the new signals from the second pattern reinforce the appropriate synapses when a third pattern is presented, and so on. In use, we present a couple of patterns that resemble the learning patterns, in roughly the correct timing and order, and the network figures out the next patterns to activate, like a little video.
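A minimal sketch of this idea, assuming (unlike the text) a single fixed one-step delay instead of axons of varying length. Synapses from neurons active at one step onto neurons active at the next step are reinforced, an asymmetric variant of the Hebbian rule; the function names and the 9-neuron "video" are hypothetical.

```python
def learn_sequence(frames, n):
    """Reinforce synapses from neurons active in one frame onto neurons active in the next.

    frames: list of sets of active neuron indices, in temporal order
    """
    weights = [[0] * n for _ in range(n)]  # weights[post][pre]
    for earlier, later in zip(frames, frames[1:]):
        for post in later:
            for pre in earlier:
                weights[post][pre] += 1
    return weights


def predict_next(weights, active, threshold):
    """Neurons whose summed delayed input from `active` reaches the threshold."""
    return {post for post, row in enumerate(weights)
            if sum(row[pre] for pre in active) >= threshold}


frames = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]  # a 3-frame "video" on 9 neurons
w = learn_sequence(frames, n=9)

current = {0, 1}  # a corrupted first frame: neuron 2 is missing
for _ in range(2):
    current = predict_next(w, current, threshold=2)
    print(sorted(current))  # [3, 4, 5], then [6, 7, 8]
```

A threshold of 2 also means that a single neuron cannot trigger the next frame on its own.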

3.3.1 hash table


A hash table uses a cleverly crafted mathematical function that treats any input as a number and returns the physical position (another number) in a storage device where the information for that input was stored, plus error-avoidance tricks to handle collisions. Web sites use this baroque system a lot, because it gives very fast query times over large amounts of information. Apart from the geometric view, neural networks can be seen as a particular case of a "hash table", with the difference that the specially crafted function directly returns the required information, and collisions happen for inputs that are deemed similar.
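The difference can be sketched in a few lines. Python's built-in `dict` is a hash table and only answers for exactly matching keys. The second lookup is not a hash table at all but a brute-force nearest-match search by Hamming distance, swapped in to mimic the "similar inputs collide" behaviour; the 7-pixel patterns and labels are invented.

```python
def hamming(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


stored = {
    "1110110": "A",  # hypothetical 7-pixel patterns and their labels
    "0001111": "B",
}

corrupted = "1100110"  # pattern "A" with one pixel flipped

# Exact hash-table lookup: a single wrong pixel and nothing is found.
print(stored.get(corrupted))  # None

# Similarity lookup: similar inputs "collide" on purpose and return the same answer.
nearest = min(stored, key=lambda key: hamming(key, corrupted))
print(stored[nearest])  # A
```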




3.3.2 simple model

neural microchip

A simplified network model that still works. For real-life problems, some very convoluted optimizations can be used.

Outputs on the left, inputs on the right:

   y1 := a(1)x1 + a(2)x2 + ... + a(n)xn
   y2 := b(1)x1 + b(2)x2 + ... + b(n)xn
   ...
   yn := z(1)x1 + z(2)x2 + ... + z(n)xn

(x1, x2, x3, ..., xn): "n" inputs, each either 1 or 0, from neurons no. 1, 2, 3, ..., n.

:= is defined here as follows: if what is on the right is above a certain threshold, the left side gets 1; if it is below, it gets 0.

(y1, y2, y3, ..., yn): "n" outputs, each 1 or 0, of neurons no. 1, 2, 3, ..., n after integration.

a(i), b(i), ..., z(i): the n² synaptic weights, set up during learning.

Each line represents one neuron: it takes information about the excitation of all the neurons (including itself) and computes a weighted sum. If the result is above the threshold, it outputs 1; if it is below, it outputs 0.

Simplified learning: at the beginning, the weights have an initial non-zero value. We present an input. If neuron no. m has an input of 1, then all its synaptic weights with neurons that also have an input of 1 are incremented, and the rest are decremented. If neuron no. m has an input of 0, then all its weights are decremented. The weights can't become negative.
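The model and its learning rule can be sketched directly. The initial weight, the increment and decrement sizes, the threshold, the number of training passes and the two 8-neuron patterns are assumptions chosen for illustration, and all neurons are updated synchronously; only the structure (thresholded weighted sums, up/down increments, weights floored at 0) comes from the text.

```python
def train(patterns, n, initial=1.0, up=1.0, down=0.5, epochs=5):
    """Learning rule from the text; w[m][j] is neuron m's weight for input from neuron j.

    If neurons m and j are both 1, the weight goes up; otherwise it goes down (not below 0).
    """
    w = [[initial] * n for _ in range(n)]
    for _ in range(epochs):
        for x in patterns:
            for m in range(n):
                for j in range(n):
                    if x[m] == 1 and x[j] == 1:
                        w[m][j] += up
                    else:
                        w[m][j] = max(0.0, w[m][j] - down)
    return w


def recall(w, x, threshold, steps=5):
    """Apply y_m := 1 if sum_j w[m][j] * x_j is above threshold, else 0, until stable."""
    for _ in range(steps):
        y = [1 if sum(wj * xj for wj, xj in zip(row, x)) > threshold else 0 for row in w]
        if y == x:
            break
        x = y  # outputs become the inputs of the next round of integration
    return x


P1 = [1, 1, 1, 1, 0, 0, 0, 0]
P2 = [0, 0, 0, 1, 1, 1, 1, 0]  # shares neuron 3 with P1
w = train([P1, P2], n=8)

print(recall(w, [1, 1, 0, 0, 0, 0, 0, 1], threshold=5))  # partial P1 plus a speck of "dust"
print(recall(w, [0, 0, 0, 0, 1, 1, 0, 0], threshold=5))  # partial P2
print(recall(w, [0, 0, 0, 1, 0, 0, 0, 0], threshold=5))  # only the shared neuron
```

The first two calls complete P1 and P2; in the third, the neuron shared by both patterns stays on but cannot push any other neuron over the threshold, so neither pattern is recalled.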

And say hi to Rose.

book funnel

The complexity of the synaptic weights is very high (n² decimal numbers), allowing a large number of patterns to be stored (one pattern is n numbers, each 0 or 1). But it still isn't very clear how multiple patterns can be crammed inside. With 2 patterns, when we present a partial pattern 1, the active neurons collectively have strong synaptic connections with the neurons of pattern 1, and so they activate them. Only at the intersection of the 2 patterns do the active neurons have strong synaptic connections with neurons of pattern 2, and by design of the network, the intersection of patterns 1 and 2 is not enough to reach the threshold of the pattern 2 neurons. The same thing can happen with an arbitrary number of patterns. The number of patterns that can be crammed into a network depends on its design parameters and on the nature of the patterns; the capacity is not automatic.

Of course, this is a simplified model for a simplified educational explanation. In the general case there is a complicated correlation between the number of neurons, the number of patterns, the increment size, the decrement size, the threshold value, the real statistical distribution of the patterns, non-linear weighted addition, etc. The basic idea remains valid and adapts to whatever specificities other networks have.


3.4 Data classification

number's classification

The simple example can only store data. We are interested here in animal learning, so the network should be able to connect a specific input with a specific classifying output. We split the network into input and output neurons (they don't overlap). During learning, the raw patterns are presented to the input neurons and a second, classifying pattern to the output neurons. In use, we present only the raw patterns; if they are recognized successfully, their corresponding classifying pattern lights up.

For example, suppose we have raw patterns of digits and use 10 output neurons (0-9). During learning, we present 0s to the input neurons and activate the first output neuron; when we present 1s, we activate the second output neuron, and so forth. Afterwards, when we present a 0 alone, the first output neuron lights up; when we present a 1 alone, the second lights up, and so forth.
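A minimal sketch of this input/output split, with 2 output neurons instead of 10 for brevity. Input-to-output weights are reinforced whenever an input pixel and an output neuron are active together. How the outputs compete is an assumption of this sketch: each output's activation is normalised by its total weight and only the strongest output lights up (a winner-take-all stand-in for inhibition between output neurons). The 3x3 "digits" and function names are hypothetical.

```python
def train_classifier(examples, n_inputs, n_outputs):
    """Hebbian input->output weights: reinforce (input i, output o) when both are active.

    examples: list of (set of active input pixels, index of the output neuron to activate)
    """
    w = [[0] * n_inputs for _ in range(n_outputs)]
    for pixels, label in examples:
        for i in pixels:
            w[label][i] += 1
    return w


def classify(w, pixels):
    """Index of the output neuron with the highest activation relative to its total weight."""
    def relative_activation(row):
        total = sum(row)
        return sum(row[i] for i in pixels) / total if total else 0.0
    return max(range(len(w)), key=lambda o: relative_activation(w[o]))


# 3x3 pixel grids, numbered 0..8 row by row.
ZERO = {0, 1, 2, 3, 5, 6, 7, 8}  # a ring
ONE = {1, 4, 7}                  # a vertical bar
w = train_classifier([(ZERO, 0), (ONE, 1)], n_inputs=9, n_outputs=2)

print(classify(w, ZERO - {8}))  # 0: a "0" with a missing pixel
print(classify(w, {0, 1, 4}))   # 1: a "1" with a missing pixel and a speck of dust
```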

3.5 No free lunch

no free lunch

It's not straightforward to set up a useful network; the specificities of its learning complicate its use. Let's draw a graph: on the y axis is the network's error, the difference between the desired output and the actual output, reduced to a single number in some way for the purpose of the graph; on the x axis are simply the learning cycles. In each learning cycle we present a body of patterns to the network, and at the same time we graph its performance. At the beginning no learning has occurred, so it simply gives some random result and the error is very high; as learning progresses the error goes down, until it flattens out at some minimum that depends on the complexity of the patterns and the number of neurons. People already know this intuitively: the more you repeat something, the better you get at it. We start again, but this time we also check the network against a body of patterns that resemble the learning data but are not taken into account during training. At the beginning it gives something random, no surprise there; as learning progresses, the error on the extra patterns also goes down, but remains higher than on the learning data. Again no surprise, people know that already: you are most efficient when tested on exactly what you learned. At some point, however, the error on the extra data starts to rise, even though the network still gets better on the learning data (don't believe me? Check it yourself: 19/09/2009 files or up-to-date files). The network has overlearned.
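The shape of these two curves can be reproduced with a small stand-in experiment. It is not a neural network: it uses Nadaraya-Watson kernel regression (a Gaussian-weighted average of training targets), and shrinking the kernel bandwidth lets the model fit ever finer detail, which plays the role of "more learning cycles". The signal, noise level, sample sizes, seed and bandwidths are all assumptions.

```python
import math
import random


def make_data(n, rng, noise=0.3):
    """n noisy samples of a smooth 'meaningful' signal, sin(2*pi*x), on [0, 1)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, math.sin(2 * math.pi * x) + rng.gauss(0, noise)) for x in xs]


def predict(train, x, bandwidth):
    """Nadaraya-Watson kernel regression: Gaussian-weighted average of training targets."""
    exponents = [-((x - xi) / bandwidth) ** 2 / 2 for xi, _ in train]
    top = max(exponents)  # subtract the max so tiny bandwidths cannot underflow to 0/0
    weights = [math.exp(e - top) for e in exponents]
    return sum(w * yi for w, (_, yi) in zip(weights, train)) / sum(weights)


def mse(train, data, bandwidth):
    """Mean squared error of the kernel model on `data`."""
    return sum((predict(train, x, bandwidth) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(1)
train = make_data(30, rng)      # the "learning data"
held_out = make_data(200, rng)  # similar patterns never used for training

# A smaller bandwidth fits finer detail, standing in for more learning cycles.
for bandwidth in (0.5, 0.2, 0.1, 0.05, 0.02, 0.005, 0.001):
    print(f"bandwidth={bandwidth:<6} train={mse(train, train, bandwidth):.3f} "
          f"held-out={mse(train, held_out, bandwidth):.3f}")
```

The training error keeps falling towards zero as the model starts memorizing the noise, while the held-out error typically bottoms out at an intermediate bandwidth and rises again; the exact numbers depend on the random seed.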

left: what you knew, right: what you didn't know

An actual reproduction of the second curve could be an experiment like this: take a large number of random groups of, say, 100 people. Each group learns the same class of task in exactly the same way. Each group is tested only once, at a different point in its learning progression, with a set of similar but not identical tasks; the results are averaged, which averages away individual variability, and the group is then discarded. This way we can get around the impossibility of not taking the unknown patterns into account while learning. This could be run as an inexpensive experiment on a website, with visitors distributed randomly among the groups.

Stupid OCR (video)

It's a prime example of "more is less". Neural networks are supposed to generalize from imperfect data to other imperfect data. The catch is that at some point the network starts integrating the imperfections of the learning data as if they were part of the meaningful information, and in use it also tries to take into account the imperfections of the presented patterns. This gives excellent recognition of the learning patterns themselves, but very poor performance on patterns it has never encountered. IN A PERFECT WORLD this would NOT be an issue; the crux of the problem is that raw data are never perfect. The problem is quite serious, as the results can become completely nonsensical at the extremes of the spectrum. The quality of the error also differs depending on where you are on the curve. Before the minimum, the network needs rather intense inputs (lots of active neurons) to activate, but precision doesn't matter much. After the minimum, the network tends to output its learning patterns too much (even when it isn't supposed to): it over-recognizes, or over-extrapolates.

3.6 Neural anti-aliasing

Aliasing in a small image

"In order for a complex strainer to drain H2O out and not the noodles, it is necessary and sufficient that the holes' diameter be noticeably smaller than that of the noodles."

"In order for a complex strainer to drain noodles out and not H2O, it is necessary and sufficient that the holes' diameter be noticeably smaller than that of H2O."

The real world is very complicated, infinitely complicated; there's no way you will fit all its complexity into a network. Because of this, there's no choice but to sample the real world, and this sampling, when done incorrectly, creates artifacts.

In image processing this problem is called aliasing: image artifacts appear because of incorrect sampling of the real world. Anti-aliasing techniques are sophisticated ways of blurring away an image's meaningless details. Similarly, but at a more abstract level, the solution for a network is to "blur" the meaningless information. This is why I didn't bother providing properly anti-aliased thumbnails: browsers just rescale quickly and dirtily, and if you recheck the images you'll notice that all of them have artifacts.
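A one-dimensional sketch of the difference between a quick-and-dirty rescale and an anti-aliased one. The box filter (plain block averaging) used here is the crudest anti-aliasing filter; real image resamplers use better kernels. The function names and the stripe pattern are illustrative.

```python
def decimate(signal, factor):
    """Naive downsampling: keep every `factor`-th sample, discarding everything in between."""
    return signal[::factor]


def box_filter_then_decimate(signal, factor):
    """Anti-aliased downsampling: average each block of `factor` samples into one value."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


# One-pixel black/white stripes: more detail than a half-size image can represent.
stripes = [1, 0] * 8

print(decimate(stripes, 2))                  # all 1s: a solid white artifact
print(box_filter_then_decimate(stripes, 2))  # all 0.5: uniform grey, the true average
```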

The solutions can be described globally as filtering the data in order to retain what is meaningful, on a case-by-case basis. A universal neural network is simply impossible; networks are specialized in filtering and reinforcing certain aspects of the learning data. Exact solutions (with overlearning eliminated) are theoretically possible, but this is self-defeating, since the whole point of having neural networks is to use them on patterns that cannot be solved exactly (the universe again); only very simple networks can be solved exactly, and they are far too limited for real-life problems. The only practical way to set up a real-life network is stochastically (trial and error). How this filtering is balanced is decided beforehand by the engineer of the network, in our case natural selection. Here, eugenicists are doomed to epic failure: at best they will find themselves redoing the job of natural selection, at worst there would be a natural selection of eugenicists.

The filtering in networks can be done in various ways, by tweaking the pores, pumps, axonal resistivity, network interconnectivity, hardwired circuits, etc. Some random examples:

   The simplest trick is to stop learning at the minimum, in effect filtering out small details that are primarily noise. However, a real learning curve is not smooth; it's full of local minima, and you can never be sure whether you are at the global minimum, so you have to do some exploration, which is not feasible for the brain of an animal. So the brain is hardwired by natural selection to make you stop learning at some arbitrary time that, on average, helps reduce overlearning; we experience this as boredom.
   Varying learning speed with age. Children essentially start with a blank slate, so they hyperlearn, shovelling in information quickly to reach a non-zero level as fast as possible. Learning speed gradually diminishes as the networks fill up (this is not causal). Old people's brains hold a lifetime of learning, so their learning rates are very low; any further significant learning would probably result in overlearning. Maybe this is why we have childhood amnesia, as a side effect of the mechanism that diminishes learning speed. In short: buckets at first, teaspoons at the end. This is why old people and new technology don't mix: natural selection assumed that by the time you are old, you would encounter nothing new. Maybe diseases like Parkinson's and Alzheimer's are simply part of the natural, continuing degradation of learning mechanisms with age, and it is simply new for natural selection that so many people live beyond 80. This is rather depressing: it would mean that we have a hardwired expiration date.
   Another way is to play with the base rates of synaptic decay and reinforcement. We experience decay as forgetting things; the hope here is that noise is usually erratic, so it usually decays away, while meaningful data usually repeats itself. With too much decay, learning gets inhibited. A reinforcement rate that is too slow is just a waste of time at best, and combined with an inappropriately high decay rate it is simply damaging.
   Vary the learning data. In practice this means a lower reinforcement rate (and a lower decay rate to compensate), in order to spread learning over time, so that, hopefully, meaningful data is repeated.
   Shovel the learning data quickly (overlearn) into a first network (the hippocampus) and use its output to properly train the actual network; this is why we dream. It could be that overlearners and overfilterers perform their final training much faster, so they sleep less. Relatedly, children (who learn a lot) sleep much more than old fossils (who learn little).
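Two of the tricks above can be sketched in a few lines, under stated assumptions. `early_stopping_epoch` stops at the minimum of a held-out error curve using a "patience" counter, a common heuristic swapped in here (not something from the text) so that one small local bump does not end learning at once; like any local rule, it still cannot guarantee the global minimum. `evolve_weight` plays with the base rates of decay and reinforcement. All names, rates and the error curve are hypothetical.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Epoch with the lowest held-out error seen before `patience` epochs pass without improvement."""
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error, waited = epoch, error, 0
        else:
            waited += 1
            if waited >= patience:
                break  # "boredom": stop learning
    return best_epoch


def evolve_weight(active_steps, n_steps, reinforcement=0.1, decay=0.05):
    """Synaptic weight after n_steps: it decays every step and is reinforced on co-active steps."""
    w = 0.0
    for t in range(n_steps):
        w *= 1 - decay
        if t in active_steps:
            w += reinforcement
    return w


curve = [0.90, 0.60, 0.45, 0.47, 0.40, 0.38, 0.39, 0.41, 0.44, 0.50]
print(early_stopping_epoch(curve, patience=1))  # 2: fooled by the first local bump
print(early_stopping_epoch(curve, patience=3))  # 5: rides past the bump to a lower minimum

meaningful = set(range(0, 200, 2))  # a correlation that repeats every other step
noise = {17, 90}                    # a couple of erratic coincidences
print(evolve_weight(meaningful, 200))             # stays high, about 0.97
print(evolve_weight(noise, 200))                  # decays to almost nothing
print(evolve_weight(meaningful, 200, decay=0.9))  # too much decay: learning inhibited
```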

3.6.1 outsourcing

   A strategy to increase efficiency is to have more than one network for a class of tasks, each filtering the data more or less strongly, and to use whichever seems more appropriate for the situation. In our case this corresponds to our 2 hemispheres: one is more "overlearning" and the other more "filtering", and we switch from one to the other depending on the problem encountered. The dominant hand corresponds to the hemisphere that overlearns the most in a given brain.
   A related strategy is to have variance in the population, so that individuals specialize in different aspects of data processing at the expense of others, and tasks are distributed according to competence. This variance is wider than strictly necessary, to hedge against changes in the environment (remember the evolution of sex): random variance to adapt to change. Two major groups of specialization correspond to women and men: men are rather filtering, better hunters in an imperfect world, while women are rather overlearners, since mothering is rather predictable and repetitive. In general, there's a spectrum of filtering, in order: standard idiots, dyslexics, neurotypicals, gifted, paranoids, psychotics. Idiots filter too much, with low efficiency everywhere. Dyslexics have filtering cranked up, focusing on the big picture, with low efficiency on details (bqpd); they are over-represented among CEOs of high-performing companies, leaders... Neurotypicals are the majority of little girls and boys: versatile, but excelling nowhere. The gifted focus on details, and the big picture suffers. Paranoids start to seriously overextrapolate and are certain that abortion is killing, copying is stealing; religious fundamentalists, anti-pornography campaigners, eurosceptics, AIDS denialists, global warming denialists, "stem cell research is killing".... In psychotics, overfitting is far too high (over-recognition): they are completely detached from reality, not of this universe any more, and "generalization" from prior learning takes overwhelming precedence over raw input (Don Quixote). The turning blades of a stupid windmill? Of course not, it's a giant. The spectrum is just an example to give a rough idea; it's not meant to be taken too seriously. In general, overlearners are girls, athletes, mathematicians, lawyers, economists; overfilterers are males, artists, hunters, leaders. The mechanisms that control learning intensity are so sensitive that they can account for a major part of individual differences (plus other factors!), given only a microscopic genetic difference. Our brains are far too complicated for these differences to come from detailed changes in circuitry, which would require a colossal amount of information.

3.6.2 overfiltering

Let's assume that you traveled to an alien civilization where letters were written as:


or, better, a really vicious variant:

_____ ___ ___ ___ _____ ___

In such an alien civilization, you would be considered dyslexic. They would look at you in disbelief: why do you keep confusing "____ ____ ___ ___ _____ " and "____ ___ ___ ___ ______"? It's so simple. For this task, you are overfiltering.

3.6.3 overtrained


   Performance in one task corresponds to underperformance in some other task; a universal network is impossible. A little uncontroversial example: when you see letters, you always try to read them, and you have done this your entire life. The circuits have overlearned that anything resembling text must be read; how many thousands and thousands of words have you read? Probably for the first time in your life, using the following table, say aloud the colors of the letters as fast as you can; don't read the words (the Stroop effect). For this task you are overlearned, but for reading you are excellent.

   Children only see the dolphins (dolphin/couple ambiguity illusion)
   want some flied pork? flied pork set at a restaurant
   A less uncontroversial example: go and take a test seriously.
   Accents in foreign languages are the manifestation of overlearning in our native language. Engrish is the sometimes weird English of Japanese speakers, who have trouble distinguishing 'r' and 'l'. In general, overtraining of a circuit can be seen intuitively as an "accent" for that circuit. Try to pronounce Eyjafjallajökull.
   Corporal punishment is really not good for learning: on top of other complications, it interferes with the learning rate painstakingly balanced by natural selection. Children really need to learn things at a certain rate, with repetition, for optimum performance. Proponents of corporal punishment aren't aware of the "overlearning" limitation of neural networks.

3.6.4 prefilter

science map

   Prefiltering the data before presenting it to the network might sound very artificial, but we actually do it regularly: it's called education. Our scientific models are prefiltered data, ready for use, painstakingly filtered over multiple generations.
   Paradoxically and counterintuitively, for strongly overlearning types of people, an increase in knowledge doesn't necessarily correspond to better understanding. Scientific development is a recurring source of this anomaly, and a particularly thankless one. In the beginning, because of the primitive state of science, data were naturally prefiltered, so these people's overlearning capacity was masked and they weren't doubters. Today, because of the massive and painstaking increase in scientific knowledge, more details have been added. Their natural prefiltering dropped; they attempt to grasp the extra details with their overfitted everyday experience, and their doubts about the theories appeared or increased. This applies to certain creationists, doubters that HIV causes AIDS, global warming doubters, moon landing doubters, overpopulation-problem doubters, eurosceptics.....
   Illiterate people are cut off from important sources of information because of their illiteracy; in effect, illiteracy acts as a prefilter, and indeed they become less intelligent than they could have been. On top of that, they tend to overlearn the little data they can get.
   Children (normal at birth) of retarded parents (really, not an insult) become retarded themselves. Being retarded, their parents aren't capable of giving them a normal flow of information, so in effect parental retardation acts as a heavy prefilter on the child, who probably also overlearns the little it gets.

3.6.5 skool

   Rote learning is a severe case of overlearning. It damages the ability to learn more: with rote learning all lessons are arbitrary information, and you can't generalize from prior experience. Each lesson tends to be on a new curve, not on a composite lifelong curve. It gives the illusion of good performance at the beginning, but when the learning set becomes too big, performance collapses. This form of child abuse is practiced only by lazy or incompetent teachers.
   And of course, education at skool is just a travesty, and the same goes for adult training. Grades that are too low are as bad as grades that are too high, on top of the fact that "skool learning" often amounts to rote learning. Universities are orders of magnitude worse. No skool that I know of has ever taken overlearning into account (apart from the dead obvious); only gradeless skools come close. Skool learning is child abuse: it's learning for the test, and skools are diploma factories. The original goal of skools was obedient and efficient workers, not education. Skool is the place where coolness goes to die. Skool is a punishment. As for those claiming that educational standards have fallen: the old standards were too high to begin with (overlearned), and these people think the only real learning is rote learning. At least our brains do the best they can to cope with the situation, by forgetting 90+% of our lessons (within the first year). In some places the situation is really bad, with nothing but rote learning; if you are in such a place, you are in an underdeveloped country. I just recommend radical unskooling (like homeskooling, but child-led).
   I really, really hate school, and I mean in the literal sense of "counting the days until I'm finished like a prisoner counts the days until his release." Have I ever mentioned that?
   all the reasons i DO go to school 1. it illegal not 2 2. i go to get a job 3. there isnt a 3
   I think amount of kids that hates school are every kid who isn't homeless

3.6.6 recruitment

catbert

   A single circuit that can do everything is impossible, and a person who is efficient at everything is also impossible. This means that any recruiting evaluation should be as close as possible to the real task; the use of proxies is perilous. Needlessly hard recruiting evaluations simply measure the candidates' capacity to pass the evaluations themselves, not what they are supposed to evaluate. For better results: preselective tests of NORMAL difficulty, close to everyday conditions, then a TRULY RANDOMIZED selection of the needed personnel among the remaining candidates. What amounts to a literature exam for hiring truck drivers, simply because there are too many candidates, is NOT a good idea. The worst of the worst is recruitment by the state: because of corruption worries, recruitment is done only with elitist exams.
   During training, hijacking the training method itself to achieve some kind of arbitrary quota doesn't fly either. It damages the training itself and imposes a selection of personal talents irrelevant to the tasks requested. Training should be done under normal conditions; if there's a mismatch between available posts and issued certificates, that's the recruiters' and recruits' problem. Before complaining, keep in mind that after removing unnecessary difficulties, training costs become substantially lower, so this is by no means cruel. It would also increase real competition.
   Even if we ignore overlearning, another issue is the overspecialization that is generated. For the same reasons that sex exists, any organization should avoid overselecting. This will indeed give efficiencies lower than the theoretical maximum, but the organization will be better equipped to survive in the long run.
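The recruitment procedure proposed above (a normal-difficulty preselection, then a truly random draw) fits in a few lines. This is a sketch, not a tested policy: the function name, the scores and the pass mark are hypothetical, and Python's `random` module is used for the draw (a real lottery would need an auditable randomness source).

```python
import random


def recruit(candidates, pass_mark, positions, seed=None):
    """Preselect with a test of normal difficulty, then draw at random among those who pass.

    candidates: dict mapping a candidate's name to their score on a test close to the real task
    pass_mark: the normal-difficulty bar; passing candidates are not ranked any further
    positions: number of posts to fill
    """
    qualified = sorted(name for name, score in candidates.items() if score >= pass_mark)
    if len(qualified) <= positions:
        return qualified  # fewer qualified candidates than posts: take them all
    return random.Random(seed).sample(qualified, positions)


scores = {"Ana": 72, "Bo": 55, "Chen": 90, "Dara": 64, "Eli": 81}  # hypothetical candidates
hired = recruit(scores, pass_mark=60, positions=2, seed=42)
print(hired)  # two of Ana, Chen, Dara, Eli; Chen's higher score gives no extra advantage
```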

3.7 Superman

superman logo

Networks on their own can only associate an arbitrary input with an arbitrary output; with generalization alone we don't get very far. A way is needed to decide what to present to the networks; otherwise the system is incomplete. In our little experiment, we essentially supervise the network and decide what's meaningful and what's an error. Our brain has a supervisor too; this, of course, is another neural network. You already know it, you just gave it another name: you usually call it your pleasure/pain center, and it is where your emotions come from. In effect, the supervisor itself is the direct holder of a program.

Note, a little bit of xenopsychology: emotions are a natural requirement for any practical functioning of a neural network, so extraterrestrials and efficient AIs should have them too. Various gods (for example the Christian god) have emotions (love, anger, etc.); would this imply that they too... have a supervisor and, by extension, neural networks? With all the classic issues of neural networks? Did those networks and supervisors evolve too? As an atheist, I'm not going to develop the subject further, but interested parties can see that there's quite a philosophical discussion to be had.

The actual design in our brains works a little differently from our simplified experiment. In our brain, the raw patterns are presented to the networks and get associated with some output (completely random at first, but later generalized from prior learning). Then something happens in the real world, and that something is presented to the supervisor. If the supervisor decides that what happened was good, it reinforces the synapses responsible for what just happened (easy: they are the only active ones), so that next time a similar input gives a similar output with greater probability. If it decides that what happened was bad, it weakens the currently active synapses, so that next time this response becomes less probable for a similar input. It is the supervisor that is largely responsible for inducing correct learning, by implementing the various tricks needed to stay close to the learning minimum. In terms of the data classification model, this corresponds to associating inputs with a great number of random outputs that were deemed helpful.

A little example: a baby lies in its bed. Its sensory system takes in information, decodes it and presents it to the network. At this point the network is rather empty, so it gives a random response, which moves the baby's hands at random. Now let's assume that something soft is hanging there and that the baby touches the soft thing. A "soft thing touched" signal is sent to the supervisor, which decides that this is OK and reinforces the active synapses in the network. Next time a similar sensory input (a fluffy-looking thing hanging) is presented to the network, the same neuronal pattern reactivates (a generalization), and the baby moves its arm to touch it. Similarly, if the baby touched a cactus, the supervisor would decide that what happened was bad and degrade the synapses, so that next time a cactus-like thing will not be touched. Here it is about the absence of synapses that give the signal "touch", not the presence of an "I should not touch" signal.

As we get older, learning accumulates in this manner until we become very efficient: the networks in adults generalize to new situations from years of accumulated learning, and the purpose of our childhood is to do this accumulation. All brain decision making follows this basic algorithm. In particular, the networks don't try to replicate a "pleasurable" behavior; "replicate behaviors that are pleasurable" is itself a reinforced behavior. Explicit memory and explicit signaling from the supervisor are part of the inputs, and lead to the serial processing of information as we experience it directly.

The main decision loop: sensory input, plus memory, plus the current supervisor state are presented to the network; from past learning it generalizes to the new experience (if it can't, it simply gives something random) and sends output to muscles/memory/supervisor/others; the supervisor gets the response from the environment and decides whether to reinforce or weaken the synapses that are currently active; and the cycle repeats. On top of all this, a control system ensures that the brain works with a clock rate on the order of 10 Hz.
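The loop, together with the baby example, can be sketched as a toy simulation. Everything concrete here is an assumption for illustration: two stimuli, one "touch" synapse per stimulus whose strength is the probability of touching, a hard-wired `supervisor` function, and a fixed step size. The 10 Hz clock is not modelled; each loop iteration simply stands for one cycle.

```python
import random

STIMULI = ["fluffy thing hanging", "cactus"]

def supervisor(stimulus, action):
    """Hard-wired verdict (toy rules): +1 good, -1 bad, 0 if nothing happened."""
    if action != "touch":
        return 0
    return +1 if stimulus == "fluffy thing hanging" else -1

def run(cycles=500, rate=0.05, seed=0):
    rng = random.Random(seed)
    # One "touch" synapse per stimulus; its strength is the probability of touching.
    touch = {s: 0.5 for s in STIMULI}          # empty network: a coin flip
    for _ in range(cycles):                     # each iteration stands for one cycle of the loop
        stimulus = rng.choice(STIMULI)          # sensory input
        action = "touch" if rng.random() < touch[stimulus] else "ignore"  # network output
        verdict = supervisor(stimulus, action)  # the environment's result, judged
        if action == "touch":                   # only the synapse that was active changes
            touch[stimulus] = min(1.0, max(0.0, touch[stimulus] + rate * verdict))
    return touch

weights = run()
print(weights)   # close to 1.0 for the fluffy thing, close to 0.0 for the cactus
```

Note that "not touching the cactus" emerges from a weakened touch synapse, not from a separate "don't touch" signal, matching the point made in the baby example.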

If too many neurons are activated simultaneously, the networks consider that many patterns are present at the same time. In humans this is experienced as an epileptic seizure, resulting from a failure to maintain the clock rate. Conversely, if too few neurons are active, we enter a coma.
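The text does not say how the brain keeps activity in the right range. In artificial networks, one common stand-in for such a control system is k-winners-take-all: whatever the input, exactly `k` neurons are allowed to fire. This is a swapped-in technique for illustration, not a claim about the biological mechanism, and `k` is a hypothetical parameter.

```python
import numpy as np

def k_winners(drive, k):
    """Let only the k most strongly driven neurons fire, however strong the input is."""
    drive = np.asarray(drive, dtype=float)
    active = np.zeros(drive.shape, dtype=int)
    if k > 0:                                    # guard: a slice of [-0:] would select everything
        active[np.argsort(drive)[-k:]] = 1
    return active

print(k_winners([0.1, 0.9, 0.4, 0.7], k=2))      # [0 1 0 1]
print(k_winners(np.full(10, 100.0), k=2).sum())  # 2, even under massive input
```

Fixing the number of winners rules out both failure modes described above: activity can neither explode nor fall to zero.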



3.8 Divide and conquer

Simple animals do use a very simple, single-network brain, for example Caenorhabditis elegans (the nematode worm), but we can do a little better. The brute-force method of just throwing in a shovelful of neurons gives very average results. Assembling a great number of networks, each individually optimized for certain aspects of the input, is more efficient. Starting from a "shovel of neurons" brain, evolution can gradually carve out chunks with particular properties, particular connections to other chunks, and various degrees of hard-wiredness. The hard-wired parts are there for the issues that come up often; the supervisor doesn't lose importance, as it is there to handle unforeseen situations.

3.8.1 Example circuits

(Figure: the edge-detection disks are fooled into activating.)

(Figure: a confused face-recognition circuit; a double face.)

For example, the edge-detection apparatus in the visual field consists of a disk connected to a single output; its center is excitatory while its periphery is inhibitory. When the entire disk is illuminated, the impulses cancel each other out; when the center is illuminated and the periphery is not, a signal gets through. With intermediate illuminations, intermediate signals get through, and the disks overlap so that the output image is smooth. A more complicated example is predicting the future, which involves making a prediction (generalizing), keeping copies, comparing them with what really happened, and then adjusting the synaptic weights of the predicting network toward what really happened. Multiplication is done with explicit memory alone; addition, however, has a special circuit that does it.
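The center-surround disk can be sketched in one dimension (a single row of the visual field) as a small convolution kernel: a +1 center flanked by a surround whose weights add up to -1, so that uniform light cancels exactly. The kernel shape, the one-dimensional simplification and the rectification are assumptions of this sketch.

```python
import numpy as np

def center_surround(row, radius=1):
    """1-D slice of an on-center/off-surround disk; one output per fully covered position."""
    kernel = np.full(2 * radius + 1, -1.0 / (2 * radius))  # inhibitory surround, sums to -1
    kernel[radius] = 1.0                                     # excitatory center, +1
    response = np.convolve(row, kernel, mode="valid")        # slide the disk along the row
    return np.maximum(response, 0.0)                         # firing rates can't go negative

uniform = np.ones(8)
step = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
print(center_surround(uniform))  # all zeros: center and surround cancel
print(center_surround(step))     # a single 0.5 at the first lit position, zeros elsewhere
```

Only the position where the light/dark boundary falls inside the disk produces a signal, and it is an intermediate one (0.5), as described above.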

Among other differences between women and men: the brains of women are more interconnected, optimized for multitasking, while the brains of men have larger processing centers, optimized for serial tasks.

An interesting example of circuit specialization is the memory system: overwhelmingly, it expects meaningful information to come as images. A nice memorization trick is to always translate information into a journey, with images, experiences and events (mnemonics). And remember, for these tricks the Internet is your friend. For example, try remembering the following random number just by reading it (standard base 64, case sensitive).
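For practicing the trick, a fresh number of the same kind can be generated. A 128-bit value written in standard base 64 takes 22 significant characters, since each character carries 6 bits and 128 / 6 ≈ 21.3; the encoder pads this to 24 characters with "==". The function name is hypothetical; `secrets` and `base64` are Python standard library modules.

```python
import base64
import secrets

def random_128bit_base64():
    """Return a random 128-bit number as 22 standard base 64 characters (case sensitive)."""
    raw = secrets.token_bytes(16)                      # 16 bytes = 128 random bits
    encoded = base64.b64encode(raw).decode("ascii")    # 24 characters, ending in "=="
    return encoded.rstrip("=")                         # drop the padding

print(random_128bit_base64())
```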


You were lousy. If you assign meaningful images to the characters and place them in a smart way along an imaginary or real journey (imagine it happening in your own backyard or in your room), you can cram the whole thing, even with normal memory. The trick is to imagine the stuff of the journey as vividly as you can, as if it were really happening in front of you, adding as many little details as possible.

Can you recall the whole thing now? Congratulations, you just memorized an illegal 128-bit number. Don't tell the MPAA, or they'll have you lobotomized to protect their property.

In general, all these component networks are more efficient, but they still make smart mistakes. In the visual field these correspond to optical illusions, but the equivalent of optical illusions exists in all networks, no exceptions, including the thought-processing networks.

3.8.2 Sensory/motor/other secondary supervisors

The sensory circuits, motor circuits and others take inputs from the outside and follow orders from their supervisor; as usual, the synapses simply morph to keep the supervisor happy. These supervising circuits take in a variety of additional information, functioning silently, with unsuspected intelligence, behind our backs and without any warning. I'm mentioning some examples for illustration, and so as not to leave the wrong impression that sensory/motor systems are wired like a computer: they are still neural nets. We are interested here in decision making, not in how sensory/motor processing works. Sure, these systems are not trivial, but there's nothing especially magical about them; we can easily perceive their mechanical nature intuitively, which is not the case with decision making.

   If you get a lesion in the motor cortex, the impulses as they are will not produce the desired movement. The local supervising circuit will keep intervening until the expected and actual movements are the same, or until higher-level orders from the frontal lobe become reasonable.
   If the motor supervisor is knocked out, you get Parkinson's.
   Sensory supervising circuits take visual information into account as well. It's a simple association: the system "sees" that some sensory activation is occurring, and vision contributes to the activation. In the rubber hand illusion, the sensory system is tricked into believing that a visible rubber hand is the hand it is supposed to process information from (right position, etc.).
   If you hit something, the system is smart enough to take into account that you are exerting these forces, so it reduces their apparent feeling. This is why you can't tickle yourself: the system reduces the apparent feeling so that you can concentrate on potentially more urgent things that really come from the outside world. More annoyingly, when you hit your kids the same mechanism is at work, so you feel less force on the striking hand, and hence you systematically underestimate how painful it really is.
   The famous mirror neurons deduce what motor outputs should be produced by looking at the motor actions of others. A probable guess: when a motor signal is sent, it is stored, and its corresponding visual effect arrives a bit later. A circuit is then presented with the visual signal as input and the motor signal as output. In use, the circuit can deduce what motor signal corresponds to a given visual signal of a movement (even if the movement is made by another creature).
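The self-cancellation described for tickling and striking is usually called an "efference copy" (or corollary discharge): the motor system sends a copy of its command to the sensory side, which subtracts the sensation it expects. A minimal sketch, assuming a single scalar force and a fixed cancellation factor (the 0.8 is arbitrary):

```python
def felt_intensity(actual_force, predicted_self_force, cancellation=0.8):
    """Subtract a scaled copy of the force the motor system expects to cause itself."""
    return max(0.0, actual_force - cancellation * predicted_self_force)

print(felt_intensity(1.0, 0.0))    # someone else tickles you: felt at full strength
print(felt_intensity(1.0, 1.0))    # you tickle yourself: only about 0.2 gets through
print(felt_intensity(10.0, 10.0))  # a blow you deliver yourself: felt as about 2.0
```

Because the cancellation is proportional to the expected force, the harder the self-generated blow, the more of it goes unfelt in absolute terms.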

Secondary supervisors can perform neural training as initially explained (input/output pairing), simply because the problems they try to solve are much easier than full-blown high-level decisions.

3.8.3 All together


A very simple carving is a sensory layer connected unidirectionally to a behavioral layer. The sensory layer takes in raw information and outputs higher-order information about the data, but as such it doesn't make political decisions. For example, raw data meaning nothing gets in, and a pattern meaning "this is a straight line" goes out. The behavioral chunk takes in the already sorted-out raw data and generalizes what the animal should do; it does all the cool stuff. For example, the sensory layer says "food-like thing at 70°", and the behavioral layer orders the muscles to "take a bite at 70°". Here the supervisor needs to output only to the behavioral layer, while the sensory layer is heavily hardwired.

The sensory layer in the model corresponds to the back of our brain, and the behavioral layer corresponds ROUGHLY to the frontal lobe, the part that was favored during the brain-size increase of our recent evolution. The model is really simple, it's just a toy: the real visual part alone has at least 10 layers. Its purpose is to demonstrate the principles behind our brain; 100 billion neurons need some simplification before we can see what's going on. In reality, the layers of our simple model are decomposed into a great number of specialized individual networks.

In the sensory part, the outputs are plugged into the inputs of the next networks and are gradually decoded as they advance. For example, in the visual part various layers detect movement, tilted lines, edges, and more. A note about the hearing apparatus: apart from volume detection, the ear decomposes sound into its component frequencies (the snail-shaped cochlea does that), and the hearing center takes in the spectrum of the sounds and generalizes from it into patterns. As the impulses move further forward through the sensory layers (vision, hearing, smell, touch, others), the raw signals get decoded into the percentages of higher-order concepts (movements, tilted lines...) present at specific points in the sensory field.

(Video caption: His answer about his paralyzed limb is of the same nature as his reaction to the mirror. He is not in denial: in his universe, there is no left.)
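The ear's decomposition of a sound into its component frequencies can be sketched with a discrete Fourier transform. The cochlea is a mechanical filter bank rather than an FFT, so this is an analogy for its output, not its mechanism; the two test tones and the sample rate are arbitrary.

```python
import numpy as np

def dominant_frequencies(signal, sample_rate, count=2):
    """Return the `count` strongest frequencies (Hz) in the signal, lowest first."""
    spectrum = np.abs(np.fft.rfft(signal))                     # strength of each frequency component
    freqs = np.arange(spectrum.size) * sample_rate / len(signal)  # frequency (Hz) of each bin
    strongest = np.argsort(spectrum)[-count:]
    return sorted(float(f) for f in freqs[strongest])

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate                        # one second of samples
sound = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
print(dominant_frequencies(sound, sample_rate))                 # [440.0, 1000.0]
```

The mixed waveform looks like a single wiggly line, yet the spectrum recovers the two tones that make it up, which is the kind of input the hearing center then generalizes from.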

All this, plus explicit memory, plus the current supervisor state, gets presented as a "single" pattern to the frontal lobe, which also has multiple networks specialized in certain aspects of decision making; hemispatial neglect gives a good idea of these. This disorder is caused by a specific lesion in the brain just behind the frontal lobe. For example, a patient still gets sensory input from his whole sensory field and has normal intelligence, yet he will eat only the food on one side of his plate, even though he complains that he is hungry (<rudely>and the hospital is cheap, and the nurses don't do their job...</rudely>). If you turn the plate around, he will eat the other half. He will shave only one half of his face, and no, you can't cheat with mirrors: somehow he can't think about the other side. Another little example is thinking about the future, which is located just above your eyebrows. But the first layer concerned here is probably about attention (what's broken in hemispatial neglect): special stuff like bright lights, faces, movements, sudden noises, etc. gets singled out for further processing.

Probably, ROUGHLY, the decision part of the brain is decomposed into a first "what's important" layer. Decision making proper could be organized in a similar way to the decomposition of sound in the inner ear: very high-level, abstract decision-making layers (for example, predicting the future) at one end, passing gradually through a great number of layers of decreasing abstraction, down to the lowest level of decision making at the other end. Damage here can leave you in a vegetative state. In humans the layers are probably highly interconnected, while in animals less so. The whole thing would be set up like a decomposed spectrum: the highest-abstraction layers would correspond to "long wavelengths" and the lowest-abstraction layers to "short wavelengths", and the synaptic properties of each layer are weighted accordingly. The patterns that are exchanged are probably arbitrary, randomly selected codes, due to the way the networks learn, but they have a specific meaning. Presumably, if an adult layer were successfully transplanted into another brain, it would not work, like connecting a little-endian electronic component to a big-endian machine. For example: the input "alarm clock rings" gets in, and the highest-abstraction layers generalize "let's get out of bed". This is plugged into lower-abstraction layers, which generalize "let's move legs/hands this way". In turn this is plugged into a still lower abstraction layer, the motor center, which gives orders of the style "muscle fiber n° 4564864 contraction 15.78%, muscle fiber n° 65488731 contraction 20.51%...".
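The transplant analogy can be made concrete: the very same bytes mean different numbers depending on which byte-order convention reads them, just as a layer's learned patterns would only mean something to the layers that learned alongside it. This uses Python's standard `struct` module; the value 1 is arbitrary.

```python
import struct

packed = struct.pack("<I", 1)              # the number 1 as 4 bytes, little-endian: 01 00 00 00
same = struct.unpack("<I", packed)[0]      # read with the convention that wrote it -> 1
misread = struct.unpack(">I", packed)[0]   # read by a big-endian "machine" -> 16777216
print(same, misread)
```

Nothing is wrong with the bytes themselves; only the shared convention between writer and reader is missing.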

When the frontal lobe, according to previous learning, generalizes a response, the response gets plugged back into the output networks (memory, motor center, supervisor, others).

The supervisor is ROUGHLY the limbic system. When it wants to reinforce synapses, it signals to the neurons to increase the ease with which inputs go through them (it amplifies the inputs). The neurons learn only by simple repetition, but now the more intense inputs have the indirect effect of reinforcing the synapses. If it wants to degrade them, it signals a decrease in the ease with which inputs go through, and the synapses degrade indirectly: all synapses automatically degrade over time on their own, so by having their activation throttled, the concerned synapses are left to degrade. The supervisor gets inputs from the senses and from the frontal lobe, but the latter aren't orders; they are used in the same way as the sensory inputs (thought crimes).
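A toy version of this "the supervisor only turns the volume up or down" mechanism, assuming rate-coded neurons, a plain Hebbian rule and a uniform decay factor; `gain` stands for the supervisor's signal and all numbers are illustrative. The learning rule is identical in both runs; only the gain differs.

```python
import numpy as np

def hebbian_step(weights, pre, gain, rate=0.1, decay=0.02):
    """One time step: the supervisor only sets `gain`; the learning rule itself never changes."""
    drive = gain * pre                                # supervisor amplifies (>1) or throttles (<1) inputs
    post = weights @ drive                            # activity of the receiving neurons
    weights = weights + rate * np.outer(post, drive)  # plain Hebbian repetition: co-activity strengthens
    weights = weights * (1.0 - decay)                 # every synapse slowly decays on its own
    return np.clip(weights, 0.0, 1.0)                 # bounded strengths (assumption)

pre = np.array([1.0])
amplified = np.array([[0.5]])
throttled = np.array([[0.5]])
for _ in range(10):
    amplified = hebbian_step(amplified, pre, gain=2.0)  # supervisor: "good, reinforce"
    throttled = hebbian_step(throttled, pre, gain=0.0)  # supervisor: "bad, let it fade"
print(amplified, throttled)   # amplified saturates at 1.0, throttled decays to about 0.41
```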

Each neural circuit doesn't have any understanding of what the patterns are about, of course; they are machines. All they know is that when pattern n°1873496 is presented, they must give back the associated pattern n°9348734 that is stored in their synapses. For them, some seemingly random pattern gets in, and they must respond with some other random-looking pattern simply because their supervisor orders them to. This stays true for the networks in the frontal lobe too.
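This "random-looking pattern in, associated pattern out" behavior is what the classic linear (outer-product) hetero-associative memory does. A minimal sketch, assuming ±1 patterns of length 64 with random content; this textbook model is swapped in for illustration and is not a description of real cortical wiring.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
# Two arbitrary, random-looking cue/response pairs of +1/-1 patterns.
cues = rng.choice([-1, 1], size=(2, n))
responses = rng.choice([-1, 1], size=(2, n))

# Store every pair in one weight matrix: the sum of outer products (response x cue).
W = sum(np.outer(r, c) for r, c in zip(responses, cues))

def recall(cue):
    """Present a cue pattern; the synapses give back the associated response pattern."""
    return np.sign(W @ cue).astype(int)

print(np.array_equal(recall(cues[0]), responses[0]))  # True
```

The matrix "knows" nothing about what the patterns mean; it only stores which pattern goes with which.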

As you can see, all this is already an unsortable mess of complexity. We are interested in decision making in order to understand the big picture, and 100 billion neurons are inhumanly hard to handle, as complained previously. So we simplify our lives by sticking to the simple, approximate supervisor/neural_net model.

3.9 The smallest positive integer not definable in under eleven words

(\ /)

|> <|

What makes the senses "feel" the way they do?

First, a tiny observation: if we ask a human what color apples are, he will respond red and green; if we ask what the difference is, he will not be able to answer, he will just say that they are different. From that we can conclude that humans can't decompose a color further, because the wiring doesn't allow it. Colors are coded as special neuronal patterns, in a similar way as in computer color codes, where colors correspond to certain strings of bits:

BLACK    000000000000000000000000
WHITE    111111111111111111111111
RED      111111110000000000000000
GREEN    000000001111111100000000
BLUE     000000000000000011111111
YELLOW   111111111111111100000000
...      plus 16,777,210 other colors

(24 bits, because we can distinguish around 10 million colors, and 1 bit less would leave us with only about 8 million codes.)

The networks of the brain are allowed to manipulate these codes but don't have direct access to the underlying encoding, in a similar way that an image editor lets you manipulate colors but not the underlying coding. If you don't need to mess with the bits in your image, why would an animal need to mess with the neuronal encodings in its brain? This is a very reasonable theory that seems to pop up almost on its own.

A partial outline of this encoding: green is somehow encoded as green, red is somehow encoded as red, but yellow is the combined encodings of red and green, and there's no way you can guess that just by looking at yellow. A color-blind person would never know he is color-blind just by naively looking at things. Similarly, cold is somehow coded as cold and hot is somehow coded as hot, but burning is the combined encodings of hot and cold, and there's no way you can guess this just by burning yourself.
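The color-code table can be checked with ordinary 24-bit RGB integers, 8 bits per channel. In this representation yellow is just the red code and the green code combined with a bitwise OR, a fact that nothing about the name "YELLOW" reveals to someone using it. The RGB analogy is the text's own; the specific Python representation is an illustrative assumption.

```python
# 24-bit color codes from the table: 8 bits each for red, green and blue.
COLORS = {
    "BLACK": 0x000000,
    "WHITE": 0xFFFFFF,
    "RED": 0xFF0000,
    "GREEN": 0x00FF00,
    "BLUE": 0x0000FF,
    "YELLOW": 0xFFFF00,
}

def bits(code):
    """Show a color code the way the table does: 24 binary digits."""
    return format(code, "024b")

# Yellow is literally the red code and the green code combined (bitwise OR).
yellow = COLORS["RED"] | COLORS["GREEN"]
print(bits(yellow))                 # 111111111111111100000000
print(yellow == COLORS["YELLOW"])   # True

# Counting: 2**24 codes minus the 6 named ones; 23 bits would give only ~8.4 million.
print(2**24 - len(COLORS), 2**23)   # 16777210 8388608
```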

A little fictional experiment. We build a robot with the above specifications (not an unreasonable design) in its sensory system, and it's capable enough to hold a simple conversation. Actually, I think that any design will have the kind of limitations described above. With the apples, it will say "green" and "red", the same as the humans (figures). We ask what the difference is; because of the way we built it, it will say "can't answer" (command not found), in its own way the same as humans again. If we ask about red and green, it will say "there's something special", in its way, the same as humans. This looks like a very boring experiment: we built the robot so that it can only answer the experimenter's requests in these ways. But on more careful inspection, the robot behaves qualitatively like a human. Both are speechless when asked to compare greenness and redness.

Hmm, could it be that what happens in the robot we built is what's going on in humans with our experience of colors? From the outside the behaviors are qualitatively identical, and basically the question seems to be answered. From a practical point of view, if we cannot find a difference between two things, odds are good that these two things are identical; in other words, humans and the robot really share the same basic design.

But this doesn't look good enough. This explanation seems to be going in the right direction, but something is lacking. Simply referring to the sensory model and brushing away our conscious experience as a "bug" of our sensory system seems like a dishonest trick. It's like the theory of the invisible pink unicorn that created the universe 73 seconds ago, as is, as if it were 15 billion years old, including all your life's memories. This inconsistency is normal, since we got outside the constraints of the experiment: we invaded the experiment by presenting the results to ourselves and complaining. Up to the point where we did that, everything was identical with the robot. So we explain to that stupid robot why it can't answer the stupid question; we explain the limitation in its sensory system. If we think carefully, its most probable answer (if it generalizes as we do) would be... the same reaction as ours, since it can't verify directly what we tell it and has no reason to take it for granted. Now the experiment is again complete; if we want to include our latest reaction, we have to run another, more complete iteration. This is the beginning of a recursive infinite loop. The robot takes part in an infinite series of experiments, taking the place of the human in every new iteration, while its place is taken by a new naive robot, and the human takes a step back to observe the new, more complete experiment. In every iteration, the first robot repeats the behavior of the human, while the human finds the new results incomplete. The infinite series of experiments is a complete explanation of what's going on. In other words, this is a self-referential paradox, just obfuscated.

All observations are explained and seem trivially reapplicable; there's nothing else to ponder, and that's it. If you are not convinced: how are we going to prove its internal workings to the robot? When you answer, apply your solution to yourself :p. Alternatively, you can consider that if we try to understand our brain, we can only do so while we are using it. Some form of self-referential paradox will manifest at some point and should be taken into account. It's as if we tried to smell our noses, hear our ears, taste our tongues and see our eyes. If we try to understand ourselves and our theory doesn't contain any self-referential paradoxes, our theory must be wrong. This is not some cheap trick: self-referential paradoxes are a natural consequence of a theory that tries to understand ourselves.

The exact same thing happens with all senses and emotions. With consciousness, the networks that do the thinking are little black boxes whose internals are not accessible from the perspective of humans, and again the same behavior ensues. It's the same thing every single time; once you've understood one, the others are boring.

And yep, that's right, this is what is implied here: we are literally machines. This is no figure of speech, metaphor, approximation or whatever as usual; "we are machines" in the full sense of the words. Oh yeah, we are "machines", we knew that. No, no, we ARE machines. In a fostering experiment with chimps, in which they were raised as human children and learned sign language, a chimp had a nervous breakdown when he learned he wasn't human. Referring to this incident here should leave no doubt in your little mind about what I'm saying, consciously or not. Frankly, guys, WHAT did you expect? You just believed in Santa Claus. Your little life has always been nothing more than a "nihilistic punch line" (Natalie Six); just get over it.

   Human: Human beings have dreams. Even dogs have dreams, but not you, you are just a machine. An imitation of life. Can a robot write a symphony? Can a robot turn a... canvas into a beautiful masterpiece? Robot: Can you? Human: (Doesn't respond, looks irritated) (I, Robot)
   We're all puppets, [...]. I'm just a puppet who can see the strings. (Dr. Manhattan)

(Did you count how many words there are in the title?)

3.10 Programming


A classic computer program is composed of a sequence of instructions that the CPU is wired to understand: additions, multiplications, moving information in memory, etc. To keep to the computer analogy, the hard-wiredness of the upper brain (contrast, tilted lines, summation...) would correspond to the instruction set of the CPU, while the program proper would be the synaptic weights reinforced by the supervisor. The supervisor is indirectly the holder of our program; we are nothing more than robots, and a robot without a program would have a pointless existence, as it would not know what to do. The program/supervisor analogy goes as far as sharing the "worse is better" design philosophy: a design that is simple to the point of sacrificing correctness, consistency and completeness actually does better in practice than a "good" but overly complicated design (perfection doesn't pay). Our program is filled with horrible approximations. Notice that there's a considerable decrease in complexity between the top of our brain and what the supervisor is doing. The supervisor is very hardwired by natural selection, the programmer proper: a myriad of rules about which behaviors have to be reinforced or not, under what circumstances, at what rates, and how they interact, all carefully and meticulously adjusted. This way of programming contrasts sharply with classical programming: it is very fault tolerant, progressive, and lends itself very well to the brute-force approach of natural selection.

The supervisor is not completely hardwired, though; it too is capable of some learning (phobias and the corresponding pleasurable associations): it learns whether something should be feared or craved. This can only be done if it has its own simpler little supervisor (with a corresponding reduction in complexity); let's call it the hypervisor. I cannot say how many layers of supervisors are packed together, but there are at least 2. Actually, a very early conjecture (an early guess, not sure if it's really true) suggests that the serotonin system is in reality a slave to the dopamine system, so it would seem that we could have 3 layers of vaguely identified supervisors, but we can forget this here. ROUGHLY, the supervisor decides stuff like "bunnies are hot, cockroaches are not"; the hypervisor, stuff like "pain is not good, food is good". And of course they run into the same problems of overtraining and overfiltering, and of striking the right balance in their learning strategy. This setup of the supervisor hints at the evolution of the brain: the hypervisor is the brain of our "worm" ancestor, and our supervisor is the brain of our "dinosaur" ancestor. It also hints at how the brain gets bootstrapped embryologically: from a mass of neurons, the wiring of the "simple" hypervisor gets set up, which in turn sets up the wiring of the supervisor, which in turn sets up the top brain, with complexity increasing considerably at each level. Similarly, our brains are the supervisors of our computers, so it seems that we shouldn't fear that our robot overlords will get rid of us any time soon. Weeeeell, we could imagine our ever smarter machines starting to trick and manipulate us, in the same way that we try to trick and manipulate our own supervisor: in the sense that we consider our supervisor stupid and try to manipulate it for our common good, not as in a dominant/dominated relation.
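A minimal sketch of the two-level arrangement, assuming a hard-wired hypervisor that only knows "pain is bad, food is good" and a supervisor that learns a craving/fear value for each stimulus with a simple delta rule (the Rescorla-Wagner form of conditioning, swapped in here for illustration). The stimulus/outcome pairings and the learning rate are hypothetical.

```python
# Hard-wired hypervisor verdicts (toy values): never learned, never changed.
HYPERVISOR = {"pain": -1.0, "food": +1.0}

def condition(pairings, rate=0.2):
    """Supervisor learns how much to crave (+) or fear (-) each stimulus.

    pairings: sequence of (stimulus, outcome) events; the outcome is judged by the hypervisor.
    """
    value = {}
    for stimulus, outcome in pairings:
        v = value.get(stimulus, 0.0)                # unknown stimuli start neutral
        target = HYPERVISOR[outcome]                # fixed verdict from one level down
        value[stimulus] = v + rate * (target - v)   # move part of the way toward it
    return value

learned = condition([("bunny", "food")] * 10 + [("cockroach", "pain")] * 10)
print(learned)   # bunny close to +0.89, cockroach close to -0.89
```

Only the supervisor's table changes; the hypervisor's rules stay fixed, which is the "reduction in complexity" at each lower level.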

Probably the very first "brain/supervisor" evolved as a simple pacemaker (on the lips of the chimney) for synchronizing contractions in our sponge/sea-squirt ancestor. In other words, its most direct counterpart in humans should be the breathing reflex. This "brain" would gradually become more sophisticated: at first it simply modulated its frequency according to events in its environment, and then the first top layer was added, making it the first supervisor proper. If we compared the brain with a country, the posterior part would be the civil service, the frontal lobe would be the government, the supervisor would be the parliament, and the hypervisor would be the voters (sorry if you find this offensive).

Of course there's no reason this evolution couldn't be replicated with artificial neural networks; people working on AI should get inspired by this multilayered design and by the strategy of natural selection. The problem can be decomposed into small tractable problems, starting from the intelligence of less than a worm and incrementally working their way up, like Russian dolls. The brute method of solving the thing in one step just doesn't work.

The supervisor, although a simpler brain than the one it supervises, performs quite complex manipulations for the training of its supervised nets, like taking direct real-world inputs, making copies of patterns and using them later or elsewhere, etc. Secondary supervisors can have an algorithm for deducing the correct input/output pairing. They don't compute the matrix of synaptic weights or anything like that; they harvest the pairs from the brain by various means. Here is an example of the usual underlying strategy. Suppose a net needs to be trained to deduce what motor output should correspond to a specific visual input of the same movement. This can be achieved by copying normal motor outputs and the corresponding visual inputs, then using the old "visual input" as the new neural input and the old motor output as the new neural output on the net of interest. This way, the trained net can generalize the necessary motor outputs simply by seeing the movement of body parts resembling its own. It can then issue a similar motor order, if the decision to do so is taken.
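The pair-harvesting strategy can be sketched with a linear stand-in for both the body and the net. Assumptions: the hypothetical `BODY` matrix plays the part of "how a motor command looks from the outside", the net of interest is a single linear map, and it is fitted by least squares (`numpy.linalg.lstsq`) rather than by any biological learning rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "body": how a motor command (2 numbers) shows up visually (3 numbers).
BODY = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]])

# 1. Harvest pairs: copy each ordinary motor output and the visual input it produced.
motor = rng.normal(size=(200, 2))
visual = motor @ BODY.T

# 2. Train the net of interest in the reverse direction: old visual input -> old motor output.
inverse, *_ = np.linalg.lstsq(visual, motor, rcond=None)

# 3. Use it: seeing a movement yields the motor command that would reproduce it.
seen = np.array([0.3, -0.25]) @ BODY.T    # e.g. another creature's limb making that movement
command = seen @ inverse
print(command)                            # approximately [0.3, -0.25]
```

The supervisor never computes `inverse` by reasoning about the body; it only routes copies of signals it already has, and the fitting step does the rest.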