User:Graeme E. Smith/Objections to the HTM model
Objections to the HTM Model
Neural Constraints and the Temporal Limitation
Graeme E. Smith, GreySmith Institute of Advanced Studies
In Towards a Mathematical Theory of Cortical Micro-Circuits, Dileep George and Jeff Hawkins discuss a tentative model of memory micro-circuits based on the HTM, or Hierarchical Temporal Memory, Model. Although a proper histopsychological discipline has yet to be inaugurated, some pioneering work has been done that suggests this model is too advanced for the constraints placed on neocortical micro-circuits by the fact that they are made up of Neurons. In this article some objections are raised to the Temporal aspect needed to implement Markov Chains. It is not that there are no circuits in the brain, or even in the cerebral neo-cortex, that implement Markov Chain like behavior, but rather that this behavior is not the base case from which all Neo-cortical function is derived. An alternative interpretation is suggested for a new Model.
Jeff Hawkins developed the theory of Hierarchical Temporal Memory, and Dileep George suggested the use of Hierarchical Bayesian Inference and Markov Chains as a way of mathematically modeling the circuit implied. While I am sure that this is a useful model, I am also sure that it is too advanced a model for many of the neural circuits I have characterized in my work on the Neo-cortex, if only because it implies a temporal chaining of neurons that I find difficult to support under my Neural Constraints model. While it is easy to implement Markov chains in a computer environment, which has an implied temporal component, I find that temporal chaining is problematic in a neural circuit, if only because the temporal cues that drive the neo-cortex do not do so at a sufficiently high frequency to be incorporated into the micro-circuit operation at the individual transaction level.
Further, I am reluctant to accept the assumption that the Bayesian inferences are coincidence detections in the neo-cortical circuit. The concept of a coincidence, I feel, is too discrete a function for what these neural circuits must be doing under the constraints I am exploring in my own work.
Yet I do not want to discourage the ideas of these two researchers, since they are nearly correct as far as I can determine without completely understanding the Mathematics involved. Instead I want to propose a variation on their model that would do away with my objections and still use many of the same mechanisms to achieve its goals. The new model would lie in the same family of mathematical functions, but be implemented slightly differently.
Like many computer-based technical people who have looked at Neural Network theory, I originally attempted to characterize neurons using digital gates or transistors, only to find that the model didn't quite fit. For one thing, few digital gates or transistors have thousands of inputs, and it quickly became obvious that, locally at least, neurons used a more analog signal than could be implemented digitally at the gate level. I am currently exploring the idea of arrays of CAM cells as a simulation mode, but have not yet completed a feasible circuit for testing this model. A two-dimensional CAM cell array might easily simulate a locally analog implicit memory neural circuit, which is the basis on which I hope to implement my digital/analog hybrid simulation. However, to get where I am, I have had to deal with the constraints imposed on neural architecture by neural network limitations.
The first intimation of why the neural circuits were constrained came from Jerry Fodor, in his The Mind Doesn't Work That Way: The Scope and Limits of Computational Psychology, a book I found in the dustier stacks of my local University Library.
In a very obscure location in the book, he discusses the fact that discrete memories can't be directly implemented by neural networks based on neo-cortex circuits. Being familiar with Neural Networks, I accepted this judgment because his arguments about the nature of storage within the Neural Network had merit. Scientists who had attempted to find orderly storage at the neural level, even in artificial neural networks, were sadly disappointed. He then went on to suggest that perhaps the answer lay in J.C. Eccles's Column Architecture, a micro-circuit involving hundreds of individual neurons, arranged within a circumference of some point, that have been noted to activate the same way regardless of which laminae levels were stimulated. The Column is under study, and there is a push in Switzerland to synthesize a Column Chip. This approach seems doomed, not because the column isn't important, but because recent research has suggested that it depends on external organs for some of its function, and, in what little I have seen of the work being done to characterize the column, no attempt is yet being made to simulate those external organs.
To add insult to injury, the best model of columnar activity that I have seen is based on a sub-component not described by Eccles, called a mini-column. It seems loosely associated with the concept of a Neural Group, an architecturally undefined concept in which groups of neurons within a column seem to vote on which one of them will fire, without actually firing themselves. One of the problems I have with HTM as a model is therefore that this intermediate level of complexity is missing. The best treatment of Mini-column architecture that I have seen is by David LaBerge, in his work on the Attention System. In the article Attentional Control: Brief and Prolonged, LaBerge suggests an Axis Model for Laminae V Neurons that seems to link Thalamus inputs to pre-activation of Neural Groups.
Without getting into why I feel this is an important step in understanding Neo-cortical micro-circuits, I have suggested that this mechanism defines a separation of function between implicit memory and explicit access to implicit memory. That separation suggests that discrete memories are indeed not the base function of the neo-cortex, which is why I disagree with the characterization of Coincidence detection as the mode of operation of the Bayesian function. If you can't isolate a coincidence, how can you detect it? Similarity detection at the signal level, however, is entirely plausible even in a naive system that has not consolidated isolated memories back into the cerebral cortex.
Adding to my argument against temporal coding is the fact that Eccles has clearly stated that Laminae I is the most prevalent internal layer for transport of information between cortex areas. Yet no connections have been defined between Laminae I and the Reticular Activation System, which is thought to be linked to the distribution of Brain Waves throughout the cerebral cortex. If the central clock system for the brain is not involved in defining time, one has to ask: where is the temporal connection at the micro-circuit level? Certainly there is no such involvement that I can see in the top three layers of the Neo-Cortex, and so the base function is not likely to be linked to time. There is, however, such an involvement in Dr. LaBerge's Laminae V connection, suggesting that a Temporal component might leak into the explicit memory via that link, but it would be at a lower frequency than is needed to implement HTM, and seems instead to be linked to Low Level Attention functions.
The HTM circuit Model
Essentially, the HTM model has resulted in a micro-circuit prediction that is fairly detailed. However, I have problems with the characterization of some areas of the prediction, if only because I don't feel that evolutionary evidence supports them.
According to HTM, Laminae IV is the coincidence detection layer because, in the more sophisticated brains that have a Laminae IV, it is directly connected to the sensory input via the Thalamus LGN. I think that this is a little naive, in that it assumes that the circuits in Laminae IV have the sophistication to detect coincidences. In earlier animals we can expect to find a three laminae cortex, not a four laminae cortex. Does this mean that coincidence detection is outside the capacity of these earlier brains? Perhaps. But what does the earlier three laminae cortex offer that is stable enough to be conserved over a number of phyla and be built on for the 6 layer Eccles-like cortex? HTM does not say. Consider the Olfactory Bulb, which is admittedly limited in humans but which is well preserved across phyla. It consists of what I assume is the earlier three laminae cortex tissue.
According to my admittedly naive seminal work in histo-psychology, Laminae IV is a Processing layer, but what is it processing? My best guess is that it is processing which of the micro-circuits to associate with which signal. In other words, it is a distributive function, not a detection function. That it has the added cachet of linking a vertical micro-circuit to a specific signal is not, in my opinion, germane, because at this level of processing the micro-circuit is redundant and interchangeable with other similar micro-circuits. It is hoped that the distributive function gives approximately the same result for signals of the same type, so that clusters of similar signals gather in similar places, but this is the best we can do with this simple a neural circuit component.
HTM suggests that the connections for Laminae IV terminate in Laminae III. But if Laminae IV is the coincidence detection layer, as they suggest, then where does coincidence detection go in Agranular Tissue? What does a 5 layer cortical tissue without coincidence detection mean in HTM? It has been noted that there is some cross pollination of Laminae III by LGN signals, but without the granular cells that dominate Laminae IV, how could Laminae III do coincidence detection? Further, there seems to be a dearth of LGN signals in Agranular tissue, suggesting that Laminae III does not take up the slack.
I suggest that in fact coincidence detection at this level of organization is too complex for the circuits involved, as I will discuss in my own model.
Eccles described Laminae I back in 1983 in his paper The Horizontal (Tangential Fiber) System of Laminae I of the Neo-cortex. He did not describe connections to the thalamus. However, in the spirit of support that I feel for this approach, I will not quibble about it; there might have been further work since that discovered such a connection. My reading, albeit slightly naive since I have not studied the thalamus, is that the undifferentiated fibers are fibers that do not connect to any specific nucleus of the thalamus, because they actually project through the thalamus, possibly without connection. In other words, the undifferentiated fibers are related to the thalamus more or less because they have to pass through it to get where they are going, much like tourists who pass a city on the outer ring road: they see the edges of the city but do not relate to it, because they are not caught by any of the tourist traps in its center.
Laminae II/III are often bunched together, because they are mostly made up of pyramidal cells. Marr, in his 4 layer model described in A Theory for Cerebral Neocortex, written in 1970, noted also that there were basket cells, especially in Laminae III/IV. He assumed that these had the function of dividing the mathematical calculations of the other cells, but today we recognize the pattern of multiple inhibitory inputs to a soma as a special type of shunting that does not affect the input synapses of the cell. Shunting the cell at the soma shuts off all output from the cell without, as I have said, adjusting the inputs. The outputs of laminae III project into deeper structures in the brain. The outputs of laminae II seem to have a more local effect, which in turn might support the idea that smaller pyramidal cells have shorter axons.
Many of the pyramidal cells in the Laminae II/III area are almost granular in size, but still obviously pyramidal in shape. These small pyramidal cells do not show up well in most staining studies unless the stain used is a pigment stain. However, the laminae are clearly visible, suggesting a plausible change in Neuro-Transmitter between Laminae II and III. Both laminae II and III project their Apical Dendrites to Laminae I and are thus sensitive to data from feedback mechanisms from other areas in the brain. Laminae II, which is sometimes called the external Granular layer, may have granular components similar to laminae IV.
It is thought by some biological computational researchers that laminae II and III define a positive or confirmational model, where Laminae II proposes a theoretical model of the signal from laminae I, and Laminae III confirms that model and only then fires. If so, this would be very similar to the operation of a Static RAM device where two flip-flops are in series, and it is only when confirmation of the first flip-flop is detected by the second that it outputs the result.
Now the HTM theory would have you believe that this is a Markov (sequence) detection circuit, but there are three problems with that theory. One is the assumption of a temporal signal on laminae I, which I don't feel is accurate. Two is the assumption that that signal would allow the simple circuit described to distinguish between events in the sequence, which, I suggest, requires a higher frequency signal than is experienced anywhere in the Reticular Activation System. Three is the assumption that laminae IV is the coincidence detection layer, and therefore that what this layer is processing is the sequencing information. Having already dealt with laminae IV, and made my objections to the interpretation of how Laminae II/III work, let's go on.
Now we get to the idea that Laminae V calculates the Markov Chain. The problem I have with this is the sophistication needed to partition the laminae II/III outputs to get Markov Chains; I simply don't see it. What I do see, however, is a natural voting mechanism, if I take into account Dr. LaBerge's Mini-column model. This natural voting mechanism would offer a means by which Neural Groups could form. Essentially, output from the Laminae II/III micro-circuits that projects to Laminae V (micro-circuits that, I have suggested, stood alone at a previous stage in evolution) would terminate on the Axial Pyramid for a particular Neural Group. It would pre-activate that neuron, allowing it in turn to stimulate the centroid of the Neural Group, which would eventually fire the Neural Group if the pre-activation was large enough.
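The voting idea can be made concrete with a small Python toy. It is only an illustration of pre-activation building up to a firing point, not a model taken from Dr. LaBerge's work: the names NeuralGroup, preactivate and tally, the threshold, and the vote weights are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class NeuralGroup:
    """Toy neural group: fires only once accumulated pre-activation is large enough."""
    name: str
    threshold: float = 1.0   # hypothetical firing threshold
    activation: float = 0.0  # pre-activation accumulated on the axial pyramid

    def preactivate(self, vote: float) -> None:
        # Each Laminae II/III micro-circuit output counts as one weighted vote.
        self.activation += vote

    def fires(self) -> bool:
        return self.activation >= self.threshold


def tally(groups, votes):
    """Apply (group_name, weight) votes and return the names of the groups that fire."""
    by_name = {g.name: g for g in groups}
    for name, weight in votes:
        by_name[name].preactivate(weight)
    return [g.name for g in groups if g.fires()]
```

In this sketch no single vote decides the outcome; a group fires only when enough micro-circuits project to it, which is the sense in which the mechanism is a vote rather than a detection.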
The Alternate Model
If the Neo-Cortical circuits are indeed, as I have suggested, limited by constraints so that they cannot act in the way the HTM model suggests, then what we need is a simpler model that operates within the constraints of Neural Circuits. What I am proposing is what I call the LHM model. LHM stands for Limited Hierarchical Memory. In LHM, we work with similarity detection rather than pattern detection, and with Transitions rather than Markov Chains. Further, because the neo-cortex is highly, one might say massively, parallel, we work with transition clusters rather than transitions themselves. Each similarity detection is linked to a transition cluster, in much the same way that the Markov Chains were linked to the Bayesian coincidence detection in the HTM model.
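The difference between a discrete coincidence and a graded similarity can be shown with a short Python sketch. Cosine similarity is used here purely as a convenient stand-in; the article does not commit to any particular similarity measure, and the function names are hypothetical.

```python
import math


def coincidence(a, b):
    """Discrete coincidence: the two patterns must match exactly."""
    return list(a) == list(b)


def similarity(a, b):
    """Graded similarity: cosine of the angle between two signal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A slightly noisy copy of a signal fails the coincidence test outright but still scores close to 1 on similarity, which is the behavior one would expect from a locally analog circuit.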
The Bayesian Function can be thought of as being equivalent to the Neural Group function, statistically capturing the voting effect. If this is true, then the cloud of Transitions caused by the micro-circuit can be defined as a statistical distribution of Transitions between Hierarchical Layers. This means that instead of having a single array of transitions, we will be looking at a higher dimensional array.
The Attention System can partition this higher dimensional array using Brain Waves, if one accepts the conjecture that Functional Clusters are linked together by frequency, and that frequency is controlled via a prefrontal cortex-Thalamus loop that ultimately helps the ACC to select between equivalent options. To understand this conjecture, you have to see the ACC as a switching center that operates by triggering suppression of neocortical micro-circuits, possibly by activation of the basket cells. Through the involvement of the Ventro-Lateral PFC, it is thought that data passing through the ACC is examined for frequency components that act as tags, linking areas of interest. By switching out everything but one particular area of interest, the ACC can focus attention on that area. However, because the senses are a stove-pipe configuration, the same area of interest in the environment may have components spread across the cerebral cortex. These areas must all resonate at the same frequency in order to be linked. The resulting cluster of disparate locations across the cerebral cortex is therefore called a "Functional Cluster", according to Dr. Edelman, who has written a number of books including The Remembered Present, in which this idea was elaborated. (Neither the ACC nor the Ventro-Lateral PFC was mentioned there.)
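As a rough illustration of partitioning by frequency tag, the following Python sketch groups records by a shared tag and then suppresses everything outside one selected tag. The record layout (area, neural group, frequency in Hz), the function names, and the use of exact frequency equality are simplifying assumptions made for this example only.

```python
from collections import defaultdict


def functional_clusters(tagged_groups):
    """Group (area, neural_group, frequency_hz) records by their shared frequency tag."""
    clusters = defaultdict(list)
    for area, group, freq in tagged_groups:
        clusters[freq].append((area, group))
    return dict(clusters)


def attend(tagged_groups, selected_freq):
    """Toy ACC switch: suppress everything not tagged with the selected frequency."""
    return [(area, group) for area, group, freq in tagged_groups if freq == selected_freq]
```

Note that the selection never consults a position or index: groups from different sensory areas end up in the same cluster only because they carry the same tag.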
I represent the outputs of the functional cluster of neurons with a data element that is highly parallel, redundant, and organizationally challenged. Because of its nebulous nature I have chosen to call it a Data Cloud, to stress that it does not have the organizational characteristics of a matrix: Neural Groups are interchangeable, and individualized to the point where we can only approximate mapping them before the gyrus/sulcus level. In other words, the mathematical rigor of a matrix is missing, and the same data can be represented in different ways in each version of the Data Cloud. The idea of a cloud of transitions is therefore appropriate. Even though we will have to implement it as a matrix, we need to keep in mind that the matrix is only a representation, and that its indexing mechanism therefore cannot be used to sub-divide it, because then the model would exhibit levels of order that the micro-circuits in the brain can't. While we don't know exactly how implicit attention works, the concept is that this cloud of data presents first to the Limbic System (at least in humans), where it is evaluated for impact on the organism, and then to the Basal Ganglia, where Reinforcement learning takes place, before being labeled with a system of frequency tags that make the ACC capable of partitioning the transition cloud without needing an index to it.
The implicit partitioning then directs the processing within the cerebral cortex to partitions of high Emotional, Instinctive, or Reinforced priority within the original data cloud, by suppressing all but the highest priority functional cluster. This is all done without temporal coding, if I am correct. What we end up with is not a Markov chain but more of a Markov Cluster: something that seems related mathematically, but which I have no idea how to calculate.
Once we have narrowed the data cloud significantly, the next step is probably the creation of a Thalamic Link to each of the active Neural Groups in the Markov Cluster. The idea is that if we can pre-activate a specific neural group, the contents of the signals that trigger that group are actually immaterial to the firing of the Neural Group. However, they are not immaterial to the base implicit memory circuits, since feedback between the Neural Group and other areas in the brain is activated, simulating the similarity/transition elements that made up the original Functional Cluster and thus recreating a slightly less accurate version of the original Data Cloud. Here we have an interesting step: essentially, the cloud of transitions is represented by an array of addressable Neural Groups. I call this concept a chunk, for historical reasons. If we think about it, we can see that while it doesn't contain the exact same contents, the array of addresses is equivalent to the data cloud, if only because when it is rehearsed by sending it through the Thalamus and selecting it again via the ACC, it presents a similar data cloud consisting mostly of the same data. Dr. Edelman has noted in The Remembered Present that the contents, because they involve restimulation of the implicit memory, are recategorical, in the sense that anything that falls within the Neural Group categories will be stimulated even if it was not part of the original memory. This means that the implicit data cloud generated from any explicit addressing involving this thalamic connection is updated by information found out since the original signals were laid into memory. I see the chunk representation as being a representation of the implicit functional cluster's data cloud, in much the same way as matrices are representations of simultaneous equations.
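The relationship between a chunk and its data cloud can be sketched in Python as storing addresses rather than contents. This is a toy under stated assumptions: a data cloud is modeled as a dictionary from a hypothetical group address to a set of contents, and make_chunk and rehearse are names invented for the example.

```python
def make_chunk(data_cloud):
    """A chunk keeps only the addresses of the active neural groups, not their contents."""
    return sorted(data_cloud)


def rehearse(chunk, implicit_memory):
    """Re-stimulate each addressed group and read back whatever it holds now."""
    return {addr: set(implicit_memory.get(addr, ())) for addr in chunk}
```

Because rehearsal reads the current state of each addressed group, anything learned after the chunk was formed appears in the recreated cloud, which mirrors the recategorical property described above.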
Architectonic theory, however, suggests that there are at least three areas of memory that are addressable in this manner: the core, the belt, and the Association areas. Architectonically speaking, each of these areas has a distinctly different micro-circuit. Further, each sensory cluster has its own unique core, belt, and Association areas. What this means is that perception processing is a stove-pipe process where each sensory modality is processed separately, in at least three different steps that require three different micro-architectures.
Staining studies show that each of these areas is unique in some way in the processing it does. What we have achieved so far is a description of the core function, and we still haven't described anything like a Markov Chain based on a coincidence detection scheme, as required in HTM.
But this is where we might get an idea of how such a scheme could connect to the previous processing, because the belt areas seem architectonically to implement a coincidence detection scheme based on the discrete memories of Functional Clusters. The circuit here, unlike other neo-cortical circuits, seems to me to imply a connection between the core and the belt areas that might allow the factoring out of coincidentally common patterns of knowledge in the core.
The way this is likely to work, however, is that the micro-circuit in the belt area has dual inputs, one from the implicit memory and one from the core, that allow it to activate neural groups that include implicit memories of past coincidences with the current contents of the core. This means that a second level of Bayesian processing is required in the neurally based model in order to get to coincidence detection levels.
One might wonder if the belt then allows us to link the coincidence detection mechanism with Markov Chains, but unfortunately we still don't have a method of temporalizing the data. There is literally no way, at this level of processing, to tell if a memory was from yesterday or last week. However, there are indications of a direct link between the belt areas and the Associative areas, which research has noted seem to associate memories with different aspects of the data they store. I think that the best example of how this might work is Hofstadter's Slipnet concept. He used this concept in a program called Copycat, which solves analogy problems on strings of letters (a toy application). What he suggested is that the data was not so much temporally coded as associated with a pseudo-sequence, by linking it to the event that came before and the event that came after. While there was no sequence per se, the precedence allowed for processing into a sequence in a later processing step.
In essence, each node in the graph has both a previous and a next entry pointing to another element in the sequence. These links are forged by the Associative areas (as well as a number of other links associated with other aspects that are important to the organism). In other words, after association we can have Markov pseudo-chains, but not before. Because they are pseudo-chains, multiple chains can have the same elements. The partitioning of the memory into particular Markov pseudo-chains (and the transfer between each level of perception) is based on the partition of the data cloud arising from the combination of these three processing steps, and on selection within them as to which elements go on to the next step. The sheer complexity of this neural model suggests that HTM captures something much later in the processing than suggested by the authors.
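The previous/next linkage can be sketched as a small Python data structure. This is a hypothetical illustration, not Hofstadter's implementation: the Event class, the per-chain dictionaries, and the restriction that an event appears at most once in any one chain are all assumptions made to keep the example short.

```python
class Event:
    """Node with per-chain previous/next links, so one event can sit in several pseudo-chains."""

    def __init__(self, label):
        self.label = label
        self.prev = {}  # chain id -> preceding Event
        self.next = {}  # chain id -> following Event


def link(chain_id, events):
    """Record only precedence between neighbours; no timestamps are stored."""
    for before, after in zip(events, events[1:]):
        before.next[chain_id] = after
        after.prev[chain_id] = before


def unroll(chain_id, start):
    """Recover an ordered sequence in a later step by following the 'next' links."""
    labels, node = [], start
    while node is not None:
        labels.append(node.label)
        node = node.next.get(chain_id)
    return labels
```

For example, linking events a, b, c as one chain and d, b, a as another lets both chains share a and b while each still unrolls in its own order, and at no point does any node carry a time value.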
Although histo-psychology is in its infancy, and none of the assumptions made in this article are actually proven, I believe I have made a case that HTM is not the base case for neo-cortical function, by showing that the neo-cortical micro-circuits have to be combined with Attention from outside the cortex, and go through a number of processes, to achieve the sophistication of the HTM model. Without fully understanding the nature of Bayesian Inference and Markov Chaining, I have suggested that there might be a similar process, which I call LHM, that does not depend on coincidence inference operating in the implicit memory system. The output data cloud I have characterized almost requires that, once the implicit memory is available, it be processed again: first to form it into an explicit chunk of memory, then to factor out the coincidences, and then, and only then, to associate the incidences into pseudo-chains that can be modeled by Markov Chains. Because this LHM model more closely follows the architectonics of the brain, I suggest that the HTM model is too sophisticated for the micro-circuits of the Neo-cortex, and that the architectonics instead suggest a more complex structure consisting of at least 4 different types of attention and three different zones of micro-circuits.
- George, Dileep; Hawkins, Jeff. Towards a Mathematical Theory of Cortical Micro-circuits. http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pbci/1000532
- Fodor, Jerry. The Mind Doesn't Work That Way: The Scope and Limits of Computational Psychology (2001). MIT Press. ISBN 0262561468