content/abstract-neural-network.md
Lines changed: 4 additions & 4 deletions
@@ -7,17 +7,17 @@ bibfile = "ccnlab.json"
Perhaps the most fundamental shared feature of all ANN models is the distinction between [[activation]]s and [[weights]], where activations are dynamic state variables representing neural activity that are communicated to other neurons over weighted synaptic connections. [[Learning]] happens by changing the strengths of these synaptic weights, typically under the influence of the local activation state.
-
After providing an overview of these models, we consider how they relate to neuroscience in general and [[Axon]] more specifically. In summary, Axon has a number of neurobiologically-motivated properties that differ significantly from most ANN models, which have potentially important functional implications.
+
After providing an overview of these models, we consider how they relate to neuroscience in general and [[Axon]] more specifically. In summary, Axon has a number of neurobiologically motivated properties that differ significantly from most ANN models, which have potentially important functional implications.
-
Historically, early developments included Frank Rosenblatt's two-layer _perceptron_ model ([[@Rosenblatt59]]; [[@Rosenblatt62]]), which was critiqued by [[@^MinskyPapert69]], leading to a widespread rejection of this framework (which was clearly misguided, in retrospect). The ability to use _hidden layers_ via the [[error-backpropagation]] framework developed by [[@^RumelhartHintonWilliams86]] reinvigorated interest in these models. However, after a subsequent decline in interest (and focus on more rigorous statistically-based frameworks), the remarkable performance of the [[@^KrizhevskySutskeverHinton12]] "AlexNet" deep neural network, running on a [[GPU]], triggered a resurgence of interest that has grown exponentially to this day. See [[@^LeCunBengioHinton15]] and [[@^Schmidhuber15a]] for historical reviews, and [[@^RumelhartMcClelland86]] for the foundational insights that remain highly relevant to this day.
+
Historically, early developments included Frank Rosenblatt's two-layer _perceptron_ model ([[@Rosenblatt59]]; [[@Rosenblatt62]]), which was critiqued by [[@^MinskyPapert69]], leading to a widespread rejection of this framework (which was clearly misguided, in retrospect). The ability to use _hidden layers_ via the [[error-backpropagation]] framework developed by [[@^RumelhartHintonWilliams86]] reinvigorated interest in these models. However, after a subsequent decline in interest (and focus on more rigorous statistically based frameworks), the remarkable performance of the [[@^KrizhevskySutskeverHinton12]] "AlexNet" deep neural network, running on a [[GPU]], triggered a resurgence of interest that has grown exponentially to this day. See [[@^LeCunBengioHinton15]] and [[@^Schmidhuber15a]] for historical reviews, and [[@^RumelhartMcClelland86]] for the foundational insights that remain highly relevant to this day.
The current, widely used [[large language models]] (LLMs) represent the most impactful result of this line of work, used daily by millions of people in the form of ChatGPT and various other related models. Interestingly, the core mechanisms of this model can be directly traced back to the [[error-backpropagation]] models of the 1980s, with major advances deriving from tremendous improvements in the size of both the models and the amount of data that they can be trained on, resulting from major improvements in [[GPU]] technology.
The ability to train an LLM on essentially the entire corpus of human-generated text on the internet (i.e., the _big data_ approach) derives from the fact that the LLM is based on [[predictive learning]], which is a core feature of the [[Axon]] model and, by hypothesis, the mammalian [[neocortex]]. By contrast, the AlexNet model required a large corpus of human-labeled images (the _ImageNet_ corpus; [[@DengDongSocherEtAl09]]) to provide the category labels that drive the error signals in that and many other error backpropagation models. Specifically, an LLM learns from the error signals generated by trying to predict the next word in a stream of language input. It is remarkable that this simple principle can lead to the amazing abilities of such models.
In general, the progress of ANN models is consistent with the [bitter lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) articulated by Rich Sutton, which is essentially that _big data_ consumed by relatively simple, general-purpose models ultimately prevails over attempts to develop more complex, bespoke algorithms and representations. This is an instance of the [[bias-variance tradeoff]], which is a basic statistical principle governing the tradeoff between building in stronger biases into a learning system, vs using more general-purpose, unbiased models. Stronger biases are beneficial when data is scarce, to reduce the amount of variance in learning outcomes (which is directly associated with the ability of the model to generalize to novel inputs). But when data is plentiful, biases are unnecessary and even harmful.
-
In this context, Axon is somewhere in between. The neocortical [[error-driven learning]] mechanisms in Axon are effectively a biologically-based version of the primary general-purpose learning mechanisms in most ANN models. However, the [[Rubicon]] model represents a strongly [[evolution|evolutionarily]]-shaped set of subcortical brain areas that drive a more strongly biased form of goal-driven learning. This system enables the simulated organism to learn new skills with significantly fewer learning trials, consistent with stronger biases.
+
In this context, Axon is somewhere in between. The neocortical [[error-driven learning]] mechanisms in Axon are effectively a biologically based version of the primary general-purpose learning mechanisms in most ANN models. However, the [[Rubicon]] model represents a strongly [[evolution|evolutionarily]]-shaped set of subcortical brain areas that drive a more strongly biased form of goal-driven learning. This system enables the simulated organism to learn new skills with significantly fewer learning trials, consistent with stronger biases.
One of the most important differences between the Axon framework and the vast majority of ANN models is that these models use strictly [[feedforward connections]], such that information only flows in one direction throughout the network ("forward"). This constraint enables error gradients to be efficiently computed, and also greatly simplifies the activation dynamics of the models. Whenever you include [[bidirectional connectivity]] in a network, which is a core feature of Axon, the resulting _positive feedback loops_ create significant problems for both error backpropagation and the overall behavior of the network.
@@ -33,7 +33,7 @@ The following are some of the key advances in modern ANN models relative to the
Interestingly, the exponential decay phenomenon is at least partially mitigated in the brain despite the presence of saturating nonlinearities, by virtue of robust [[inhibition]] that keeps most neurons well below the saturation levels of activity. Indeed, neurons in awake behaving [[neocortex]] are characterized as being precisely balanced between inhibition and excitation ([[@ShadlenNewsome98]]; [[@OkunLampl08]]; [[@IsaacsonScanziani11]]; [[@RubinAbbottSompolinsky17]]). This balance right around the threshold of firing also makes them more chaotic and contributes to the Poisson noise observed in spiking neurons, which could also potentially amplify responses to the temporal differences that drive learning in the [[GeneRec]] and [[kinase algorithm]]s.
-
***Shortcut connections and the importance of _residual_ error.** The original insights of [[@^Schraudolph98]] about the importance of focusing learning on _residual error_ signals (e.g., by subtracting the mean) was amplified and elaborated in the _ResNet_ architecture of [[@^HeZhangRenEtAl15]] which includes pervasive _shortcut_ connections between deep layers. The key idea is that these shortcut connections allow higher layers to automatically benefit from all of the knowledge represented in the lower layers, so that they can focus on learning the _residual_ information beyond this more "basic" level. An early influential idea along these lines was the _cascade correlation_ algorithm of [[@^FahlmanLebiere89]], which involved progressively adding new units as a function of residual error. Other widely-used _normalization_ mechanisms (e.g., "batch norm") also provide a centering, residual-error focus.
+
***Shortcut connections and the importance of _residual_ error.** The original insights of [[@^Schraudolph98]] about the importance of focusing learning on _residual error_ signals (e.g., by subtracting the mean) was amplified and elaborated in the _ResNet_ architecture of [[@^HeZhangRenEtAl15]] which includes pervasive _shortcut_ connections between deep layers. The key idea is that these shortcut connections allow higher layers to automatically benefit from all of the knowledge represented in the lower layers, so that they can focus on learning the _residual_ information beyond this more "basic" level. An early influential idea along these lines was the _cascade correlation_ algorithm of [[@^FahlmanLebiere89]], which involved progressively adding new units as a function of residual error. Other widely used _normalization_ mechanisms (e.g., "batch norm") also provide a centering, residual-error focus.
Shortcut connections are a prominent feature of the brain and are often used in Axon models. Furthermore, the ubiquitous pooled [[inhibition]] in the brain, which is also essential for Axon, provides a form of normalization and dynamic range centering.
content/acetylcholine.md
Lines changed: 2 additions & 2 deletions
@@ -17,11 +17,11 @@ In general, the ACh system is dauntingly complex at a biological level, with mul
### Volume transmission
-
Like other neuromodulators such as [[dopamine]] ACh is typically released via volume transmission over broad neural areas, delivering the ACh signal across all neurons within volumes of neural tissue, with huge numbers of release sites per neuron. Furthermore, individual neurons project to diverse, functionally-related targets, effectively binding together distal neural areas under a common neuromodulatory signal ([[@AnanthRajebhosaleKimEtAl23]]).
+
Like other neuromodulators such as [[dopamine]] ACh is typically released via volume transmission over broad neural areas, delivering the ACh signal across all neurons within volumes of neural tissue, with huge numbers of release sites per neuron. Furthermore, individual neurons project to diverse, functionally related targets, effectively binding together distal neural areas under a common neuromodulatory signal ([[@AnanthRajebhosaleKimEtAl23]]).
### Rapid phasic responding
-
As with most neuromodulators, there are both tonic and phasic aspects of ACh release. There is clear evidence for rapid, phasic, behaviorally-correlated firing in BFCNs ([[@LaszlovszkySchlingloffHegedusEtAl20]]) consistent with the Rubicon model.
+
As with most neuromodulators, there are both tonic and phasic aspects of ACh release. There is clear evidence for rapid, phasic, behaviorally correlated firing in BFCNs ([[@LaszlovszkySchlingloffHegedusEtAl20]]) consistent with the Rubicon model.