Commit 15969bc

version 2 of Jang et al, and mass-fix of ly- to not use the hyphen (happy Jake? :)
1 parent 9f59b1d commit 15969bc

53 files changed

Lines changed: 225 additions & 225 deletions

content/abstract-neural-network.md

Lines changed: 4 additions & 4 deletions
@@ -7,17 +7,17 @@ bibfile = "ccnlab.json"
Perhaps the most fundamental shared feature of all ANN models is the distinction between [[activation]]s and [[weights]], where activations are dynamic state variables representing neural activity that are communicated to other neurons over weighted synaptic connections. [[Learning]] happens by changing the strengths of these synaptic weights, typically under the influence of the local activation state.

10-
After providing an overview of these models, we consider how they relate to neuroscience in general and [[Axon]] more specifically. In summary, Axon has a number of neurobiologically-motivated properties that differ significantly from most ANN models, which have potentially important functional implications.
10+
After providing an overview of these models, we consider how they relate to neuroscience in general and [[Axon]] more specifically. In summary, Axon has a number of neurobiologically motivated properties that differ significantly from most ANN models, which have potentially important functional implications.

12-
Historically, early developments included Frank Rosenblatt's two-layer _perceptron_ model ([[@Rosenblatt59]]; [[@Rosenblatt62]]), which was critiqued by [[@^MinskyPapert69]], leading to a widespread rejection of this framework (which was clearly misguided, in retrospect). The ability to use _hidden layers_ via the [[error-backpropagation]] framework developed by [[@^RumelhartHintonWilliams86]] reinvigorated interest in these models. However, after a subsequent decline in interest (and focus on more rigorous statistically-based frameworks), the remarkable performance of the [[@^KrizhevskySutskeverHinton12]] "AlexNet" deep neural network, running on a [[GPU]], triggered a resurgence of interest that has grown exponentially to this day. See [[@^LeCunBengioHinton15]] and [[@^Schmidhuber15a]] for historical reviews, and [[@^RumelhartMcClelland86]] for the foundational insights that remain highly relevant to this day.
12+
Historically, early developments included Frank Rosenblatt's two-layer _perceptron_ model ([[@Rosenblatt59]]; [[@Rosenblatt62]]), which was critiqued by [[@^MinskyPapert69]], leading to a widespread rejection of this framework (which was clearly misguided, in retrospect). The ability to use _hidden layers_ via the [[error-backpropagation]] framework developed by [[@^RumelhartHintonWilliams86]] reinvigorated interest in these models. However, after a subsequent decline in interest (and focus on more rigorous statistically based frameworks), the remarkable performance of the [[@^KrizhevskySutskeverHinton12]] "AlexNet" deep neural network, running on a [[GPU]], triggered a resurgence of interest that has grown exponentially to this day. See [[@^LeCunBengioHinton15]] and [[@^Schmidhuber15a]] for historical reviews, and [[@^RumelhartMcClelland86]] for the foundational insights that remain highly relevant to this day.

The current, widely used [[large language models]] (LLMs) represent the most impactful result of this line of work, used daily by millions of people in the form of ChatGPT and various other related models. Interestingly, the core mechanisms of these models can be directly traced back to the [[error-backpropagation]] models of the 1980s, with major advances deriving from tremendous improvements in the size of both the models and the amount of data that they can be trained on, resulting from major improvements in [[GPU]] technology.

The ability to train an LLM on essentially the entire corpus of human-generated text on the internet (i.e., the _big data_ approach) derives from the fact that the LLM is based on [[predictive learning]], which is a core feature of the [[Axon]] model and, by hypothesis, the mammalian [[neocortex]]. By contrast, the AlexNet model required a large corpus of human-labeled images (the _ImageNet_ corpus; [[@DengDongSocherEtAl09]]) to provide the category labels that drive the error signals in that and many other error backpropagation models. Specifically, an LLM learns from the error signals generated by trying to predict the next word in a stream of language input. It is remarkable that this simple principle can lead to the amazing abilities of such models.
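
The next-word error signal described above can be sketched in a few lines of Python. This is only a minimal illustration under assumed names: `softmax`, `next_word_error`, and the three-word vocabulary are hypothetical and are not taken from Axon or from any LLM codebase. The model's scores over the vocabulary are turned into probabilities, and the error is the difference between those probabilities and the word that actually came next, so no human-provided label is needed.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_error(logits, target_index):
    """Return (loss, error) for one next-word prediction.

    loss  : cross-entropy, i.e. -log of the probability given to the true word
    error : predicted probability minus the one-hot target, per vocabulary item
    """
    probs = softmax(logits)
    loss = -math.log(probs[target_index])
    error = [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]
    return loss, error


# Hypothetical toy vocabulary; the "label" is simply the word that came next.
vocab = ["the", "cat", "sat"]
logits = [0.0, 0.0, 0.0]  # an untrained model with no preference
loss, error = next_word_error(logits, vocab.index("cat"))
```

With uniform scores, each word gets probability 1/3, so the loss is ln 3 and the error for the true word is negative (its probability should go up) while the others are positive.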

In general, the progress of ANN models is consistent with the [bitter lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) articulated by Rich Sutton, which is essentially that _big data_ consumed by relatively simple, general-purpose models ultimately prevails over attempts to develop more complex, bespoke algorithms and representations. This is an instance of the [[bias-variance tradeoff]], which is a basic statistical principle governing the tradeoff between building stronger biases into a learning system versus using more general-purpose, unbiased models. Stronger biases are beneficial when data is scarce, to reduce the amount of variance in learning outcomes (which is directly associated with the ability of the model to generalize to novel inputs). But when data is plentiful, biases are unnecessary and even harmful.

20-
In this context, Axon is somewhere in between. The neocortical [[error-driven learning]] mechanisms in Axon are effectively a biologically-based version of the primary general-purpose learning mechanisms in most ANN models. However, the [[Rubicon]] model represents a strongly [[evolution|evolutionarily]]-shaped set of subcortical brain areas that drive a more strongly biased form of goal-driven learning. This system enables the simulated organism to learn new skills with significantly fewer learning trials, consistent with stronger biases.
20+
In this context, Axon is somewhere in between. The neocortical [[error-driven learning]] mechanisms in Axon are effectively a biologically based version of the primary general-purpose learning mechanisms in most ANN models. However, the [[Rubicon]] model represents a strongly [[evolution|evolutionarily]]-shaped set of subcortical brain areas that drive a more strongly biased form of goal-driven learning. This system enables the simulated organism to learn new skills with significantly fewer learning trials, consistent with stronger biases.

One of the most important differences between the Axon framework and the vast majority of ANN models is that these models use strictly [[feedforward connections]], such that information only flows in one direction throughout the network ("forward"). This constraint enables error gradients to be efficiently computed, and also greatly simplifies the activation dynamics of the models. Whenever you include [[bidirectional connectivity]] in a network, which is a core feature of Axon, the resulting _positive feedback loops_ create significant problems for both error backpropagation and the overall behavior of the network.

@@ -33,7 +33,7 @@ The following are some of the key advances in modern ANN models relative to the
Interestingly, the exponential decay phenomenon is at least partially mitigated in the brain despite the presence of saturating nonlinearities, by virtue of robust [[inhibition]] that keeps most neurons well below the saturation levels of activity. Indeed, neurons in awake behaving [[neocortex]] are characterized as being precisely balanced between inhibition and excitation ([[@ShadlenNewsome98]]; [[@OkunLampl08]]; [[@IsaacsonScanziani11]]; [[@RubinAbbottSompolinsky17]]). This balance right around the threshold of firing also makes them more chaotic and contributes to the Poisson noise observed in spiking neurons, which could also potentially amplify responses to the temporal differences that drive learning in the [[GeneRec]] and [[kinase algorithm]]s.

36-
* **Shortcut connections and the importance of _residual_ error.** The original insights of [[@^Schraudolph98]] about the importance of focusing learning on _residual error_ signals (e.g., by subtracting the mean) was amplified and elaborated in the _ResNet_ architecture of [[@^HeZhangRenEtAl15]] which includes pervasive _shortcut_ connections between deep layers. The key idea is that these shortcut connections allow higher layers to automatically benefit from all of the knowledge represented in the lower layers, so that they can focus on learning the _residual_ information beyond this more "basic" level. An early influential idea along these lines was the _cascade correlation_ algorithm of [[@^FahlmanLebiere89]], which involved progressively adding new units as a function of residual error. Other widely-used _normalization_ mechanisms (e.g., "batch norm") also provide a centering, residual-error focus.
36+
* **Shortcut connections and the importance of _residual_ error.** The original insights of [[@^Schraudolph98]] about the importance of focusing learning on _residual error_ signals (e.g., by subtracting the mean) was amplified and elaborated in the _ResNet_ architecture of [[@^HeZhangRenEtAl15]] which includes pervasive _shortcut_ connections between deep layers. The key idea is that these shortcut connections allow higher layers to automatically benefit from all of the knowledge represented in the lower layers, so that they can focus on learning the _residual_ information beyond this more "basic" level. An early influential idea along these lines was the _cascade correlation_ algorithm of [[@^FahlmanLebiere89]], which involved progressively adding new units as a function of residual error. Other widely used _normalization_ mechanisms (e.g., "batch norm") also provide a centering, residual-error focus.

Shortcut connections are a prominent feature of the brain and are often used in Axon models. Furthermore, the ubiquitous pooled [[inhibition]] in the brain, which is also essential for Axon, provides a form of normalization and dynamic range centering.
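
The shortcut idea discussed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the ResNet architecture itself (which uses convolutional layers and normalization) and not Axon code; `relu`, `linear`, and `residual_block` are hypothetical names. The block adds its input back to the output of a learned transformation, so the weights only need to represent the residual beyond what the lower layer already provides.

```python
def relu(x):
    """Rectified linear activation applied elementwise."""
    return [max(0.0, v) for v in x]


def linear(weights, x):
    """Plain matrix-vector product: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def residual_block(weights, x):
    """Return x + f(x), where f is a single linear layer followed by ReLU.

    The `x +` term is the shortcut connection: when the weights are zero,
    the block passes its input through unchanged.
    """
    fx = relu(linear(weights, x))
    return [xi + fi for xi, fi in zip(x, fx)]


x = [1.0, -2.0]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
identity_out = residual_block(zero_weights, x)   # shortcut only
learned_out = residual_block([[1.0, 0.0], [0.0, 1.0]], x)  # shortcut plus residual
```

With zero weights the block reduces to the identity, which is why stacking many such blocks does not by itself degrade the signal passed up from lower layers.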

content/acetylcholine.md

Lines changed: 2 additions & 2 deletions
@@ -17,11 +17,11 @@ In general, the ACh system is dauntingly complex at a biological level, with mul
### Volume transmission

20-
Like other neuromodulators such as [[dopamine]] ACh is typically released via volume transmission over broad neural areas, delivering the ACh signal across all neurons within volumes of neural tissue, with huge numbers of release sites per neuron. Furthermore, individual neurons project to diverse, functionally-related targets, effectively binding together distal neural areas under a common neuromodulatory signal ([[@AnanthRajebhosaleKimEtAl23]]).
20+
Like other neuromodulators such as [[dopamine]] ACh is typically released via volume transmission over broad neural areas, delivering the ACh signal across all neurons within volumes of neural tissue, with huge numbers of release sites per neuron. Furthermore, individual neurons project to diverse, functionally related targets, effectively binding together distal neural areas under a common neuromodulatory signal ([[@AnanthRajebhosaleKimEtAl23]]).

### Rapid phasic responding

24-
As with most neuromodulators, there are both tonic and phasic aspects of ACh release. There is clear evidence for rapid, phasic, behaviorally-correlated firing in BFCNs ([[@LaszlovszkySchlingloffHegedusEtAl20]]) consistent with the Rubicon model.
24+
As with most neuromodulators, there are both tonic and phasic aspects of ACh release. There is clear evidence for rapid, phasic, behaviorally correlated firing in BFCNs ([[@LaszlovszkySchlingloffHegedusEtAl20]]) consistent with the Rubicon model.

### Salience encoding
