transformerless_lm: split-brain mixer -> golden-weighted arithmetic mean

claude · claude · commit e1269d7e6d21 · 2026-05-22T17:06:28.000Z
v73 geometric mean (sqrt(p_math * p_lang)) was over-conservative.
It required both hemispheres to consent: a valid spike from one
hemisphere alone was cancelled whenever the other hemisphere gave
that token little mass.
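A toy illustration of that failure mode, using hand-picked four-token distributions (assumed for illustration, not outputs of the real hemisphere deltas). The helper names `normalize`, `geometric_mix` and `golden_mix` are hypothetical; they reimplement the v73 and v74 mixing formulas on plain Python lists.

```python
import math

def normalize(p):
    total = sum(p)
    return [x / total for x in p]

def geometric_mix(p_math, p_lang):
    # v73-style mixer: element-wise sqrt(p_math * p_lang), renormalized.
    return normalize([math.sqrt(a * b) for a, b in zip(p_math, p_lang)])

def golden_mix(p_math, p_lang):
    # v74-style mixer: (phi * p_math + p_lang) / (phi + 1), renormalized.
    phi = (1 + math.sqrt(5)) / 2
    return normalize([(phi * a + b) / (phi + 1) for a, b in zip(p_math, p_lang)])

# Toy 4-token distributions (hand-picked, hypothetical):
# the language hemisphere spikes on token 0, the math hemisphere nearly ignores it.
p_math = [0.001, 0.333, 0.333, 0.333]
p_lang = [0.970, 0.010, 0.010, 0.010]

geo = geometric_mix(p_math, p_lang)
gold = golden_mix(p_math, p_lang)
print("geometric:", [round(x, 3) for x in geo])
print("golden:   ", [round(x, 3) for x in gold])
```

Under the geometric mean, token 0 falls to about 0.15, below the roughly 0.28 left on each of the other tokens, so the one-sided spike loses the argmax. The golden-weighted mixer keeps token 0 on top at about 0.37, even though the spike came from the lower-weighted hemisphere.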

v74 mixer: (phi * p_math + p_lang) / (phi + 1)

Math gets phi=1.618 weight (older substrate foundation = primary).
Lang gets 1.0 weight (modulator). Both contribute additively in
probability space. High-confidence proposals from either come
through without requiring agreement.
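Because phi + 1 = phi^2, the v74 formula reduces to a convex combination with fixed weights 1/phi and 1/phi^2. That makes "either hemisphere comes through" precise: each hemisphere passes at least a fixed fraction of its own probability into the output. The derivation below uses only the formula above.

```latex
\begin{align*}
\varphi &= \tfrac{1+\sqrt{5}}{2}, \qquad \varphi^2 = \varphi + 1,\\
p_{\text{final}} &= \frac{\varphi\,p_{\text{math}} + p_{\text{lang}}}{\varphi + 1}
  = \varphi^{-1}\,p_{\text{math}} + \varphi^{-2}\,p_{\text{lang}}
  \approx 0.618\,p_{\text{math}} + 0.382\,p_{\text{lang}},\\
\varphi^{-1} + \varphi^{-2} &= \frac{\varphi + 1}{\varphi^{2}} = 1,\\
p_{\text{final}}(t) &\ge \varphi^{-1}\,p_{\text{math}}(t)
  \quad\text{and}\quad
  p_{\text{final}}(t) \ge \varphi^{-2}\,p_{\text{lang}}(t).
\end{align*}
```

Since the weights sum to 1, normalized inputs give a normalized output, and the final renormalization only absorbs the 1e-8 epsilons. With sqrt(p_math * p_lang), by contrast, a token's mass goes to zero whenever either factor does.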

Substrate-canonical weights (golden ratio).
diff --git a/experiments/transformerless_lm/train_self_recursive.py b/experiments/transformerless_lm/train_self_recursive.py
@@ -1327,29 +1327,33 @@ def _omniweight_apply(base_probs: torch.Tensor,
 def _omniweight_apply_split(base_probs: torch.Tensor,
                                 math_delta: torch.Tensor,
                                 lang_delta: torch.Tensor) -> torch.Tensor:
-    """SPLIT-BRAIN omniweight: two registers, geometric-mean mixer.
+    """SPLIT-BRAIN omniweight: two registers, golden-weighted mixer.
 
     Math hemisphere: bigram, recency, substrate sampling, anti-stag,
     bigram-saturation. Frequency / decay primitives.
 
     Language hemisphere: iambic, anaphora, need-fill, phonotactics,
-    rhyme, agreement, word-spacing, char-cascade, pronunciation,
+    rhyme, agreement, word-spacing, char-cascade, pronounceability,
     subject-threading, theme. Purpose / structure primitives.
 
     Each hemisphere builds its own fluid delta via tanh-scaled
-    substrate reserve. Final distribution = geometric mean of the
-    two -- a token survives only if both hemispheres consent.
-
-    Pure substrate (phi^pi reserve, sqrt mixing = Bayesian PoE).
+    substrate reserve. Final distribution = golden-weighted arithmetic
+    mean:  (phi * p_math + p_lang) / (phi + 1).
+
+    Math gets phi=1.618 weight (older substrate foundation, primary
+    signal). Lang gets 1.0 weight (modulator). Both contribute, so
+    high-confidence proposals from either come through. Less
+    restrictive than the v73 geometric mean, which required both
+    hemispheres to consent and was over-conservative.
     """
     math_fluid = _OMNIWEIGHT_RESERVE * torch.tanh(math_delta / _OMNIWEIGHT_RESERVE)
     lang_fluid = _OMNIWEIGHT_RESERVE * torch.tanh(lang_delta / _OMNIWEIGHT_RESERVE)
     p_math = base_probs * torch.exp(math_fluid)
     p_lang = base_probs * torch.exp(lang_fluid)
     p_math = p_math / (p_math.sum() + 1e-8)
     p_lang = p_lang / (p_lang.sum() + 1e-8)
-    # Geometric mean (Bayesian product of experts).
-    p_final = torch.sqrt(p_math * p_lang)
+    phi = _PHI_FOR_SAMPLING
+    p_final = (phi * p_math + p_lang) / (phi + 1.0)
     return p_final / (p_final.sum() + 1e-8)
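For readers without the surrounding module, here is a minimal sketch of the v74 `_omniweight_apply_split` on plain Python lists instead of torch tensors. It rests on a few assumptions. The value of `_PHI_FOR_SAMPLING` is taken to be the golden ratio. `_OMNIWEIGHT_RESERVE` is taken to be phi^pi (about 4.53), following the "phi^pi reserve" note in the docstring line this commit removes. The names `omniweight_apply_split` and `_reweight` are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # assumed value of _PHI_FOR_SAMPLING
RESERVE = PHI ** math.pi       # assumed _OMNIWEIGHT_RESERVE (~4.53), per the old "phi^pi reserve" note
EPS = 1e-8

def _reweight(base_probs, delta):
    # Soft-clip each delta into (-RESERVE, RESERVE) via tanh, exponentiate,
    # multiply into the base distribution, and renormalize.
    fluid = [RESERVE * math.tanh(d / RESERVE) for d in delta]
    p = [b * math.exp(f) for b, f in zip(base_probs, fluid)]
    total = sum(p) + EPS
    return [x / total for x in p]

def omniweight_apply_split(base_probs, math_delta, lang_delta):
    # Each hemisphere reweights the same base distribution independently.
    p_math = _reweight(base_probs, math_delta)
    p_lang = _reweight(base_probs, lang_delta)
    # Golden-weighted arithmetic mean (v74): a convex combination, so it is
    # already normalized up to EPS; the final division mirrors the diff.
    p_final = [(PHI * a + b) / (PHI + 1.0) for a, b in zip(p_math, p_lang)]
    total = sum(p_final) + EPS
    return [x / total for x in p_final]

# Example: the language hemisphere boosts token 0, the math hemisphere is neutral.
out = omniweight_apply_split([0.25] * 4, [0.0] * 4, [3.0, 0.0, 0.0, 0.0])
print([round(x, 3) for x in out])
```

The tanh clip bounds how far a single hemisphere can move any token. Even an enormous lang delta can raise p_lang for a token by at most a factor of exp(RESERVE) before renormalization, and the fixed 1/phi^2 weight then caps its share of the final mix.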