transformerless_lm: scaled sample bench includes Subsim arch

claude · claude · commit bfbf6914202b · 2026-05-21T01:40:04.000Z
Added subsim_K32 to the scaled-up sampler factories. The launched
run uses --archs dense_crt,subsim_K32 to focus the ~1h budget on the
two archs that matter for the "does substrate produce coherent text
at GPT-2-tiny scale?" question, dropping fibgen and composed (both
have already been measured at small scale, and re-measuring them at
scale would add little within this hardware budget).
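
For reference, a minimal sketch of how a comma-separated --archs value
could select a subset of the factory dict. This is an assumption about
the mechanism, not the actual code in sample_text_scaled.py: the helper
select_factories and the string-returning placeholder factories are
hypothetical, and "fibgen" is a stand-in key name.

```python
import argparse


def select_factories(factories, archs_arg):
    """Return the factories named in a comma-separated string, in the order requested."""
    names = [n.strip() for n in archs_arg.split(",") if n.strip()]
    unknown = [n for n in names if n not in factories]
    if unknown:
        raise ValueError(f"unknown archs {unknown}; available: {sorted(factories)}")
    return {n: factories[n] for n in names}


# Placeholder factories: the real ones build models lazily via lambdas;
# here each returns a label so the sketch runs without the project modules.
factories = {
    "dense_crt": lambda: "dense_crt model",
    "fibgen": lambda: "fibgen model",  # hypothetical key name
    "subsim_K32": lambda: "subsim_K32 model",
    "composed_transformerless": lambda: "composed model",
}

parser = argparse.ArgumentParser()
parser.add_argument("--archs", default=",".join(factories))
args = parser.parse_args(["--archs", "dense_crt,subsim_K32"])

selected = select_factories(factories, args.archs)
for name, make in selected.items():
    # Only the selected archs are ever constructed.
    print(name, "->", make())
```

Keeping the factories as lambdas means the dropped archs cost nothing:
no model is built until its factory is called.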

Subsim is the substrate-native operator candidate. At d=128 it
already narrowed the gap to dense from FibGen's +7.2% to +5.7% and
reached its best attractor 3x faster. The hypothesis at scale: if
substrate compression preserves the patterns that govern language,
Subsim will produce text within a noticeable-but-acceptable distance
of dense. If dense produces coherent Shakespeare and Subsim produces
gibberish, substrate compression breaks at scale and we need a
different basis or a different operator.
diff --git a/experiments/transformerless_lm/sample_text_scaled.py b/experiments/transformerless_lm/sample_text_scaled.py
@@ -27,6 +27,7 @@
 from corpus import make_dataset
 from models import make_model
 from models_fibgen import FibGenLM, FibGenTransformerless
+from models_subsim import SubsimLM
 from train_distractor_mix import build_distractor_stream
 from lazy_data import fib_positions_in_window, get_fib_strided_batch
 from sample_text import evaluate, train, generate_text
@@ -69,6 +70,11 @@ def main():
             vocab_size=vocab_size, d_model=args.d_model,
             n_blocks=args.n_blocks, seq_len=args.seq_len, K=32, mode="cross",
         ),
+        "subsim_K32": lambda: SubsimLM(
+            vocab_size=vocab_size, d_model=args.d_model,
+            n_blocks=args.n_blocks, seq_len=args.seq_len,
+            K=32, fibgen_K=32, mode="cross",
+        ),
         "composed_transformerless": lambda: FibGenTransformerless(
             vocab_size=vocab_size, d_model=args.d_model, n_blocks=args.n_blocks,
             seq_len=args.seq_len, K=32, mode="cross", n_specialists=5,