Commit bfbf691

transformerless_lm: scaled sample bench includes Subsim arch
Added subsim_K32 to the scaled-up sampler factories. The launched run uses --archs dense_crt,subsim_K32 to focus the ~1h budget on the two archs that matter for the question "does substrate produce coherent text at GPT-2-tiny scale?". It drops fibgen and composed, which have already been measured at small scale and would gain little from being re-measured at scale on this hardware budget.

Subsim is the substrate-native operator candidate. At d=128 it already narrowed the gap to dense from FibGen's +7.2% to +5.7% and reached its best attractor 3x faster.

The hypothesis at scale: if substrate compression preserves the patterns that govern language, Subsim will produce text within a noticeable-but-acceptable distance of dense. If dense produces coherent Shakespeare and Subsim produces gibberish, substrate compression breaks at scale and we need a different basis or a different operator.
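For readers checking the "+7.2% to +5.7%" figures, here is a minimal sketch of one way such a gap could be computed. It assumes the gap is the relative excess of an evaluation loss over the dense baseline, which the message does not state. The function name `relative_gap` and all loss values are hypothetical placeholders, chosen only so the arithmetic reproduces the quoted percentages. They are not measured results.

```python
def relative_gap(candidate_loss: float, dense_loss: float) -> float:
    """Percent by which candidate_loss exceeds dense_loss (assumed metric)."""
    if dense_loss <= 0:
        raise ValueError("dense_loss must be positive")
    return 100.0 * (candidate_loss - dense_loss) / dense_loss


# Placeholder losses, NOT measurements: picked so the gaps come out at
# the percentages quoted in the commit message.
dense_loss = 2.000
fibgen_loss = 2.144
subsim_loss = 2.114

fibgen_gap = round(relative_gap(fibgen_loss, dense_loss), 1)
subsim_gap = round(relative_gap(subsim_loss, dense_loss), 1)
print(f"fibgen: +{fibgen_gap}%  subsim: +{subsim_gap}%")
```

Under this reading, a smaller positive number means the architecture sits closer to dense. A value of 0 would mean parity.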
1 parent 5448da1 commit bfbf691

1 file changed

Lines changed: 6 additions & 0 deletions

experiments/transformerless_lm/sample_text_scaled.py

@@ -27,6 +27,7 @@
 from corpus import make_dataset
 from models import make_model
 from models_fibgen import FibGenLM, FibGenTransformerless
+from models_subsim import SubsimLM
 from train_distractor_mix import build_distractor_stream
 from lazy_data import fib_positions_in_window, get_fib_strided_batch
 from sample_text import evaluate, train, generate_text
@@ -69,6 +70,11 @@ def main():
             vocab_size=vocab_size, d_model=args.d_model,
             n_blocks=args.n_blocks, seq_len=args.seq_len, K=32, mode="cross",
         ),
+        "subsim_K32": lambda: SubsimLM(
+            vocab_size=vocab_size, d_model=args.d_model,
+            n_blocks=args.n_blocks, seq_len=args.seq_len,
+            K=32, fibgen_K=32, mode="cross",
+        ),
         "composed_transformerless": lambda: FibGenTransformerless(
             vocab_size=vocab_size, d_model=args.d_model, n_blocks=args.n_blocks,
             seq_len=args.seq_len, K=32, mode="cross", n_specialists=5,
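The diff registers each architecture as a zero-argument lambda, so a model is constructed only when its factory is called. The following is a rough sketch of how a comma-separated --archs value might select from such a registry. It is not the code in sample_text_scaled.py: `select_factories`, `StubModel`, and `make_stub` are hypothetical stand-ins, and the registry key "fibgen" is assumed. Only "dense_crt", "subsim_K32", and "composed_transformerless" appear in the commit.

```python
import argparse


class StubModel:
    """Hypothetical stand-in for the real model classes."""

    def __init__(self, name: str) -> None:
        self.name = name


constructed = []  # records which factories were actually called


def make_stub(name: str) -> StubModel:
    constructed.append(name)
    return StubModel(name)


def select_factories(factories: dict, archs_arg: str) -> dict:
    """Keep only the factories named in a comma-separated string."""
    names = [n.strip() for n in archs_arg.split(",") if n.strip()]
    unknown = [n for n in names if n not in factories]
    if unknown:
        raise KeyError(f"unknown archs: {unknown}")
    return {n: factories[n] for n in names}


# Lazy registry: nothing is built until a lambda is invoked.
factories = {
    "dense_crt": lambda: make_stub("dense_crt"),
    "fibgen": lambda: make_stub("fibgen"),  # key name is an assumption
    "subsim_K32": lambda: make_stub("subsim_K32"),
    "composed_transformerless": lambda: make_stub("composed_transformerless"),
}

parser = argparse.ArgumentParser()
parser.add_argument("--archs", default=",".join(factories))
args = parser.parse_args(["--archs", "dense_crt,subsim_K32"])

selected = select_factories(factories, args.archs)
models = {name: build() for name, build in selected.items()}
print(sorted(models))
```

Because the unselected lambdas are never called, the dropped architectures cost no construction time or memory. That matters when the budget is about one hour.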
