transformerless_lm: sample_text — use BEST-val checkpoint, not final

Per the user's observation: substrate-aligned models snap to discrete
Fibonacci-tier attractor configurations during training, so the loss
curve jumps between attractor states rather than descending
monotonically. Measuring val loss only at the final step can therefore
understate the quality of the best attractor the model visited.

sample_text.py now tracks the best-val checkpoint across all
evaluation points (every 200 steps), saves the state_dict at that
point, and loads it before generating. The text sample therefore
comes from the best attractor configuration, not whichever attractor
the model happens to be sitting in when training stops.

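The tracking described above can be sketched roughly as follows. This is a minimal illustration and not the actual contents of sample_text.py: `TinyModel`, `train_step`, `evaluate`, `FAKE_VAL`, and `train_with_best_val` are hypothetical names, and the validation losses are made-up numbers chosen only to show a non-monotonic curve. The stand-in model mimics the `state_dict()` / `load_state_dict()` pair so the sketch runs without any ML framework.

```python
import copy


class TinyModel:
    """Hypothetical stand-in exposing state_dict() / load_state_dict()."""

    def __init__(self):
        self.weights = {"w": 0}

    def state_dict(self):
        # Returns a live reference, which is why the caller must copy it.
        return self.weights

    def load_state_dict(self, state):
        self.weights = copy.deepcopy(state)


# Made-up val losses that jump around instead of descending monotonically.
FAKE_VAL = {200: 3.0, 400: 2.1, 600: 2.6, 800: 2.4}


def train_step(model, step):
    # Placeholder for a real optimizer step: just record the step number.
    model.weights["w"] = step


def evaluate(model):
    # Placeholder for a real validation pass.
    return FAKE_VAL[model.weights["w"]]


def train_with_best_val(model, total_steps, eval_every=200):
    """Train, snapshot the lowest-val state, and restore it at the end."""
    best_val, best_step, best_state = float("inf"), None, None
    for step in range(1, total_steps + 1):
        train_step(model, step)
        if step % eval_every == 0:
            val = evaluate(model)
            if val < best_val:
                best_val, best_step = val, step
                # Deep copy so later training steps cannot mutate the snapshot.
                best_state = copy.deepcopy(model.state_dict())
    if best_state is not None:
        model.load_state_dict(best_state)
    return best_val, best_step


model = TinyModel()
best_val, best_step = train_with_best_val(model, total_steps=800)
print(best_val, best_step, model.state_dict())
```

With these made-up losses the final evaluation (step 800) is not the lowest, so the restored state is the one from step 400. The deep copy matters with real frameworks too: a state_dict that holds references to live parameters would otherwise drift along with further training.
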
This is the right way to measure substrate-model quality: the
substrate's discrete state space means optimization explores
multiple stable configurations, and the deployment-relevant one
is the configuration with the lowest val loss, not the temporally
last one.