Commit 61f7ac0

unamedkr and claude committed
docs: env_vars.md — single-page reference for all TQ_* runtime envs
Consolidates scattered env var mentions across RELEASE_NOTES.md, comments in tq_transformer.c, and memory notes into one table.

Covers:
- Performance controls (TQ_NO_METAL / MLOCK / Q4 / BATCH_PREFILL / MOE_BATCH)
- Quality controls (TQ_NO_AUTO_SERIAL, TQ_FORCE_QK_NORM, TQ_ROPE_PAIRS)
- General debug (TQ_DEBUG, TQ_DEBUG_PREFILL, TQ_DEBUG_WQ)
- Refparity framework (TQ_DUMP_HIDDEN, TQ_DUMP_POS, TQ_DUMP_INTERMEDIATE)
- DeltaNet ablations (TQ_DELTA_PROBE, TQ_DELTA_RESET_EVERY, TQ_DELTA_RESET_LAYER)

Includes concrete copy-paste examples for BPE regression, refparity diff, DeltaNet state probe, and the 35B user-recommended config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ad30813 commit 61f7ac0

1 file changed

Lines changed: 97 additions & 0 deletions

docs/env_vars.md

@@ -0,0 +1,97 @@
# Environment Variables

Reference for `TQ_*` runtime env vars, grouped by purpose. Everything
here is opt-in; defaults are the tested production path.

## Performance / resource controls

| Var | Default | Purpose |
|---|---|---|
| `TQ_NO_METAL` | off | Skip Metal (Apple GPU) path; force CPU-only |
| `TQ_NO_MLOCK` | off | Don't `mlock` the mmap'd weights; lets the OS page out cold experts on small machines |
| `TQ_NO_Q4` | off | Skip load-time FP32→internal-Q4 recompression; use on-the-fly GGUF dequant. Quality tradeoff; see `state.md` R5 |
| `TQ_NO_BATCH_PREFILL` | off | Force per-token prefill (disables batched matrix prefill path) |
| `TQ_NO_MOE_BATCH` | off | Opt out of batched MoE dispatch (default-on). Restores per-token MoE forward |
| `TQ_NO_MOE_BATCH_DYNAMIC` | off | Opt out of FCFS dynamic dispatch (default-on). Uses wave-mode expert dispatch instead |
| `TQ_MOE_BATCH_CHUNK` | 8 | Tokens per batched MoE call (sensible range 1-20); larger is faster, but numerical stability degrades above ~20 |
| `TQ_MOE_BATCH_SELFTEST` | off | Route N=1 MoE through the batch(N=1) kernel to prove equivalence with the per-token path |
| `TQ_PHI3_SPLIT` | 0 | Phi-3 fused QKV/FFN split to separate Q4 weights. **Off by default**: degrades chat quality per feedback/perf_commits_need_chat_test |
| `TQ_MOE_FAST_EXP` | off | Use Schraudolph fast-exp in MoE SwiGLU (vs exact expf default). ~2% per-call error; may re-introduce long-gen drift |
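
The flags in this table are ordinary process environment variables, so prefixing a command scopes them to that one invocation. The sketch below illustrates that scoping. It assumes `=1` is how an `off`-by-default flag is enabled (the form the examples in this file use), and the `sh -c` child is only a stand-in for `./build/quant` so the snippet runs without a model.

```bash
# Make sure nothing leaks in from the calling shell.
unset TQ_NO_METAL TQ_MOE_BATCH_CHUNK

# Stand-in for ./build/quant: a child process that reports what it sees.
show='echo "TQ_NO_METAL=${TQ_NO_METAL:-unset} TQ_MOE_BATCH_CHUNK=${TQ_MOE_BATCH_CHUNK:-unset}"'

# Per-command prefix: the variables exist only for this one child.
scoped=$(TQ_NO_METAL=1 TQ_MOE_BATCH_CHUNK=4 sh -c "$show")

# The next child starts clean, so it would take the default path again.
after=$(sh -c "$show")

echo "$scoped"
echo "$after"
```

Preferring the prefix form over `export` keeps an experiment from silently carrying over into the next run in the same shell.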

## Quality / correctness

| Var | Default | Purpose |
|---|---|---|
| `TQ_NO_AUTO_SERIAL` | off | Opt out of Qwen3.6 auto single-thread mode. Multi-threaded decode is non-deterministic at T=0, so the default forces `-j 1` on the qwen35moe+DeltaNet hybrid. Cost of the default: ~2-3× slower decode |
| `TQ_FORCE_QK_NORM` | off | Force QK-norm on Qwen hybrid (normally disabled for that arch) |
| `TQ_ROPE_PAIRS` | off | Force LLaMA-style interleaved RoPE pairs (overrides NEOX auto-detect) |
| `TQ_NO_PLE` | off | Disable Gemma-4 per-layer-embedding path |
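
Because the `TQ_NO_AUTO_SERIAL` row turns on whether two T=0 runs produce the same text, a quick way to check is to run the same command twice and compare the captured output. The helper below is a hypothetical sketch, not part of the repo; `stable` and `drifting` are stand-in commands so it runs without a model.

```bash
# same_twice CMD...: run CMD twice and report whether captured stdout matches.
same_twice() {
  first=$("$@")
  second=$("$@")
  if [ "$first" = "$second" ]; then
    echo "deterministic"
  else
    echo "non-deterministic"
  fi
}

# Stand-ins: one command with fixed output, one whose output changes per call.
counter=$(mktemp)
stable() { echo "fixed output"; }
drifting() { echo x >> "$counter"; wc -l < "$counter" | tr -d ' '; }

same_twice stable
same_twice drifting
rm -f "$counter"
```

With a real model, the same helper could wrap a `./build/quant ... -T 0` invocation, once with and once without `TQ_NO_AUTO_SERIAL=1` in the environment.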

## Debugging: general

| Var | Default | Purpose |
|---|---|---|
| `TQ_DEBUG` | off | Prints per-layer output norms, attention range, tokenized prompt, etc. |
| `TQ_DEBUG_PREFILL` | off | Per-layer `final x sum` / `sumabs` during prefill (layers 0-3) |
| `TQ_DEBUG_WQ` | off | L0 pre-norm RMS at first token |

## Debugging: refparity framework

The `tools/refparity/` framework uses these to produce dumps that can be
compared against the HF FP32 reference. Do not enable in production: each
dump is an fsync'd file.

| Var | Value | Purpose |
|---|---|---|
| `TQ_DUMP_HIDDEN` | `/path/to/dir` | Dump `emb.bin`, `h0.bin` through `hN.bin`, `post_norm.bin`, `logits.bin` (one raw FP32 file per slot) |
| `TQ_DUMP_POS` | `0` (default) or `N` or `all` | Which token position to dump. `all` is expensive (28 × seq_len files) |
| `TQ_DUMP_INTERMEDIATE` | off | Also dump per-layer sub-stages `h{l}_in/postattn/preffn/ffnout`; bisects attention vs FFN divergences |
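
Since each dump is described as a raw FP32 file, its element count is the file size divided by 4, which gives a cheap sanity check before running a full diff. The sketch below assumes the files carry no header; the 4096-byte zero file is a stand-in for a real `h0.bin`, and the width of 1024 is illustrative rather than a value taken from the engine.

```bash
dump_dir=$(mktemp -d)

# Stand-in for one engine dump: 1024 FP32 values = 4096 bytes of zeros.
head -c 4096 /dev/zero > "$dump_dir/h0.bin"

# Raw FP32 means 4 bytes per element, so element count = size / 4.
bytes=$(wc -c < "$dump_dir/h0.bin" | tr -d ' ')
n_floats=$((bytes / 4))
echo "h0.bin: $n_floats floats"

# A size that is not a multiple of 4 points to a truncated dump.
if [ $((bytes % 4)) -ne 0 ]; then
  echo "h0.bin: truncated" >&2
fi

# Peek at the first four values as 32-bit floats.
od -A n -t f4 -N 16 "$dump_dir/h0.bin"
```

If the count does not match the model's hidden size (or vocabulary size for `logits.bin`), the dump and the reference are unlikely to be comparable element by element.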

## Debugging: DeltaNet (Qwen3.5/3.6)

Added in the 2026-04-21 DeltaNet investigation. Probe or ablate the
recurrent state to localize drift.

| Var | Value | Purpose |
|---|---|---|
| `TQ_DELTA_PROBE` | `call1,call2,...` | Print per-layer `delta_state` L2 norm at listed layer-0 call counts. E.g. `TQ_DELTA_PROBE=50,100,115,120` |
| `TQ_DELTA_RESET_EVERY` | `N` | Zero `delta_state` + `conv_state` every N-th layer-0 call. Diagnostic only (destroys useful context) |
| `TQ_DELTA_RESET_LAYER` | `N` or unset | Combined with `RESET_EVERY`, clears only that layer's slice. `-1` or unset = all layers |
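
`TQ_DELTA_PROBE` takes a comma-separated list of call counts, so a dense list around a suspected drift boundary can be generated instead of typed by hand. This is a small sketch; `center`, `radius`, and `step` are illustrative values, not numbers from the investigation.

```bash
# Suspected drift boundary and how far to look on either side.
center=115
radius=5
step=2

# seq emits one call count per line; paste joins them with commas.
probe_list=$(seq $((center - radius)) "$step" $((center + radius)) | paste -sd, -)

export TQ_DELTA_PROBE="$probe_list"
echo "TQ_DELTA_PROBE=$TQ_DELTA_PROBE"
```

Make sure the run's `-n` is large enough to reach the highest call count in the list.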

## Examples

**Reproduce BPE UTF-8 regression suite**:
```bash
bash scripts/test_models.sh   # runs test_tokenizer.sh at tail
```

**Reference-parity diff on one model**:
```bash
export PYTHONPATH=tools/pillar1/venv/lib/python3.12/site-packages
python tools/refparity/hf_reference.py --model Qwen/Qwen3-0.6B --prompt "Hello" --out /tmp/ref.npz
TQ_DUMP_HIDDEN=/tmp/eng TQ_NO_METAL=1 TQ_NO_MLOCK=1 TQ_NO_BATCH_PREFILL=1 TQ_NO_AUTO_SERIAL=1 \
  ./build/quant models/Qwen3-0.6B-Q4_K_M.gguf -p "Hello" -n 1 -T 0
python tools/refparity/diff_layers.py /tmp/ref.npz /tmp/eng
```

**Probe Qwen3.6 DeltaNet state at drift boundary**:
```bash
TQ_DELTA_PROBE=50,100,115,118,120 \
  ./build/quant models/Qwen3.6-35B-A3B-UD-IQ4_XS.gguf \
  -p "Once upon a time in a faraway land" -n 125 -T 0 2>&1 | grep delta-probe
```

**35B best-quality user config**:
```bash
./build/quant models/Qwen3.6-35B-A3B-UD-Q5_K_M.gguf \
  -p "<your prompt>" -n 200 -T 0 --rep-penalty 1.3
```

## Notes

- Most `TQ_NO_*` envs exist because the default path has a correctness
  or quality tradeoff someone wanted to A/B. Flipping them usually
  trades speed for determinism or vice versa. Read `state.md` and
  `bench/results/` for the measured impact before relying on any.
- New envs land with `state.md` entries documenting *why* they exist.
  Don't add undocumented envs.
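
The A/B use described in the first note can be sketched as a small wrapper that runs a command with a flag removed, then with it set, and compares the output. `ab_flag` is a hypothetical helper, not something shipped in the repo. It assumes `=1` enables a flag and relies on `env -u`, which GNU and BSD `env` provide but POSIX does not require. The commands at the bottom are stand-ins so the sketch runs without a model.

```bash
# ab_flag VAR CMD...: run CMD with VAR removed, then with VAR=1, and compare stdout.
ab_flag() {
  var=$1
  shift
  base=$(env -u "$var" "$@")
  flipped=$(env "$var=1" "$@")
  if [ "$base" = "$flipped" ]; then
    echo "$var: output unchanged"
  else
    echo "$var: output differs"
  fi
}

# Stand-in that reads the flag, and one that ignores it.
ab_flag TQ_NO_METAL sh -c 'echo "metal_off=${TQ_NO_METAL:-0}"'
ab_flag TQ_NO_METAL echo "hello"
```

An "output differs" result only says the flag changes the generated text; the measured speed and quality impact still has to come from `state.md` and `bench/results/`.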
