docs: microbench cross-tree index + synthesis-doc currency (org fixes) by Lightheartdevs · Pull Request #37 · Light-Heart-Labs/MMBT-Messy-Model-Bench-Tests

Lightheartdevs · 2026-06-01T01:54:45Z

Addresses the repo-organization audit (Tiers 1–3). No data changes — navigation, taxonomy, and currency only.

Tier 3 — taxonomy (the headline org problem)

The 12-family agentic microbench (a model-behavior study) is split across benchmarks/ and hardware-tests/ by which GPU/quant a model happened to need, not by the question asked. Low-disruption fix (no directory moves, no broken links/history):

NEW MICROBENCH-INDEX.md — one table of every 12-family microbench across both trees, plus disambiguation of the four "27B"s (AWQ / Q8 / FP8 / 35B-A3B sibling), and the one finding that holds across all of them (thinking net-negative).
"Where this lives" taxonomy notes in the benchmarks/ microbench READMEs. (The 397B and 27B-FP8 entry notes ship in PRs #33, "Add MiniMax-M2.7-NVFP4 (N=5, TP=2): temp serving-trap + exhaustive-completer findings", and #34, "Qwen3.6-27B-FP8 full microbench N=5 — think vs no-think (clean FP8 redo)".)

Tier 2 — currency (synthesis docs were frozen at 2026-05-02)

SCORECARD.md: added a 27B-quant scope banner (its "27B" means AWQ, which removes the conflation risk), a newer-results summary (397B / Step / MiniMax / 27B-FP8), and marked the "FP8 higher-precision" follow-up as partly done.
COMPARISON.md: "The FP8 re-run is the highest-priority follow-up" → done, cross-linked, with the confirmed result (thinking still net-negative at FP8).
ROADMAP.md item 1: marked ✅ DONE for 27B; remaining FP8 work narrowed to Coder-Next + 35B-A3B.

Tier 1 — discoverability

Indexed the two previously-unindexed hardware-tests entries (best-stack-followup, qwen3.5-397b-vs-step3.7-flash) in both the root README and hardware-tests/README.md tables.
Added a MICROBENCH-INDEX.md row to the root "five-minute answers" table.

Merge notes

References the 27B-FP8 entry (PR #34, "Qwen3.6-27B-FP8 full microbench N=5 — think vs no-think (clean FP8 redo)") and the MiniMax entry (PR #33, "Add MiniMax-M2.7-NVFP4 (N=5, TP=2): temp serving-trap + exhaustive-completer findings"); merge those first so the links resolve.
Light overlap is expected with #34 on the README index tables (rows inserted at different anchors to minimize conflict). claims.yaml is not touched here (its merge order is between #33 and #34).
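
The "merge those first so the links resolve" requirement can be checked mechanically. The following is a minimal sketch, not part of this PR: the function name `broken_relative_links` is hypothetical, and it assumes the index tables use ordinary inline Markdown links with paths relative to the file that contains them. It lists every relative link whose target file does not exist in the working tree.

```python
import re
from pathlib import Path

# Matches inline Markdown links and images: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that begin with a URL scheme such as https: or mailto:
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(repo_root):
    """Return (markdown file, target) pairs whose relative target is missing.

    Hypothetical helper: assumes links are relative to the containing file.
    """
    root = Path(repo_root)
    broken = []
    for md in sorted(root.rglob("*.md")):
        text = md.read_text(encoding="utf-8", errors="replace")
        for target in LINK_RE.findall(text):
            # Skip external URLs and pure in-page anchors.
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            # Drop any "#section" suffix before checking the file path.
            path_part = target.split("#", 1)[0]
            if not path_part:
                continue
            if not (md.parent / path_part).exists():
                broken.append((str(md.relative_to(root)), target))
    return broken


if __name__ == "__main__":
    for source, target in broken_relative_links("."):
        print(f"{source}: missing target {target}")
```

Run from the repository root on a branch where #33 and #34 are already merged, an empty result would suggest the new index rows resolve. Anchors inside target files are not validated by this sketch.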

🤖 Generated with Claude Code

…org fixes)

Addresses the repo-organization audit findings:

Tier 3 (taxonomy): the 12-family agentic microbench (a model-behavior study) is split across benchmarks/ and hardware-tests/ by which GPU/quant a model needed, not by question. Low-disruption fix (no directory moves):
- NEW MICROBENCH-INDEX.md — gathers all 12-family microbench entries across both trees + disambiguates the four "27B"s (AWQ / Q8 / FP8 / 35B-A3B).
- "where this lives" taxonomy notes in the benchmarks/ microbench READMEs (the 397B and 27B-FP8 entry notes ship in PRs #33/#34).

Tier 2 (currency): SCORECARD/COMPARISON/ROADMAP were frozen at 2026-05-02 and still listed the now-done FP8 re-run as future work.
- SCORECARD: 27B-quant scope banner (its "27B" = AWQ) + newer-results summary (397B/Step/MiniMax/FP8) + "would change this picture" #6 marked partly-done.
- COMPARISON: "FP8 re-run is highest-priority follow-up" -> done, cross-linked.
- ROADMAP item 1: marked DONE for 27B, narrowed remaining to Coder/35B-A3B.

Tier 1 (discoverability): indexed the previously-unindexed best-stack and qwen3.5-397b entries in root README + hardware-tests/README; added a MICROBENCH-INDEX row to the five-minute-answers table.

Merge note: references the 27B-FP8 (PR #34) and MiniMax (PR #33) entries — merge those first. Light expected overlap with #34 on the README index tables (inserted at different anchors to minimize it) and with #33/#34 on claims.yaml (not touched here).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Lightheartdevs merged commit 1135170 into main Jun 1, 2026
1 check failed

Lightheartdevs merged 1 commit into main from docs-microbench-index-and-currency-2026-05-31
