docs: microbench cross-tree index + synthesis-doc currency (org fixes)#37

Merged
Lightheartdevs merged 1 commit into main from docs-microbench-index-and-currency-2026-05-31
Jun 1, 2026
@Lightheartdevs

Contributor

Addresses the repo-organization audit (Tiers 1–3). No data changes — navigation, taxonomy, and currency only.

Tier 3 — taxonomy (the headline org problem)

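The 12-family agentic microbench (a model-behavior study) is split across benchmarks/ and hardware-tests/ by which GPU/quant a model happened to need, not by the question asked. Low-disruption fix (no directory moves, no broken links/history):

  • New MICROBENCH-INDEX.md: gathers all 12-family microbench entries across both trees and disambiguates the four "27B"s (AWQ / Q8 / FP8 / 35B-A3B).
  • Added "where this lives" taxonomy notes to the benchmarks/ microbench READMEs (the 397B and 27B-FP8 entry notes ship in PRs #33/#34).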
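
For illustration only, here is a minimal sketch of how a cross-tree index could be gathered without moving any directories. It is not the process used in this PR. It assumes each entry is a subdirectory of benchmarks/ or hardware-tests/ with its own README.md, and that microbench entries can be recognized by a marker string; the `MARKER` value and the function names `collect_entries` and `render_index` are hypothetical.

```python
from pathlib import Path

# Hypothetical marker: assumes microbench READMEs mention this phrase.
MARKER = "12-family"


def collect_entries(root, trees=("benchmarks", "hardware-tests")):
    """Return (tree, entry_name) pairs for entries whose README.md contains MARKER."""
    found = []
    for tree in trees:
        tree_dir = Path(root) / tree
        if not tree_dir.is_dir():
            continue  # tolerate a missing tree instead of failing
        for entry in sorted(tree_dir.iterdir()):
            readme = entry / "README.md"
            if not (entry.is_dir() and readme.is_file()):
                continue
            if MARKER in readme.read_text(encoding="utf-8"):
                found.append((tree, entry.name))
    return found


def render_index(entries):
    """Render the collected entries as a Markdown table with relative links."""
    lines = ["| Entry | Tree | README |", "| --- | --- | --- |"]
    for tree, name in entries:
        link = f"{tree}/{name}/README.md"
        lines.append(f"| {name} | {tree}/ | [{link}]({link}) |")
    return "\n".join(lines)
```

Because the table only links into the existing trees, existing paths and history stay untouched, which is the point of the low-disruption approach.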

Tier 2 — currency (synthesis docs were frozen at 2026-05-02)

  • SCORECARD.md: added a 27B-quant scope banner (its "27B" means AWQ, which removes the conflation risk), a newer-results summary (397B / Step / MiniMax / 27B-FP8), and marked the "FP8 higher-precision" follow-up as partly done.
  • COMPARISON.md: "The FP8 re-run is the highest-priority follow-up" → done, cross-linked, with the confirmed result (thinking still net-negative at FP8).
  • ROADMAP.md item 1: marked ✅ DONE for 27B; remaining FP8 work narrowed to Coder-Next + 35B-A3B.

Tier 1 — discoverability

  • Indexed the two previously-unindexed hardware-tests entries (best-stack-followup, qwen3.5-397b-vs-step3.7-flash) in both the root README and hardware-tests/README.md tables.
  • Added a MICROBENCH-INDEX.md row to the root "five-minute answers" table.
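
A check along these lines could catch unindexed entries in the future. This is a hedged sketch, not tooling that exists in the repo: it assumes an entry counts as indexed when its directory name appears anywhere in the index file's text, and the function name `unindexed_entries` is hypothetical.

```python
from pathlib import Path


def unindexed_entries(tree_dir, index_files):
    """Map each index file to the entry directories under tree_dir it never mentions.

    Assumption: plain substring match on the directory name. An entry whose
    name is a prefix of another entry's name could be missed by this check.
    """
    names = sorted(p.name for p in Path(tree_dir).iterdir() if p.is_dir())
    missing = {}
    for index_file in index_files:
        text = Path(index_file).read_text(encoding="utf-8")
        missing[str(index_file)] = [n for n in names if n not in text]
    return missing
```

Running it over hardware-tests/ with both the root README and hardware-tests/README.md as index files would report, per file, any entry directory that has no row yet.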

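Merge notes

  • References the 27B-FP8 (PR #34) and MiniMax (PR #33) entries, so merge those first.
  • Expect light overlap with #34 on the README index tables (inserted at different anchors to minimize it) and with #33/#34 on claims.yaml (not touched here).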

🤖 Generated with Claude Code

…org fixes)

Addresses the repo-organization audit findings:

Tier 3 (taxonomy): the 12-family agentic microbench (a model-behavior study)
is split across benchmarks/ and hardware-tests/ by which GPU/quant a model
needed, not by question. Low-disruption fix (no directory moves):
- NEW MICROBENCH-INDEX.md — gathers all 12-family microbench entries across
  both trees + disambiguates the four "27B"s (AWQ / Q8 / FP8 / 35B-A3B).
- "where this lives" taxonomy notes in the benchmarks/ microbench READMEs
  (the 397B and 27B-FP8 entry notes ship in PRs #33/#34).

Tier 2 (currency): SCORECARD/COMPARISON/ROADMAP were frozen at 2026-05-02 and
still listed the now-done FP8 re-run as future work.
- SCORECARD: 27B-quant scope banner (its "27B" = AWQ) + newer-results summary
  (397B/Step/MiniMax/FP8) + "would change this picture" #6 marked partly-done.
- COMPARISON: "FP8 re-run is highest-priority follow-up" -> done, cross-linked.
- ROADMAP item 1: marked DONE for 27B, narrowed remaining to Coder/35B-A3B.

Tier 1 (discoverability): indexed the previously-unindexed best-stack and
qwen3.5-397b entries in root README + hardware-tests/README; added a
MICROBENCH-INDEX row to the five-minute-answers table.

Merge note: references the 27B-FP8 (PR #34) and MiniMax (PR #33) entries —
merge those first. Light expected overlap with #34 on the README index tables
(inserted at different anchors to minimize it) and with #33/#34 on claims.yaml
(not touched here).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Lightheartdevs Lightheartdevs merged commit 1135170 into main Jun 1, 2026
1 check failed