docs: microbench cross-tree index + synthesis-doc currency (org fixes) #37
Merged
Lightheartdevs merged 1 commit on Jun 1, 2026
…org fixes)

Addresses the repo-organization audit findings:

Tier 3 (taxonomy): the 12-family agentic microbench (a model-behavior study) is split across benchmarks/ and hardware-tests/ by which GPU/quant a model needed, not by question. Low-disruption fix (no directory moves):
- NEW MICROBENCH-INDEX.md: gathers all 12-family microbench entries across both trees and disambiguates the four "27B"s (AWQ / Q8 / FP8 / 35B-A3B).
- "where this lives" taxonomy notes in the benchmarks/ microbench READMEs (the 397B and 27B-FP8 entry notes ship in PRs #33/#34).

Tier 2 (currency): SCORECARD/COMPARISON/ROADMAP were frozen at 2026-05-02 and still listed the now-done FP8 re-run as future work.
- SCORECARD: 27B-quant scope banner (its "27B" = AWQ) + newer-results summary (397B/Step/MiniMax/FP8) + "would change this picture" #6 marked partly-done.
- COMPARISON: "FP8 re-run is highest-priority follow-up" -> done, cross-linked.
- ROADMAP item 1: marked DONE for 27B, narrowed remaining to Coder/35B-A3B.

Tier 1 (discoverability): indexed the previously-unindexed best-stack and qwen3.5-397b entries in root README + hardware-tests/README; added a MICROBENCH-INDEX row to the five-minute-answers table.

Merge note: references the 27B-FP8 (PR #34) and MiniMax (PR #33) entries, so merge those first. Light expected overlap with #34 on the README index tables (inserted at different anchors to minimize it) and with #33/#34 on claims.yaml (not touched here).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses the repo-organization audit (Tiers 1–3). No data changes: navigation, taxonomy, and currency only.

Tier 3: taxonomy (the headline org problem)

The 12-family agentic microbench (a model-behavior study) is split across benchmarks/ and hardware-tests/ by which GPU/quant a model happened to need, not by the question asked. Low-disruption fix (no directory moves, no broken links/history):
- MICROBENCH-INDEX.md: one table of every 12-family microbench across both trees, plus disambiguation of the four "27B"s (AWQ / Q8 / FP8 / 35B-A3B sibling) and the one finding that holds across all of them (thinking net-negative).
- "Where this lives" taxonomy notes in the benchmarks/ microbench READMEs. (The 397B and 27B-FP8 entry notes ship in PR #33, "Add MiniMax-M2.7-NVFP4 (N=5, TP=2): temp serving-trap + exhaustive-completer findings", and PR #34, "Qwen3.6-27B-FP8 full microbench N=5, think vs no-think (clean FP8 redo)".)

Tier 2: currency (synthesis docs were frozen at 2026-05-02)

Tier 1: discoverability
- Indexed the previously unindexed entries (best-stack-followup, qwen3.5-397b-vs-step3.7-flash) in both the root README and hardware-tests/README.md tables.
- Added a MICROBENCH-INDEX.md row to the root "five-minute answers" table.

Merge notes
- claims.yaml is not touched here (its merge order is between #33 and #34).

🤖 Generated with Claude Code