
Commit 3529759

Decouple runner onboarding (#45)
* Decouple new-runner onboarding from shared files

  Adding a runner used to require touching at least three shared files (README.md, meta.schema.json, and collect_env.py) even when the work was confined to a single accelerator family. This PR rewires those touch points so contributors normally only edit files inside their own runner folder.

  What changed:

  * README platforms matrix is now auto-generated from each runner's meta.json (new optional suite_support / hardware_label fields). README.md carries marker comments and tools/generate_platforms_matrix.py splices the table in; CI can call --check to fail PRs that get out of sync.
  * meta.schema.json no longer hard-codes the set of accelerator platforms. The platform field is now validated by a lowercase regex, and the curated catalogue lives in schema/platforms.json, purely for presentation (display label, sort order). validate_runners.py prints a non-fatal warning when it meets an uncatalogued platform.
  * collect_env.py is split into a thin orchestrator plus one self-contained plug-in per accelerator family under runners/platforms/ (nvidia, amd, ascend, apple, google, moorethreads). Plug-ins are auto-discovered; adding a new accelerator only requires dropping a single file in that directory. env_info.json now carries an accelerator_platform field identifying the active plug-in.

  Side effects worth flagging:

  * The regenerated README matrix now includes the apple_mlx_lm and nvidia_sglang_c43a8309 runners that had been missed in the hand-maintained table.
  * All 7 existing runners gained explicit suite_support entries; no behaviour change, just self-description used by the generator.
  * runners/README.md got a new "Adding a new accelerator family" section that documents the plug-in protocol.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: drop nvidia_sglang_6da83845 and tighten .gitignore

  * Remove the older SGLang runner (nvidia_sglang_6da83845, sglang 0.4.0 / torch 2.5.1 / transformers 4.46.3).
    The newer nvidia_sglang_c43a8309 (sglang 0.5.6 / torch 2.9.1 / EAGLE speculative decoding) supersedes it in practice. No results in this repo reference the old hash and there are no external consumers (pre-open-source), so we delete the folder rather than mark deprecated_by; this is the last opportunity to do so before the immutability rule kicks in.
  * Expand .gitignore so the dozens of locally generated samples.jsonl files under results/verified/** stop showing up as untracked, and add the common IDE / lint / test-cache directories (.idea/, .vscode/, .pytest_cache/, .mypy_cache/, .ruff_cache/, .coverage*, .tox/) that contributors typically have.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* docs: open-source prep - community files, pyproject metadata, decoupled-flow walk-through

  * Add CODE_OF_CONDUCT.md (Contributor Covenant 2.1) with a small benchmark-specific addendum covering fabricated results and vendor affiliation disclosure.
  * Add SECURITY.md scoping the threat model (code that runs on contributor machines + validator bypasses for fake leaderboard entries) and pointing reporters at GitHub private security advisories instead of public issues.
  * Flesh out pyproject.toml with authors, maintainers, keywords, Trove classifiers (license, audience, Python 3.10–3.12, platforms), and the full set of project.urls (Homepage, Leaderboard, Documentation, Repository, Issues, Changelog) so it renders nicely on PyPI once we cut a release.
  * Rewrite the 'Adding support for a new platform' section of CONTRIBUTING.md to match the decoupled onboarding flow that landed in the previous commit: a new runner on an existing platform no longer needs to touch any shared file, and a brand-new accelerator family only needs a single self-contained plug-in under runners/platforms/. The section is renamed 'Adding a new runner' to reflect what most contributors actually do, with a clearly marked sub-section for the rarer 'new accelerator family' case.
  * Repoint two README.md links that pointed at the old '#adding-support-for-a-new-platform' anchor.

  No behavioural changes to the framework or runners.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* ci(validate_pr): enforce README matrix sync and full-repo runner validation

  * Run runners/validate_runners.py over **every** runner folder in the repo (not just the ones touched in the current PR). This catches drift introduced by shared changes, e.g. a meta.schema.json edit that accidentally breaks an unrelated existing runner.
  * Run tools/generate_platforms_matrix.py --check on every triggering PR. The README 'Supported platforms' matrix is auto-generated from each runner's meta.json; if a PR changes a runner's suite_support / hardware_label or adds a new runner without regenerating the table, the job now fails with a clear instruction to regenerate locally and commit the result.
  * Expand the workflow's paths trigger to cover the README, the platforms catalogue (schema/platforms.json), and the generator itself, so the matrix-sync check actually runs when those files are modified.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: prune dead artefacts (utils/ duplicate, orphan runner_config examples)

  Pre-1.0 cleanup before open-sourcing:

  * `utils/run_all_{4,8}gpu.sh`: older duplicates of the same-named scripts under `examples/`. Nothing references them; drop the folder.
  * `configs/runner_configs/runner_*_{523da458,605db33a,9f42fabb}.yaml.example`: stale templates whose runner folders were superseded by the current hash IDs (`6c18cd8f`, `d4aa9fda`, `c43a8309`). Each surviving runner has its own up-to-date `*.yaml.example` companion.

  No code path or doc references any of these, so this is a pure delete.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* docs: open-source polish - logo, PR-first flow, community-facing language

  Round of pre-1.0 documentation work driven by the question "what would a first-time contributor see, and does it look maintained or maintainer-run?".

  Visual identity

  * New SVG mark + wordmark: a lightning bolt crossing a speedometer arc. Lives under `docs/assets/` and renders via `<picture>` with separate light/dark variants (GitHub README's `prefers-color-scheme` swap).
  * The README header now uses the wordmark instead of an emoji + H1.

  README slimming

  * Drop the full `Repository structure` tree from the top page. Mature projects (PyTorch, vLLM, llama.cpp) don't ship the tree on the front door; the trimmed copy in DEVELOPMENT.md is enough for spelunkers.
  * Quick-start step 4 is now "open a pull request" with `gh pr create`. The issue-bot path is kept verbatim as a one-line escape hatch for people who don't want to touch git.
  * New top-level links to Discussions for Q&A and to `openclaw_skill/` for the optional voice-driven launcher (clearly marked optional).
  * Citation now credits "The AccelMark Contributors" alongside the original author.

  CONTRIBUTING rewrite of the submission flow

  * The whole "Submitting your results" section was rewritten under a new `## Submitting a result` anchor (referenced from README). PR path is primary, with the bot-drafted-PR path as the no-git fallback.
  * New paragraph documents the `configs/runner_configs/runner_<id>.yaml` gitignore policy explicitly: only the `*.yaml.example` companions ship; the live override file is strictly local.
  * Verified-tier definition rephrased: it is hardware reproducibility, not a maintainer privilege. Anyone with the same chip + runner can open a reproduction PR and bump a community result to verified.
  Community-facing language cleanup

  * `results/README.md`, `suites/README.md`, `DEVELOPMENT.md`, and `CONTRIBUTING.md` no longer describe verification / flagging / suite-acceptance as maintainer-gated. They read as community workflows that anyone can drive.
  * Time SLAs ("within a day or two") and "maintainer reviews" copy removed from the contribution path so the doc doesn't make promises that depend on a single person.

  `CODE_OF_CONDUCT.md` and `SECURITY.md` still mention maintainers intentionally; those documents need a clear enforcement contact and that's expected of any open-source repo.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* docs+leaderboard: tighten logo alignment, brand the site, label runners

  Round-trip feedback from rendering the new README header in light mode:

  * The icon was visually drifting below the wordmark because the SVG was packing both "AccelMark" and a tagline into the same image, forcing the icon to balance two text lines.
  * The leaderboard site still used a bare emoji and had no favicon, so there was no continuity between the README and the public site.
  * When two runners share the same `framework` string (e.g. `vLLM` ships both the stable runner and a future `vllm-0.20` one), result cards rendered as indistinguishable "Qwen2.5-0.5B-Instruct · vLLM · BF16" rows even though the `framework_version` field already disambiguates.

  Logo + README

  * `docs/assets/logo-wordmark{,-dark}.svg`: single-row mark of the form `[icon] AccelMark`. ViewBox shrunk from 480×96 to 280×72 with the icon's geometric centre put exactly on the cap-height midline of the AccelMark glyphs. The "Cross-platform LLM inference benchmark" tagline previously baked into the SVG is now a separate `<p>` under the logo in README, so the brand mark stays compact and reusable.
  * README rendering knob: `width="360"` (was 420) to fit the new aspect ratio.

  Leaderboard site branding

  * New `leaderboard/site/favicon.svg` (copy of the standalone icon).
    Registered via `<link rel="icon" type="image/svg+xml" …>` so the tab picks it up immediately.
  * `header h1` swapped the ⚡ emoji for the inline SVG mark, using a dark-theme palette (#FCD34D bolt + #93C5FD gauge) that pops on the #0d1117 background. Flex layout for vertical alignment between the icon and the title.

  Runner disambiguation on cards and tables

  * Card layout (line 836): the framework field now reads `${framework}${framework_version}`, e.g. `vLLM 0.5.5`. A `title=` on the same span exposes `runner: <implementation_id>` on hover when the user wants the precise hash.
  * Table cell formatter (`formatFramework`): same inline version after the framework name (rendered in a muted colour so the framework name stays the dominant token), and `implementation_id` is added to the hover tooltip alongside the existing version / script / notes lines.

  Net effect for the open question raised in review: two vLLM runners on the same hardware are now visually distinct without anyone editing the runner's `_get_framework_name()` to fake a variant suffix.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* ci(generate_leaderboard): also redeploy when site or generator changes

  Previously the leaderboard deploy workflow only fired on `results/**` changes, so PRs that touched `leaderboard/site/index.html`, `leaderboard/generate.py`, or platform metadata could land on main and never reach the public site until somebody happened to merge a new result. Widen the `paths:` filter so any of these can trigger a redeploy:

  * `leaderboard/**`: the static site and generator script
  * `tools/generate_platforms_matrix.py` and `schema/platforms.json`: the README platforms matrix inputs (the workflow regenerates that too)
  * `runners/*/meta.json`: runner metadata that the leaderboard surfaces (framework, suite support, hardware labels)

  `workflow_dispatch` stays available as the escape hatch for forcing a redeploy when nothing in the watched paths changed.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: drop pre-1.0 backward-compat shims and stale Suite C comments

  All three removals were verified to have zero in-repo dependencies: every suite.json and the entire codebase is already on the new format.

  suite_C/suite.py: stale runner-backend gating

  Eleven lines of commented-out code that gated each quantized format on whether the runner declared the backend in SUPPORTED_QUANTIZATION_BACKENDS. The strategy changed long ago: now we always send the format through and let the inference engine report its own incompatibility (recorded in the subprocess summary). The accompanying skip-reason `print` was updated to match what actually causes the skip today (the *other* full-precision baseline, e.g. FP16 on Ampere where the baseline is BF16).

  benchmark_runner._parse_scenarios_config: flat-list legacy

  Five lines that accepted suite.json with `"scenarios": ["accuracy", ...]` instead of the documented `{"default": [...], "extra": [...]}`. All seven suite.json files are on the dict form; flat-list was never documented for external authors. Docstring and the DEVELOPMENT.md line referencing the legacy form updated.

  benchmark_runner._resolve_requests_path: per-suite requests.jsonl fallback

  Ten lines that fell back to `suites/<id>/requests.jsonl` when a suite had no `dataset` key. Every suite.json now declares `dataset:` and points at `datasets/<name>/requests.jsonl`; there is no `suites/*/requests.jsonl` anywhere in the repo. The function now requires `dataset` and produces a pointed error message if it's missing.

  Kept on purpose

  `/v1/completions` in `serve/server.py` and the README: that is OpenAI's own legacy endpoint (still widely used by older LangChain/llama.cpp/etc. clients), not an AccelMark-internal compat shim, so removing it would narrow the audience of the drop-in OpenAI replacement we advertise.

  Net: -28 lines, +13 lines of clearer code paths, no functional change.
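  For illustration only, the stricter suite.json handling described for `_parse_scenarios_config` and `_resolve_requests_path` might look roughly like the sketch below. The function names, the exception types, the error wording, and the choice to treat missing `default` / `extra` keys as empty lists are all assumptions; the real code in `benchmark_runner` is not shown in this commit view.

```python
from __future__ import annotations

from pathlib import Path


def parse_scenarios(config: dict) -> tuple[list[str], list[str]]:
    """Accept only the dict form {"default": [...], "extra": [...]}."""
    scenarios = config.get("scenarios")
    if not isinstance(scenarios, dict):
        # The legacy flat list (["accuracy", ...]) is rejected outright.
        raise ValueError(
            'suite.json "scenarios" must be an object like '
            '{"default": [...], "extra": [...]}'
        )
    # Assumption: a missing key means "no scenarios of that kind".
    return list(scenarios.get("default", [])), list(scenarios.get("extra", []))


def resolve_requests_path(config: dict, repo_root: Path) -> Path:
    """Require a dataset key; no fallback to suites/<id>/requests.jsonl."""
    dataset = config.get("dataset")
    if not dataset:
        raise ValueError(
            'suite.json is missing "dataset"; expected a name that maps to '
            "datasets/<name>/requests.jsonl"
        )
    return repo_root / "datasets" / dataset / "requests.jsonl"
```

  Failing loudly in both places mirrors the commit's intent: one documented format and one dataset location, with a pointed message instead of a silent fallback.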
  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
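
The plug-in auto-discovery that the first commit describes for `runners/platforms/` could be sketched as below. This is a minimal illustration, not the actual orchestrator: the `detect()` / `collect()` protocol, the function names, and the rule that files starting with an underscore are skipped are all hypothetical. Only the directory layout and the `accelerator_platform` field come from the commit message.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path
from types import ModuleType


def discover_plugins(directory: Path) -> dict[str, ModuleType]:
    """Load every *.py file in directory that looks like a platform plug-in."""
    plugins: dict[str, ModuleType] = {}
    for path in sorted(directory.glob("*.py")):
        if path.name.startswith("_"):
            continue  # assumption: private helpers are not plug-ins
        spec = importlib.util.spec_from_file_location(f"accel_plugin_{path.stem}", path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Hypothetical protocol: a plug-in exposes detect() and collect().
        if callable(getattr(module, "detect", None)) and callable(getattr(module, "collect", None)):
            plugins[path.stem] = module
    return plugins


def collect_env(directory: Path) -> dict:
    """Return the first detecting plug-in's info, tagged with its name."""
    for name, module in discover_plugins(directory).items():
        if module.detect():
            info = dict(module.collect())
            info["accelerator_platform"] = name
            return info
    return {"accelerator_platform": None}
```

Under these assumptions, adding an accelerator family means dropping one file such as `runners/platforms/<family>.py` into the directory; the orchestrator itself does not change.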
1 parent b4098f1 commit 3529759

47 files changed

Lines changed: 2431 additions & 1855 deletions

.github/workflows/generate_leaderboard.yml

Lines changed: 10 additions & 3 deletions
@@ -1,15 +1,22 @@
 name: Generate Leaderboard

-# Triggers only when results/ changes land on main.
-# This covers both direct merges and squash merges.
+# Triggers when anything that can change the rendered leaderboard lands on
+# main: new community/verified results, generator code, the static site, or
+# the auto-generated README platforms matrix (so a runner/platform metadata
+# change also redeploys). This covers both direct merges and squash merges.
 on:
   push:
     branches:
       - main
     paths:
       - 'results/**'
+      - 'leaderboard/**'
+      - 'tools/generate_platforms_matrix.py'
+      - 'schema/platforms.json'
+      - 'runners/*/meta.json'

-  # Allow manual trigger from Actions tab (useful for first deploy)
+  # Allow manual trigger from Actions tab (useful for first deploy or to
+  # force a redeploy when nothing in the watched paths changed).
   workflow_dispatch:

 jobs:

.github/workflows/validate_pr.yml

Lines changed: 24 additions & 2 deletions
@@ -1,12 +1,16 @@
 name: Validate Submission

-# Triggers only when results/ directory is touched in a PR.
-# Schema changes, script changes, and doc changes do not trigger this.
+# Triggers when results/, runner metadata, the platforms catalogue, the
+# README (which contains the auto-generated matrix), or the matrix
+# generator itself are touched in a PR.
 on:
   pull_request:
     paths:
       - 'results/**'
       - 'runners/**'
+      - 'schema/platforms.json'
+      - 'tools/generate_platforms_matrix.py'
+      - 'README.md'

 jobs:
   validate-runners:
@@ -66,6 +70,24 @@ jobs:
         if: steps.changed.outputs.folders == ''
         run: echo "No runner folders changed in this PR — skipping."

+      # Always validate every runner folder (not just the ones touched in
+      # this PR). This catches drift introduced by shared changes — e.g.
+      # a meta.schema.json edit that breaks an unrelated existing runner.
+      - name: Validate all runner folders (drift check)
+        run: |
+          echo "::group::Validating every runner folder in the repo"
+          python runners/validate_runners.py
+          echo "::endgroup::"
+
+      # README "Supported platforms" matrix is generated from each runner's
+      # meta.json. If a PR changes a runner's suite_support / hardware_label
+      # or adds a new runner without regenerating the table, fail.
+      - name: README platforms matrix is in sync
+        run: |
+          echo "::group::tools/generate_platforms_matrix.py --check"
+          python tools/generate_platforms_matrix.py --check
+          echo "::endgroup::"
+
   validate:
     name: Validate result submissions
     runs-on: ubuntu-latest
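
The "README platforms matrix is in sync" step relies on the generator's `--check` mode. A minimal sketch of how such a splice-and-compare check can work is shown below; the marker strings and function names are hypothetical, and the real `tools/generate_platforms_matrix.py` may be structured differently.

```python
from __future__ import annotations

# Hypothetical marker comments; the real README markers may differ.
BEGIN = "<!-- BEGIN PLATFORMS MATRIX -->"
END = "<!-- END PLATFORMS MATRIX -->"


def splice(readme: str, table: str) -> str:
    """Return readme with the text between the markers replaced by table."""
    start = readme.index(BEGIN) + len(BEGIN)
    stop = readme.index(END, start)
    return readme[:start] + "\n" + table.strip("\n") + "\n" + readme[stop:]


def in_sync(readme: str, table: str) -> bool:
    """True when regenerating the table would leave the README unchanged."""
    return splice(readme, table) == readme


def check_exit_code(readme: str, table: str) -> int:
    """Exit status for a --check style run: 0 if in sync, 1 otherwise."""
    return 0 if in_sync(readme, table) else 1
```

Comparing the spliced output with the committed file, rather than diffing tables directly, keeps the check and the regeneration on a single code path, so the two cannot disagree about formatting.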

.gitignore

Lines changed: 33 additions & 10 deletions
@@ -1,27 +1,50 @@
+# ── Python ──────────────────────────────────────────────────────────────────
 __pycache__/
 *.py[cod]
+*.egg
 *.egg-info/
 dist/
 build/
 .venv/
 venv/
 env/
-*.egg
+
+# ── Editor / IDE ────────────────────────────────────────────────────────────
+.idea/
+.vscode/
+*.swp
+*.swo
+*~
+*.tmp
 .DS_Store
-*.log
-my_submission/
-mini_result/
-/tmp/
-leaderboard/site/leaderboard.js
-leaderboard/site/api/
+
+# ── Test / lint caches ──────────────────────────────────────────────────────
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+.coverage
+.coverage.*
+htmlcov/
+.tox/
+
+# ── Jupyter ─────────────────────────────────────────────────────────────────
+.ipynb_checkpoints/
+
+# ── AccelMark local-only files ──────────────────────────────────────────────
 configs/models_local.yaml
 configs/submitter.yaml
 configs/runner_configs/*.yaml
+leaderboard/site/leaderboard.js
+leaderboard/site/api/

-# Local-only benchmark artifacts (not needed for submission)
+# ── Benchmark artifacts (local-only — not part of submissions) ──────────────
+samples.jsonl
+samples.jsonl.ipynb_checkpoints/
 accuracy_outputs.jsonl
 run.log
-samples.jsonl.ipynb_checkpoints/
+*.log
+my_submission/
+mini_result/
 *_backup/
 backup/
-.ipynb_checkpoints/
+/tmp/

CODE_OF_CONDUCT.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and maintainers pledge to make participation in
+AccelMark a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic
+status, nationality, personal appearance, race, religion, or sexual identity
+and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+  and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the
+  overall community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or advances
+  of any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email address,
+  without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Benchmark-specific expectations
+
+AccelMark is a results-driven leaderboard. The following are specifically
+out of scope:
+
+* **Cherry-picked, doctored, or fabricated results.** Submitting a result
+  knowing it does not reflect the listed hardware / software is misconduct,
+  not a mistake. Mistakes are expected and welcome; fabrication is not.
+* **Misrepresentation of affiliation.** Vendor employees may submit results
+  for their own hardware (it is encouraged) — but the `[vendor]` tag in the
+  submitter name must be present, per `CONTRIBUTING.md`.
+* **Disparaging another vendor or contributor's hardware in PR/issue
+  comments.** Numbers speak; commentary should focus on methodology and
+  reproducibility, not on the entity behind a competing result.
+
+## Enforcement Responsibilities
+
+Project maintainers are responsible for clarifying and enforcing our
+standards of acceptable behavior and will take appropriate and fair
+corrective action in response to any behavior that they deem inappropriate,
+threatening, offensive, or harmful.
+
+Maintainers have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that
+are not aligned to this Code of Conduct, and will communicate reasons for
+moderation decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies
+when an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail
+address, posting via an official social media account, or acting as an
+appointed representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the project maintainers by opening a confidential security
+advisory at <https://github.com/JuhaoLiang1997/AccelMark/security/advisories/new>
+or, when GitHub access is not available, by emailing the maintainer listed
+in the repository profile. All complaints will be reviewed and investigated
+promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of
+the reporter of any incident.
+
+## Enforcement Guidelines
+
+Project maintainers will follow these Community Impact Guidelines in
+determining the consequences for any action they deem in violation of this
+Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from a maintainer, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series of
+actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, for a specified period of time.
+Violating these terms may lead to a temporary or permanent ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public
+or private interaction with the people involved, including unsolicited
+interaction with those enforcing the Code of Conduct, is allowed during
+this period. Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior, harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within
+the community.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 2.1, available at
+<https://www.contributor-covenant.org/version/2/1/code_of_conduct.html>.
+
+Community Impact Guidelines were inspired by [Mozilla's code of conduct
+enforcement ladder](https://github.com/mozilla/diversity).
+
+For answers to common questions about this code of conduct, see the FAQ at
+<https://www.contributor-covenant.org/faq>. Translations are available at
+<https://www.contributor-covenant.org/translations>.
+
+[homepage]: https://www.contributor-covenant.org
