* chore: OSS readiness audit — license, schema, validators, refactor, docs
Phase 1 — Legal & attribution
- Align license: pyproject.toml + README badge → Apache-2.0 (matches LICENSE).
- Add NOTICE summarising bundled third-party data and upstream terms.
- Add License & attribution sections to datasets/README.md and each
datasets/sharegpt_*_v1/README.md (CC BY 4.0, upstream link).
- Add schema/accuracy_subset.README.md documenting the MMLU subset (MIT).
Phase 2 — Contributor experience & validation
- Fix doc drift in DEVELOPMENT.md, README.md, runners/README.md,
suites/README.md, runners/template/runner.py (rename
SUPPORTED_QUANTIZATIONS → SUPPORTED_QUANTIZATION_BACKENDS in
*editable* files only; existing runner.py hashes untouched).
- Add schema/suite.schema.json + runners/validate_suites.py and wire
both into validate_pr.yml / generate_leaderboard.yml.
- Add .github/ISSUE_TEMPLATE/new_suite.md for community suite proposals.
- CONTRIBUTING.md: add local leaderboard preview instructions.
- .gitignore: ignore node_modules/, .cursor/, .aider*, .envrc, .direnv/.
Phase 3 — Code quality & CI
- runners/benchmark_runner.py:
* Remove dead code (stub format_prompt, dead spec-decoding branch,
redundant acc_result init, duplicated _build_result_json block).
* Extract helpers (_prepare_load_context, _score_accuracy_questions,
_write_accuracy_artifacts) shared between accuracy scenarios.
* Replace inference dispatch if/elif ladder with _SCENARIO_REGISTRY
(ScenarioSpec dataclass: inference_kind, use_async, merge_key…).
* _MERGE_SCENARIO_KEYS now derived from the registry. Net −111 lines.
- leaderboard: split SUITE_META into
leaderboard/site/assets/data/suite-meta.js, data.js re-exports it
(data.js 1010 → 800 lines).
- validate_pr.yml: add python-tests job (serve + openclaw_skill pytest).
- pyproject.toml: setuptools.packages.find now lists loadgen/runners/
serve/openclaw_skill explicitly and excludes tests*.
README hero & citation
- Embed docs/assets/framework-overview.png under nav links and
docs/assets/chip-cloud.png in a new "Currently on the leaderboard"
section.
- Expand BibTeX author list in the Citation section.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs(readme): update citation title
Co-authored-by: Cursor <cursoragent@cursor.com>
* ci: fix python-tests collection — lazy uvicorn import + add numpy
- serve/server.py: import uvicorn lazily inside start_server() so that
importing the module (e.g. from tests, or to expose the ASGI app)
does not require uvicorn to be installed.
- validate_pr.yml: add numpy to the python-tests install list — pulled
in transitively by loadgen, needed once serve.server imports
runners.benchmark_runner during test collection.
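
A minimal sketch of the lazy-import pattern described above, assuming a bare ASGI callable in place of the real app. The app body and the host/port defaults are hypothetical; only start_server() and the deferred uvicorn import reflect this change.

```python
async def app(scope, receive, send):
    """Stand-in ASGI app (hypothetical); the real app lives in serve/server.py."""
    if scope["type"] != "http":
        return
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"ok"})


def start_server(host: str = "127.0.0.1", port: int = 8000) -> None:
    # Imported here rather than at module top, so importing this module
    # (for tests, or to expose `app`) works without uvicorn installed.
    import uvicorn

    uvicorn.run(app, host=host, port=port)
```

The cost is that a missing uvicorn is reported only when start_server() is called, not at import time.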
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(serve/tests): add TokenStreamingMockRunner; MockRunner now raises NotImplementedError
Pre-existing breakage in serve/tests/test_server.py — never caught
because python-tests was not wired into CI until this branch.
- test_server.py imports TokenStreamingMockRunner from mock_runner,
but the class did not exist (4 ImportError collection errors).
- test_fallback_when_no_token_stream expects MockRunner to *not*
implement true token streaming so the server's single-chunk
fallback path runs. MockRunner used to yield word-by-word, so the
test asserted len(content_chunks) == 1 but got more (1 AssertionError).
Fix to match the RunnerProtocol contract (runners/protocol.py:67) —
true token streaming is optional, runners signal "not supported" by
raising NotImplementedError:
- MockRunner.inference_fn_token_stream now raises NotImplementedError
(with a trailing unreachable yield so the function shape stays an
async generator, matching the protocol).
- Add TokenStreamingMockRunner(MockRunner) that overrides the method
to yield word-by-word with a small async delay — used by the four
tests that exercise the multi-chunk SSE path.
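
The "raise, then unreachable yield" shape and the single-chunk fallback it enables can be sketched as below. The method signature, the response_text attribute on the runner, and the stream_chunks helper are assumptions for illustration; the actual contract is the one in runners/protocol.py.

```python
import asyncio
from typing import AsyncIterator, List


class MockRunner:
    """Runner without true token streaming (signature is assumed)."""

    def __init__(self, response_text: str = "Hello from mock.") -> None:
        self.response_text = response_text

    async def inference_fn_token_stream(self, prompt: str) -> AsyncIterator[str]:
        raise NotImplementedError("true token streaming is not supported")
        yield ""  # unreachable; keeps this function an async generator


async def stream_chunks(runner: MockRunner, prompt: str) -> List[str]:
    """Hypothetical server-side helper: stream if possible, else one chunk."""
    chunks: List[str] = []
    try:
        async for piece in runner.inference_fn_token_stream(prompt):
            chunks.append(piece)
    except NotImplementedError:
        # The error surfaces on the first iteration, not at call time.
        chunks = [runner.response_text]
    return chunks
```

Without the trailing yield the method would be a plain coroutine function, and `async for` over its result would fail with TypeError instead of the NotImplementedError the fallback relies on.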
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(serve/tests): TokenStreamingMockRunner — emit leading separator, not trailing
test_token_stream_reassembles_correctly concatenates every content
delta and expects exact equality with the response_text. Yielding
"word + ' '" tacks an extra trailing space onto the reassembled string,
so the assertion failed:
got: 'Hello from token stream. '
expected: 'Hello from token stream.'
Switch to a leading-space separator (space before every word *after*
the first). Concatenation now round-trips exactly, and the shape
matches how real BPE / SentencePiece tokenizers stream pieces (the
first token has no preceding space; subsequent ones do).
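
A small sketch of the leading-separator scheme, using hypothetical helper names (stream_words, reassemble) rather than the actual TokenStreamingMockRunner code:

```python
import asyncio
from typing import AsyncIterator


async def stream_words(text: str, delay: float = 0.0) -> AsyncIterator[str]:
    """Yield words; every word after the first carries a leading space."""
    for index, word in enumerate(text.split()):
        await asyncio.sleep(delay)  # stands in for the small async delay
        yield word if index == 0 else " " + word


async def reassemble(text: str) -> str:
    """Concatenate all streamed pieces, as the test does with content deltas."""
    return "".join([piece async for piece in stream_words(text)])
```

The round trip is exact only for text whose words are separated by single spaces with no leading or trailing whitespace, which holds for the test string quoted above.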
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
 - **Bug in LoadGen or schema:** Open a GitHub Issue
-- **New suite proposal:** Open a GitHub Issue with the "Request new suite" template
+- **New suite proposal:** Open a GitHub Issue with the [**Propose a new suite**](https://github.com/JuhaoLiang1997/AccelMark/issues/new?template=new_suite.md) template
 - **New platform support:** Open a PR with a working platform script and at least one verified result