
build: upgrade skillgym to 0.6.0 #473

Merged: thymikee merged 7 commits into main from build/upgrade-skillgym-0-6 on Apr 29, 2026.



@thymikee (Contributor) commented Apr 29, 2026

Summary

Upgrade SkillGym to v0.6.0 and wire the local benchmark harness into the new suite controls.

  • tag fixture smoke and skill-guidance cases for v0.6 tag filtering
  • switch the planning suite to schedule: parallel so case/runner pairs run concurrently up to SkillGym's available-machine parallelism cap
  • use assert.commands.includes / assert.commands.notIncludes with structured commandMatcher(...) checks on planned commands, covering both agent-device ... invocations and bare command lines
  • use assert.output.includes for positive final-output assertions
  • document v0.6 reporters, tags, and parallelism guidance in the SkillGym README and AGENTS.md
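Why a structured matcher beats a raw substring check can be shown with a small sketch. This is a hypothetical illustration, not SkillGym's commandMatcher(...) API: the names CommandSpec, matchesCommand, commandsInclude, and commandsNotInclude are invented here, and the assumption is that a planned command may appear either with the agent-device prefix or as a bare command line.

```typescript
// Hypothetical sketch: NOT SkillGym's commandMatcher(...) implementation.
// It shows how one spec can match both "agent-device open X" and "open X".

interface CommandSpec {
  command: string; // subcommand name, e.g. "open"
  argsInclude?: string[]; // tokens that must all appear among the arguments
}

const CLI_PREFIX = "agent-device";

// Split a command line on whitespace and drop an optional agent-device prefix.
function tokenize(line: string): string[] {
  const tokens = line.trim().split(/\s+/).filter((t) => t.length > 0);
  return tokens[0] === CLI_PREFIX ? tokens.slice(1) : tokens;
}

// True when the line's subcommand equals spec.command and every
// required argument token is present.
function matchesCommand(line: string, spec: CommandSpec): boolean {
  const [command, ...args] = tokenize(line);
  if (command !== spec.command) return false;
  return (spec.argsInclude ?? []).every((arg) => args.includes(arg));
}

// Analogue of an "includes" assertion over a list of planned command lines.
function commandsInclude(lines: string[], spec: CommandSpec): boolean {
  return lines.some((line) => matchesCommand(line, spec));
}

// Analogue of a "notIncludes" assertion.
function commandsNotInclude(lines: string[], spec: CommandSpec): boolean {
  return !commandsInclude(lines, spec);
}
```

With illustrative planned lines such as "agent-device open Settings" and a bare "snapshot", both commandsInclude(lines, { command: "open", argsInclude: ["Settings"] }) and commandsInclude(lines, { command: "snapshot" }) hold, whereas a plain substring search for "agent-device snapshot" would miss the bare line.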

Touched files: 5. Scope stayed within SkillGym tooling/docs/tests.

Validation

  • pnpm format
  • pnpm check:tooling
  • pnpm typecheck
  • pnpm check:quick
  • pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts --case open-and-snapshot --tag fixture-smoke --runner codex-mini --reporter json
  • pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts --case text-replace-uses-fill --tag skill-guidance --runner codex-mini
  • pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts --case open-and-snapshot --runner codex-mini --tag fixture-smoke (confirmed the default reports Parallel 12)
  • pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts --tag fixture-smoke --max-parallel 6 (48 runs passed in 3m43s, confirmed out-of-order parallel completion)
  • pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts --tag fixture-smoke (uncapped default reported Parallel 12; 48 runs passed in 2m27s)
  • pnpm test:skillgym: ran the full matrix under the previous isolated-by-runner config in 31m48s; 66/70 cases and 136/140 runs passed, and the 4 failed case/runner pairs all passed when rerun once
  • git diff --check
  • git diff --check -- AGENTS.md
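The parallel runs above (a --max-parallel 6 cap, an uncapped default reporting Parallel 12, and out-of-order completion) follow the usual bounded-worker-pool pattern. The sketch below is a generic illustration of that pattern, not SkillGym's scheduler; runBounded is a hypothetical name, and defaulting the cap to Node's os.availableParallelism() (Node 18.14 or later) is an assumption about what an "available-machine parallelism cap" looks like.

```typescript
import { availableParallelism } from "node:os";

// Hypothetical sketch of a bounded worker pool; NOT SkillGym's scheduler.
type Task<T> = () => Promise<T>;

async function runBounded<T>(tasks: Task<T>[], maxParallel?: number): Promise<T[]> {
  // Default the cap to the machine's available parallelism, never below 1.
  const cap = Math.max(1, maxParallel ?? availableParallelism());
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker claims the next unstarted task until none remain.
  // JavaScript is single-threaded, so reading and incrementing `next`
  // before the await cannot race with another worker.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]!();
    }
  }

  // Never start more workers than there are tasks.
  const workers = Array.from({ length: Math.min(cap, tasks.length) }, () => worker());
  await Promise.all(workers);
  // Results keep input order even though tasks may finish out of order.
  return results;
}
```

For example, with runBounded(tasks, 2) and three tasks a, b, c taking 30, 10, and 5 ms, they finish in the order b, c, a (the second worker picks up c once b completes), yet the returned array is still ordered a, b, c.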

Known gaps: SkillGym does not currently have a built-in retry-failed option; this is filed as callstackincubator/skillgym#17. Uncapped parallel runs also emit a Node MaxListenersExceededWarning at Parallel 12; this is noted on the same upstream issue.
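Until a retry-failed option exists, the manual step used during validation (rerunning the failed case/runner pairs once) can be expressed as a small wrapper. This is a hypothetical sketch: Pair, PairResult, RunPair, and runWithSingleRetry are invented names, and runPair stands in for however a single case/runner pair is actually executed. For brevity it starts all pairs at once instead of applying a parallelism cap.

```typescript
// Hypothetical sketch; SkillGym has no retry-failed option yet (skillgym#17).

interface Pair {
  caseId: string;
  runner: string;
}

interface PairResult extends Pair {
  passed: boolean;
}

// Placeholder for whatever executes one case/runner pair.
type RunPair = (pair: Pair) => Promise<PairResult>;

async function runWithSingleRetry(pairs: Pair[], runPair: RunPair) {
  // First pass over every case/runner pair.
  const firstPass = await Promise.all(pairs.map((pair) => runPair(pair)));

  // Rerun only the pairs that failed, exactly once.
  const failed = firstPass.filter((r) => !r.passed);
  const retried = await Promise.all(
    failed.map(({ caseId, runner }) => runPair({ caseId, runner })),
  );

  return {
    firstPass,
    // Failed on the first pass but passed on the rerun.
    flaky: retried.filter((r) => r.passed),
    // Failed on both attempts.
    stillFailing: retried.filter((r) => !r.passed),
  };
}
```

Reporting flaky separately from stillFailing keeps the distinction seen in the full-matrix run, where all 4 failures passed on a single rerun.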

@github-actions (Bot) commented Apr 29, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-473/

Built to branch gh-pages at 2026-04-29 18:28 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@thymikee merged commit f7c8f45 into main on Apr 29, 2026 (18 checks passed).
@thymikee deleted the build/upgrade-skillgym-0-6 branch on April 29, 2026 at 18:38.