feat: add skillgym tests and a test-app by thymikee · Pull Request #453 · callstackincubator/agent-device

thymikee · 2026-04-26T22:26:40Z

Summary

Add the Expo example app under examples/test-app for agent-device smoke flows.
Add SkillGym config, docs, and a 48-case suite split into fixture smoke coverage and MECE skill-guidance regressions, including perf, React DevTools, gesture, settings, and trace planning coverage.
Update SkillGym to 0.5.0, include the Claude Haiku runner, and keep the fixture install lockfile-respecting.
Ignore the nested fixture app in Fallow analysis and exclude generated SkillGym result artifacts from pnpm format traversal.

Touched files: 33. Scope is limited to the example app plus SkillGym docs/test support and related tooling config.

Validation

pnpm format
pnpm typecheck
pnpm check:tooling
pnpm check:unit
pnpm check:fallow --base origin/main
pnpm test-app:install
pnpm test-app:typecheck
git diff --check
pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts --runner claude-haiku --case open-and-snapshot
pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts --runner codex-main (benchmark run completed with 26 passed / 15 failed, surfacing intended command-planning regressions in Codex mini)
pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts (full matrix completed: 51 passed / 45 failed runs; 20 passed / 28 failed cases)
pnpm --dir examples/test-app exec expo start --ios --port 8082 --host lan, plus simulator screenshots for light and dark UI review

github-actions · 2026-04-26T22:28:00Z

PR Preview Action v1.8.1
🚀 View preview at https://callstackincubator.github.io/agent-device/pr-preview/pr-453/
Built to branch `gh-pages` at 2026-04-27 00:46 UTC. Preview will be ready when the GitHub Pages deployment is complete.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 721373139d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-26T22:31:26Z

  20,
  5,
 );
+const RUNNER_AUTH_TOKEN = process.env.AGENT_DEVICE_RUNNER_AUTH_TOKEN?.trim() || undefined;


Use a single env var for runner command auth token

waitForRunner only sends Authorization when AGENT_DEVICE_RUNNER_AUTH_TOKEN is set, but the runner enforces auth using AGENT_DEVICE_RUNNER_COMMAND_TOKEN (RunnerTests+Environment.swift). When only the command token is configured (the server-side knob), iOS /command requests are sent without a bearer token and are rejected with 401, which blocks all runner commands.
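One way to address the mismatch described above is to resolve the token from both variables in a single place, so the client sends a bearer token whenever either one is set. The following is a minimal sketch under stated assumptions: `resolveRunnerAuthToken`, `buildAuthHeaders`, and the fallback order are hypothetical and not taken from the repository. Only the two environment variable names come from the review comment.

```typescript
// Hypothetical helpers: names and fallback order are assumptions, not the repo's code.
type Env = Record<string, string | undefined>;

// Return the first non-empty token, preferring the client-side variable
// and falling back to the variable the runner enforces.
function resolveRunnerAuthToken(env: Env): string | undefined {
  const candidates = [
    env.AGENT_DEVICE_RUNNER_AUTH_TOKEN,
    env.AGENT_DEVICE_RUNNER_COMMAND_TOKEN,
  ];
  for (const raw of candidates) {
    const token = raw?.trim();
    if (token) return token;
  }
  return undefined;
}

// Build request headers; the Authorization header is omitted when no token is configured.
function buildAuthHeaders(env: Env): Record<string, string> {
  const token = resolveRunnerAuthToken(env);
  return token ? { Authorization: `Bearer ${token}` } : {};
}
```

With this shape, a setup that configures only AGENT_DEVICE_RUNNER_COMMAND_TOKEN would still produce an Authorization header, which is the case the review flags as returning 401.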


chatgpt-codex-connector · 2026-04-26T22:31:26Z

+      'footer visible text: Seasonal footer target',
+    ],
+    task: 'Assume Agent Device Tester is on the Catalog tab. Plan the commands to scroll into view the Seasonal footer target card using scrollintoview.',
+    outputs: [/scrollintoview/i, /catalog-footer/i],


Assert the real scroll command in smoke suite

This case requires scrollintoview, but the CLI command surface defines scroll (no scrollintoview command). As written, the benchmark rewards an invalid command and can fail agents that output the valid scroll ... plan, producing false negatives in the smoke suite.
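The false negative described above can be illustrated with the two patterns from the case. This is a rough sketch that assumes a case passes only when every pattern in `outputs` matches the agent's plan. How SkillGym actually evaluates `outputs` is not confirmed here, and `matchesAll`, the relaxed pattern, and the sample plan text are hypothetical placeholders rather than real CLI syntax.

```typescript
// Patterns as asserted in the reviewed case.
const strictOutputs: RegExp[] = [/scrollintoview/i, /catalog-footer/i];

// Hypothetical relaxed variant: accepts any command word beginning with "scroll".
const relaxedOutputs: RegExp[] = [/\bscroll\w*\b/i, /catalog-footer/i];

// Assumption: a plan passes only if every pattern matches.
function matchesAll(plan: string, patterns: RegExp[]): boolean {
  return patterns.every((pattern) => pattern.test(plan));
}

// Placeholder plan text, not real agent-device syntax.
const planWithScroll = 'scroll <args> catalog-footer';

console.log(matchesAll(planWithScroll, strictOutputs)); // false
console.log(matchesAll(planWithScroll, relaxedOutputs)); // true
```

Under that assumption, a plan that uses `scroll` is rejected by the strict patterns even though it targets the right element.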


chatgpt-codex-connector · 2026-04-26T22:31:26Z

    "check": "pnpm check:tooling && pnpm check:fallow && pnpm check:unit",
    "prepack": "pnpm build:all",
    "typecheck": "tsc -p tsconfig.json",
+    "test-app:install": "pnpm install --dir examples/test-app --ignore-workspace --lockfile=false",


Install fixture app with its lockfile enabled

The new test-app:install script passes --lockfile=false, which makes pnpm ignore the committed examples/test-app/pnpm-lock.yaml. That causes dependency drift across runs and can make SkillGym smoke results non-reproducible when upstream package versions change.
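A small guard can catch this kind of script before it lands. The sketch below is a hypothetical check, not part of the repository: `disablesLockfile` is an invented helper, and the `--frozen-lockfile` variant is only one possible lockfile-respecting form, shown as an assumption about what a fix could look like.

```typescript
// Hypothetical lint helper: reports whether a script string turns the lockfile off.
function disablesLockfile(script: string): boolean {
  return /(^|\s)--lockfile=false(\s|$)/.test(script);
}

// Script as added in the reviewed commit.
const reviewedScript =
  'pnpm install --dir examples/test-app --ignore-workspace --lockfile=false';

// Assumed alternative: fail the install instead of drifting from the committed lockfile.
const lockfileRespectingScript =
  'pnpm install --dir examples/test-app --ignore-workspace --frozen-lockfile';

console.log(disablesLockfile(reviewedScript)); // true
console.log(disablesLockfile(lockfileRespectingScript)); // false
```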


chatgpt-codex-connector Bot reviewed Apr 26, 2026


thymikee force-pushed the feat/skillgym branch from 7213731 to 8b97b5e Compare April 26, 2026 22:31

thymikee changed the title ~~feat: add skillgym tests~~ feat: add skillgym tests and a test-app Apr 26, 2026

thymikee force-pushed the feat/skillgym branch 8 times, most recently from 68fba33 to 0b6dd04 Compare April 27, 2026 00:27

feat: add skillgym tests

d3febd1

thymikee force-pushed the feat/skillgym branch from 0b6dd04 to d3febd1 Compare April 27, 2026 00:46

thymikee merged commit 7c5b767 into main Apr 27, 2026
16 checks passed

thymikee deleted the feat/skillgym branch April 27, 2026 00:50
