test(scenarios): add bai v2 CLI integration scenario suite#11307

Draft
seedspirit wants to merge 4 commits into main from tests/cli-scenario-suite

@seedspirit

Summary

End-to-end shell scripts that exercise the ./bai v2 CLI against a
live cluster. The suite sits alongside (and does not replace) the unit /
pytest suite; its job is to catch regressions in the cross-component
contract: API handlers, CLI plumbing, RBAC boundaries, storage proxy
round-trips, and resource-policy enforcement.

  • 16 scenarios + setup/teardown, runnable individually or via scenarios/run_all.sh
  • Self-contained: each script logs in as the appropriate user, asserts behavior, cleans up after itself
  • Test resources are prefixed (scn- by default) so cleanup is by-prefix and never touches non-scenario data
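A minimal sketch of what by-prefix cleanup selection could look like. The `SCN_PREFIX` variable and the `select_scenario_resources` function are hypothetical names used for illustration; the PR only states that resources carry the `scn-` prefix by default. The filter reads plain names on stdin, so it makes no assumption about how the CLI lists resources.

```shell
# Hypothetical names: SCN_PREFIX and select_scenario_resources are
# illustrative, not taken from the PR.
SCN_PREFIX="${SCN_PREFIX:-scn-}"

# Read resource names on stdin (one per line) and print only those that
# start with the scenario prefix, so cleanup never sees other data.
select_scenario_resources() {
    while IFS= read -r name; do
        case "$name" in
            "$SCN_PREFIX"*) printf '%s\n' "$name" ;;
        esac
    done
}

# Example: only the two scn- names are printed.
#   printf '%s\n' scn-vf-a prod-data scn-vf-b | select_scenario_resources
```

Matching on the start of the name (rather than anywhere in it) means a resource such as `my-scn-data` is left alone.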

Coverage

| Domain | Scenarios |
| --- | --- |
| vfolder | lifecycle, multi-user access, invite/clone, cross-project isolation, mounted-delete guard, cloneable=false guard, bulk ops, TUS file I/O round-trip |
| session | lifecycle, BATCH exec + logs, keypair max_concurrent_sessions cap |
| model card / deployment | deploy from card, revision/update/delete, endpoint URL serve |

Strict failure policy

No soft-pass. Every scenario exits with status 1 on failure. Missing
fixtures (e.g. no model card on a fresh dev DB) and manager bugs surface
as real test failures rather than silent skips.
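The policy can be sketched as a pair of small helpers. The function names `fail` and `require_fixture` are hypothetical and not taken from the PR; the point is only that a missing fixture goes through the same exit-1 path as any other failure, with no skip branch.

```shell
# Hypothetical helpers: names are illustrative, not taken from the PR.

# Print a message to stderr and exit with status 1.
fail() {
    printf 'FAIL: %s\n' "$*" >&2
    exit 1
}

# Treat a missing fixture as a hard failure instead of skipping.
# $1 = human-readable fixture name, $2 = number of items found
require_fixture() {
    [ "$2" -gt 0 ] || fail "missing fixture: $1 (found $2)"
}

# Example: with zero model cards the scenario stops with exit status 1.
#   require_fixture "model card" 0
```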

Current results (against main @ f55366d34)

13 / 17 PASS. The 4 failures are real and documented in
scenarios/README.md:

| # | Scenario | Root cause |
| --- | --- | --- |
| 03 | model_card_deploy | model-store has 0 model cards (fixture gap on dev DB) |
| 04 | deployment_revision | Manager bug: deployment delete leaves lifecycle.status empty (no terminal-state transition) |
| 07 | vfolder_invite_clone | Manager bug: post-clone GET hits RBAC eventual-consistency PermissionDeniedError (vfolder/adapter.py:626) |
| 14 | deployment_endpoint_serve | Same as 03 (also reaches into model-store) |
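For illustration, a PASS/FAIL summary like the one above can be derived from per-scenario exit statuses. This is a sketch under assumptions: the `tally_results` name and the `<name> <status>` input format are hypothetical, and nothing here reflects how `run_all.sh` actually reports.

```shell
# Hypothetical helper: read "<scenario-name> <exit-status>" lines on stdin
# and print a one-line summary. Any non-zero status counts as a failure.
tally_results() {
    passed=0
    failed=0
    while read -r name status; do
        if [ "$status" -eq 0 ]; then
            passed=$((passed + 1))
        else
            failed=$((failed + 1))
        fi
    done
    printf '%d PASS / %d FAIL\n' "$passed" "$failed"
}

# Example with made-up scenario names:
#   printf '%s\n' 'alpha 0' 'beta 1' 'gamma 0' | tally_results
```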

Test plan

  • ./dev start all + halfstack healthy
  • scenarios/run_all.sh reports 13 PASS / 4 FAIL with listed failures
  • SKIP_TEARDOWN=1 ONLY="01" scenarios/run_all.sh runs a single scenario
  • Failures match the manager bugs / fixture gaps listed in README
  • No leftover resources with the scn- prefix after 99_teardown
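The `ONLY` and `SKIP_TEARDOWN` switches in the test plan could be interpreted by a runner roughly as follows. This is a sketch, not the logic in `run_all.sh`: the function names are hypothetical, and it assumes `ONLY` holds a space- or comma-separated list of scenario numbers, with an empty value meaning "run everything".

```shell
# Hypothetical helpers; the ONLY list format is an assumption.

# Succeed if scenario id $1 (e.g. "01") is selected by $ONLY.
should_run() {
    # Empty or unset ONLY selects every scenario.
    [ -z "${ONLY:-}" ] && return 0
    for want in $(printf '%s' "$ONLY" | tr ',' ' '); do
        [ "$want" = "$1" ] && return 0
    done
    return 1
}

# Succeed unless SKIP_TEARDOWN=1, in which case resources are kept.
should_teardown() {
    [ "${SKIP_TEARDOWN:-0}" != "1" ]
}

# Example:
#   ONLY="01"; should_run 01 && echo "run 01"; should_run 02 || echo "skip 02"
```

Keeping teardown optional is useful when a single scenario is being debugged and its resources need to be inspected afterwards.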

🤖 Generated with Claude Code

End-to-end shell scripts that exercise the v2 CLI against a live cluster:
session/vfolder/deployment/model-card lifecycle, multi-user permission
boundaries, mount-locked deletion, bulk ops, file I/O round-trip via TUS,
BATCH session log retrieval, and keypair concurrency caps.

Scenarios are strict — no soft-pass / soft-skip paths. Missing fixtures
or manager bugs surface as real failures so regressions stay visible.

Current pass rate against main: 13/17. Four scenarios fail on real
issues (documented in README "Known failures"):
- 03/14: model-store has no model cards on dev DB (fixture gap)
- 04: deployment delete leaves lifecycle.status empty (manager bug)
- 07: post-clone GET hits RBAC eventual-consistency (manager bug)

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…python helpers

Extract inline `python3 -c` heredocs from scenario shell scripts into
per-scenario `.py` helper files. Each scenario now lives in its own
directory containing `run.sh` and any scenario-specific python helpers.
Generic JSON parsers shared across scenarios live in `scenarios/lib/py/`,
exposed via the `$SCN_PY` env var.

Also fixes two bugs surfaced during the refactor: 00_setup and
15_session_concurrency_cap used `UID=` to pass an env var to python, but
`UID` is a readonly variable in bash, so the assignment fails. Both
scripts now use `TARGET_UID` instead.
Labels

size:XL 500~ LoC
