Commit aba397a
test(e2e): export a fully-featured harness in-project and by ARN (#1641)
* test(e2e): export a fully-featured harness in-project and by ARN
End-to-end coverage for `export harness` across both source modes, proving each
exported agent works at runtime (not just that the spec/wiring is generated).
The source harness attaches every export surface: an existing project memory
(by name), an agentcore_code_interpreter tool, a public GitHub skill, and an MCP
gateway tool. Flow: deploy memory+gateway → create the harness attaching both →
deploy → invoke the harness → export --name (in-project) → deploy → verify →
export --arn (new empty project) → deploy → verify.
Each exported agent is behaviorally verified via four invokes: code interpreter
(exact factorial value), gateway tool (assert the gateway-prefixed/Exa MCP tool
is listed), skill (assert the returns-policy skill is referenced), and memory
(same-session round-trip recall). Verified live: 7/7 steps pass. Both projects are
torn down in afterAll, and project names are E2e-prefixed and unique per run, so a
failed teardown is still swept by global-setup's stale-stack GC.
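The content checks described above can be pictured as small predicates over the agent's response text. The following is a minimal sketch, not the test's actual code: `factorial`, `mentionsFactorial`, and `mentionsToken` are hypothetical names, and the inputs and tokens in the usage comment are assumptions chosen for illustration.

```typescript
// Hypothetical helpers; the real test's names and prompts are not shown in this commit message.

/** Exact n! as a number. Limited to n <= 18 so the result stays below 2^53. */
function factorial(n: number): number {
  if (!Number.isInteger(n) || n < 0 || n > 18) {
    throw new RangeError("n must be an integer in [0, 18]");
  }
  let acc = 1;
  for (let i = 2; i <= n; i++) acc *= i;
  return acc;
}

/** True if the response states the exact value of n!, ignoring comma separators. */
function mentionsFactorial(response: string, n: number): boolean {
  const withoutCommas = response.replace(/,/g, "");
  // Word boundaries avoid matching the value inside a longer number.
  return new RegExp(`\\b${factorial(n)}\\b`).test(withoutCommas);
}

/** Case-insensitive check that a token (tool name, skill name, remembered fact) appears. */
function mentionsToken(response: string, token: string): boolean {
  return response.toLowerCase().includes(token.toLowerCase());
}

// Usage (inputs are illustrative):
//   mentionsFactorial("The result is 3,628,800.", 10)
//   mentionsToken("I used the returns-policy skill.", "returns-policy")
```

An exact-value check is stricter than a substring match on "factorial": it only passes if the code interpreter actually produced the number.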
* test(e2e): retry capability content assertions to avoid flakiness
The per-capability checks asserted on a single LLM response outside the retry —
only invokeAndExpectSuccess's CLI-`success` check was retried. That risks
intermittent CI failures from (a) memory same-session write/read visibility lag
on the recall turn and (b) LLM phrasing nondeterminism on the gateway-tool and
skill checks (a `success: true` response that happens not to name the expected
token).
Add an optional `verify` predicate to invokeAndExpectSuccess that runs INSIDE the
retried unit, and move every content assertion (factorial value, gateway tool
token, skill reference, memory recall) into it — so a flaky sample re-invokes
instead of failing. Mirrors the retry pattern already used in harness-e2e-helper.
1 parent 2ffbd9a
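The idea of running content assertions inside the retried unit can be sketched as follows. This is an illustration under assumptions, not the implementation of `invokeAndExpectSuccess`: the function name `invokeWithVerify`, the `InvokeResult` shape, and the `attempts`/`delayMs` options are all hypothetical.

```typescript
// Hypothetical shapes; the real helper's signature is not shown in this commit message.
interface InvokeResult {
  success: boolean;
  response: string;
}

interface RetryOptions {
  attempts?: number; // total tries, including the first
  delayMs?: number; // pause between tries
  verify?: (result: InvokeResult) => boolean; // optional content predicate
}

async function invokeWithVerify(
  invoke: () => Promise<InvokeResult>,
  options: RetryOptions = {},
): Promise<InvokeResult> {
  const attempts = options.attempts ?? 3;
  const delayMs = options.delayMs ?? 0;
  let lastError: unknown = new Error("no attempts were made");

  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const result = await invoke();
      if (!result.success) {
        throw new Error("invoke reported success: false");
      }
      // The predicate runs inside the try block, so a failed content check
      // causes a fresh invoke rather than an immediate test failure.
      if (options.verify && !options.verify(result)) {
        throw new Error("verify predicate rejected the response");
      }
      return result;
    } catch (err) {
      lastError = err;
      if (attempt < attempts && delayMs > 0) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent failure.
  throw lastError;
}
```

With this shape, a response that succeeds at the CLI level but omits the expected token is treated the same as a failed invoke: it consumes one attempt and is retried until the attempts run out.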
1 file changed
Lines changed: 561 additions & 0 deletions