fix(local-cli): Loosen the ARN-mode gate in run eval from `!!(runtimeAr... (#737) #42

Draft
aidandaly24 wants to merge 1 commit into main from fix/737
@aidandaly24

Owner

Refs aws#737

Issues

Root cause

command.tsx:121 gates ARN mode with !!(runtimeArn && evaluatorArn), which requires both flags. Passing --evaluator Builtin.* without --evaluator-arn therefore yields isArnMode=false and triggers requireProject() (project.tsx:88-91) before handleRunEval runs, even though resolveFromArn (run-eval.ts:76-86) supports Builtin.* in ARN mode. The name resolveEvaluatorArns (run-eval.ts:40-45) is also misleading: it passes non-ARNs through verbatim. Both were introduced in d41e14b (aws#706) and are unchanged at HEAD v0.20.2.

The fix

Loosen command.tsx:121 to const isArnMode = !!cliOptions.runtimeArn; (resolveFromArn already validates evaluators and errors cleanly). Fix or rename the misleading --evaluator-arn flag at :86 (and at :198 for batch-eval): at minimum, correct the description to note that it accepts ARNs or Builtin.*/managed IDs; preferably, add --evaluator-id and keep --evaluator-arn as a deprecated alias. Open design decision: hidden alias vs. breaking hard rename.
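
For reviewers, the gate change can be sketched in isolation as below. This is a minimal illustration, not the code in command.tsx: the `RunEvalCliOptions` interface and the two function names are hypothetical, and only the boolean expressions are taken from this PR.

```typescript
// Hypothetical shape of the parsed CLI options; only the fields
// discussed in this PR are modeled here.
interface RunEvalCliOptions {
  runtimeArn?: string;
  evaluatorArn?: string;
  evaluator?: string;
}

// Gate before the fix: ARN mode required BOTH flags.
function isArnModeBefore(cliOptions: RunEvalCliOptions): boolean {
  return !!(cliOptions.runtimeArn && cliOptions.evaluatorArn);
}

// Gate after the fix: --runtime-arn alone selects ARN mode.
function isArnModeAfter(cliOptions: RunEvalCliOptions): boolean {
  return !!cliOptions.runtimeArn;
}

// The invocation from the bug report: a runtime ARN plus a Builtin evaluator,
// with no --evaluator-arn.
const builtinOnly: RunEvalCliOptions = {
  runtimeArn:
    "arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/my-runtime-abc123",
  evaluator: "Builtin.Correctness",
};

// Old gate: false, so the command falls through to the project guard.
const arnModeBefore = isArnModeBefore(builtinOnly);
// New gate: true, so the project guard is skipped.
const arnModeAfter = isArnModeAfter(builtinOnly);
```

With the old expression, the missing `evaluatorArn` alone is enough to flip the result to false, which is why a valid ARN-mode invocation ended up at the project guard.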

Files touched: src/cli/commands/run/command.tsx:121 (isArnMode gate) and :86 (--evaluator-arn flag definition/description); :198 (batch-evaluation --evaluator-arn) for naming consistency. Behavior already supported in src/cli/operations/eval/run-eval.ts:76-96 (resolveFromArn) and :40-45 (resolveEvaluatorArns). Error origin: src/cli/tui/guards/project.tsx:84-92.
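
If the --evaluator-id option is pursued, the deprecated-alias behavior could look roughly like the sketch below. Everything in it is an assumption for discussion: `normalizeEvaluatorFlag`, its input and output types, and the warning text do not exist in the codebase, and the precedence rule (new flag wins) is one possible choice rather than a decided design.

```typescript
// Hypothetical input: values of the proposed --evaluator-id flag and the
// existing --evaluator-arn flag after argument parsing.
interface EvaluatorFlagInput {
  evaluatorId?: string;
  evaluatorArn?: string;
}

// Hypothetical output: one normalized value plus any deprecation warnings
// the caller may choose to print.
interface NormalizedEvaluatorFlag {
  evaluatorId: string | undefined;
  warnings: string[];
}

function normalizeEvaluatorFlag(input: EvaluatorFlagInput): NormalizedEvaluatorFlag {
  const warnings: string[] = [];
  if (input.evaluatorArn !== undefined) {
    // Assumed wording; the real message would be decided in review.
    warnings.push("--evaluator-arn is deprecated; use --evaluator-id instead.");
  }
  // Assumed precedence: the new flag wins when both are supplied.
  const evaluatorId =
    input.evaluatorId !== undefined ? input.evaluatorId : input.evaluatorArn;
  return { evaluatorId, warnings };
}

// Old flag only: value is carried over and one warning is produced.
const legacy = normalizeEvaluatorFlag({ evaluatorArn: "Builtin.Correctness" });
```

Returning warnings as data instead of printing them keeps the helper easy to unit test and leaves the hidden-alias vs. visible-deprecation decision to the caller.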

Validation evidence

The fix was verified by reproducing the original symptom and re-running after the change:

Original symptom reproduced and fixed at the real file src/cli/commands/run/command.tsx:122 (task description said src/actions/run-eval/command.tsx:121, but the logic matches exactly; project guard is src/cli/tui/guards/project.tsx:88-91). The fix is in the working tree on branch fix/737: gate changed from const isArnMode = !!(cliOptions.runtimeArn && cliOptions.evaluatorArn); to const isArnMode = !!cliOptions.runtimeArn; (plus a docstring tweak on --evaluator-arn) and a new CLI-level test file src/cli/commands/run/tests/run-eval-arn-gating.test.ts.

BEFORE (reverted gate to buggy !!(runtimeArn && evaluatorArn), rebuilt, ran from /tmp non-project dir):
node dist/cli/index.mjs run eval --runtime-arn arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/my-runtime-abc123 --evaluator Builtin.Correctness --region us-east-1 --json
=> printed No agentcore project found. / Run agentcore create to fix this. — requireProject() fired before handleRunEval/resolveFromArn. Symptom confirmed.

AFTER (restored fix, rebuilt OK -> dist/cli/index.mjs, same non-project dir):

  • Builtin.Correctness in ARN mode => NO project error; proceeded into resolveFromArn (Builtin.* accepted) and handleRunEval, returning {"success":false,"error":"No session spans found for agent \"my-runtime-abc123\" in the last 7 day(s). Has the agent been invoked?"}. Proves gate passed and Builtin evaluator treated as valid.
  • Custom evaluator my-custom-eval in ARN mode => {"success":false,"error":"Custom evaluator ... cannot be resolved in ARN mode"} (resolveFromArn error, not the project-missing error). Both new vitest cases pass.
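
The two AFTER outcomes correspond to a validation step of roughly the following shape. This is a sketch under stated assumptions, not the body of resolveFromArn: the function name `resolveEvaluatorInArnMode`, the result type, and the prefix checks are hypothetical, and the exact error wording is elided in the output above, so the message here is illustrative only.

```typescript
// Hypothetical result type for ARN-mode evaluator resolution.
type ArnModeResolution =
  | { kind: "resolved"; evaluatorId: string }
  | { kind: "rejected"; error: string };

// Assumed rule: Builtin.* IDs and full ARNs can be used without a project;
// anything else is a project-defined custom evaluator and cannot be resolved.
function resolveEvaluatorInArnMode(evaluator: string): ArnModeResolution {
  if (evaluator.startsWith("Builtin.") || evaluator.startsWith("arn:")) {
    return { kind: "resolved", evaluatorId: evaluator };
  }
  return {
    kind: "rejected",
    // Illustrative message; the real text is truncated in the PR output.
    error: `Custom evaluator "${evaluator}" cannot be resolved in ARN mode`,
  };
}

// Case 1 from the validation run: accepted, so execution continues.
const builtinResult = resolveEvaluatorInArnMode("Builtin.Correctness");
// Case 2 from the validation run: rejected with an ARN-mode error,
// not with the project-missing error.
const customResult = resolveEvaluatorInArnMode("my-custom-eval");
```

The point of the second case is the origin of the error: a custom evaluator still fails, but with a message from the resolution step rather than from the project guard.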

Test suite: green.


Staged on the fork as a draft for human review. Promote to aws/agentcore-cli after vetting.

…for Builtin evaluators

The ARN-mode gate required both --runtime-arn and --evaluator-arn, so
`run eval --runtime-arn ... --evaluator Builtin.Correctness` was wrongly
rejected with "No agentcore project found." even though resolveFromArn
already supports Builtin.* evaluators in ARN mode. Loosen the gate to
key off --runtime-arn alone, and clarify the --evaluator-arn description
to steer Builtin.* IDs toward -e/--evaluator.
@github-actions (bot) added the size/s (PR size: S) and agentcore-harness-reviewing (AgentCore Harness review in progress) labels and removed the agentcore-harness-reviewing label on Jun 25, 2026
@github-actions


Coverage Report

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🔵 | Lines | 37.16% | 13593 / 36577 |
| 🔵 | Statements | 36.43% | 14452 / 39667 |
| 🔵 | Functions | 31.8% | 2333 / 7336 |
| 🔵 | Branches | 31.1% | 9000 / 28930 |

Generated in workflow #96 for commit bbcfb66 by the Vitest Coverage Report Action
