
Commit e266576

feat(import): add evaluator and online eval config import subcommands (#780)
* feat(import): add evaluator import subcommand with TUI wizard

  Add `agentcore import evaluator` to import existing AWS evaluators into CLI projects. Refactor import types and utilities for extensibility so future resource types require minimal new code.

  Changes:
  - Add import-evaluator.ts handler with toEvaluatorSpec mapping (LLM-as-a-Judge and code-based evaluators), duplicate detection, and CDK import pipeline
  - Enhance getEvaluator API wrapper to extract full evaluatorConfig (model, instructions, ratingScale) and tags from SDK tagged unions
  - Add listAllEvaluators pagination helper filtering out built-in evaluators
  - Widen ImportableResourceType union and shared utilities for evaluator support
  - Add evaluator to TUI import flow (select, ARN input, progress screens)
  - Add 17 unit tests covering spec conversion, template lookup, and error cases

  Tested end-to-end against real AWS evaluator (bugbash_eval_1775226567-zrDxm7Gpcw) with verified field mapping for all config fields, tags, and deployed state.

* fix(import): use correct importType for evaluator in TUI flow

  The TUI import wizard hardcoded importType as 'memory' for all non-runtime resources, causing evaluator imports to fail with "ARN resource type evaluator does not match expected type memory". Use flow.resourceType instead so the correct handler is dispatched.

* feat(import): add online eval config import subcommand

  Add `agentcore import online-eval` to import existing online evaluation configs from AWS into CLI-managed projects. Follows the same pattern as runtime, memory, and evaluator imports.

  The command extracts the agent reference from the config's service names (pattern: {agentName}.DEFAULT), maps evaluator IDs to local names or ARN fallbacks, and runs the full CDK import pipeline.

  Also removes incorrect project-prefix stripping from evaluator and runtime imports: imported resources come from outside the project and won't have the project prefix.

  Constraint: Agent must exist in project runtimes[] before import (schema enforces cross-reference)
  Constraint: Evaluators not in project fall back to ARN format to bypass schema validation
  Rejected: Loose agent validation | schema writeProjectSpec() enforces runtimes[] cross-reference
  Confidence: high
  Scope-risk: moderate

* feat(import): add online eval config to TUI import wizard

  Add 'Online Eval Config' option to the interactive import flow so users can import online evaluation configs via the TUI, not just the CLI. Follows the same ARN-only pattern as evaluator and memory imports: select type → enter ARN → import progress → success/error.

* docs: add TUI import wizard screenshots for online eval

  Screenshots captured from the TUI import flow showing:
  - Import type selection menu with Online Eval Config option
  - ARN input screen for online eval config
  - ARN input with a real config ARN filled in

* Revert "docs: add TUI import wizard screenshots for online eval"

  This reverts commit cb4c675.

* refactor(import): extract generic import orchestrator with descriptor pattern

  Reduce ~1,400 lines of duplicated orchestration across four import handlers (runtime, memory, evaluator, online-eval) to ~600 lines by extracting shared logic into executeResourceImport(). Each resource type now provides a thin descriptor declaring its specific behavior.

  Constraint: Public handleImport* function signatures unchanged (TUI depends on them)
  Constraint: Factory functions needed for runtime/online-eval to share mutable state between hooks
  Rejected: Strategy class hierarchy | descriptor objects are simpler and more composable
  Confidence: high
  Scope-risk: moderate

* refactor(aws): extract paginateAll and fetchTags helpers in agentcore-control

  Deduplicates identical pagination loops across 4 listAll* functions and identical tag-fetching try/catch blocks across 3 getDetail functions.

  Also adds optional client param to listEvaluators and listOnlineEvaluationConfigs for connection reuse during pagination. Addresses deferred review feedback from PR #763.

  Constraint: evaluator listAll still filters out Builtin.* entries
  Confidence: high
  Scope-risk: narrow

* fix(import): resolve evaluator references via deployed state for imported evaluators

  resolveEvaluatorReferences used string-contains matching (evaluatorId.includes(localName)) which only works when the evaluator was deployed by the same project. Imported evaluators with renamed local names never matched, falling back to raw ARNs in the config.

  Now reads deployed-state.json to build an evaluatorId → localName reverse map and checks it first, before the string-contains heuristic.

  Constraint: Deployed state may not exist yet (first import); .catch() handles gracefully
  Rejected: Passing deployed state through descriptor interface | only online-eval needs this
  Confidence: high
  Scope-risk: narrow

* fix(import): auto-disable online eval configs to unlock evaluators during import

  Evaluators referenced by ENABLED online eval configs are locked by the service (lockedForModification=true), causing CFN import to fail when it tries to apply stack-level tags.

  Now the evaluator import detects the lock, temporarily disables referencing online eval configs, performs the import, then re-enables them.

  Constraint: Re-enable runs in finally block so configs are restored on both success and failure
  Constraint: Only disables configs that actually reference this specific evaluator
  Rejected: Refuse import with manual guidance | user can't pause configs not yet in project
  Confidence: high
  Scope-risk: moderate

* Revert "fix(import): auto-disable online eval configs to unlock evaluators during import"

  This reverts commit 5839391.

* fix(import): block evaluator import when referenced by online eval, use ARN-only references

  Evaluators locked by an online eval config cannot be CFN-imported because CloudFormation triggers a post-import TagResource call that the resource handler rejects. Instead of stripping tags from the import template, block the import with a clear error and suggestion to use import online-eval.

  Online eval config import now always references evaluators by ARN rather than resolving to local names, since the evaluators cannot be imported into the project alongside the config.

  Constraint: CFN IMPORT triggers TagResource which fails on locked evaluators
  Rejected: Strip Tags from import template | still fails on some resource types
  Confidence: high
  Scope-risk: narrow

* fix(import): resolve OEC agent reference via deployed state when runtime has custom name

  extractAgentName() derives the AWS runtime name from the OEC service name pattern, but this fails to match when the runtime was imported with --name since the project spec stores the local name. Now falls back to listing runtimes to find the runtime ID, then looks up the local name in deployed-state.json.

* fix(import): strip CDK project prefix from OEC service name when resolving agent

  CDK constructs set the OEC service name as "{projectName}_{agentName}.DEFAULT". extractAgentName() strips ".DEFAULT" but not the project prefix, so the lookup fails against local runtime names. Now strips the prefix as a fast path before falling back to the deployed-state API lookup.

* fix(import): show friendly error for non-existent evaluator ID

  getEvaluator() now catches ResourceNotFoundException and ValidationException from the SDK and rethrows a clear message instead of exposing the raw regex validation error.

* fix(import): validate ARN resource type for online-eval import

  import online-eval used a naive regex to extract the config ID from the ARN, skipping resource type, region, and account validation. Now uses parseAndValidateArn like all other import commands. Added an ARN resource type mapping to handle the online-eval vs online-evaluation-config mismatch between ImportableResourceType and the ARN format.

* refactor(import): address PR review feedback

  - Add `red` to ANSI constants, replace inline escape codes
  - Type GetEvaluatorResult.level as EvaluationLevel at boundary
  - Combine ARN_RESOURCE_TYPE_MAP, collectionKeyMap, idFieldMap into single RESOURCE_TYPE_CONFIG to prevent drift
  - Export IMPORTABLE_RESOURCES as const array, derive type from it, replace || chains with .includes()
  - Fix samplingPercentage === 0 false positive (use == null)
  - Document closure state sequencing contract on descriptor hooks

* test(import): remove unreachable empty-level evaluator test

  The test exercised a defensive fallback in toEvaluatorSpec for an empty level string, but now that GetEvaluatorResult.level is typed as EvaluationLevel, the boundary cast in getEvaluator prevents this case from ever reaching toEvaluatorSpec.
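The review-feedback items about `IMPORTABLE_RESOURCES` and `RESOURCE_TYPE_CONFIG` can be illustrated with a minimal sketch. Only those two names, `ImportableResourceType`, and the `online-eval` vs `online-evaluation-config` mismatch come from the commit message. The field names `arnResourceType` and `collectionKey`, the helpers `isImportableResourceType` and `arnMatchesType`, the ARN segments for runtime and memory, and all collection keys except `runtimes` are assumptions made for illustration, not the project's actual code.

```typescript
// Hypothetical sketch: a single const array is the source of truth for the union type.
const IMPORTABLE_RESOURCES = ['runtime', 'memory', 'evaluator', 'online-eval'] as const;

// Derived from the array, so the list and the type cannot drift apart.
type ImportableResourceType = (typeof IMPORTABLE_RESOURCES)[number];

interface ResourceTypeConfig {
  arnResourceType: string; // resource segment expected in the ARN (assumed field name)
  collectionKey: string; // project-spec array for this type (assumed field name)
}

// Record<ImportableResourceType, ...> makes the compiler reject a missing entry.
const RESOURCE_TYPE_CONFIG: Record<ImportableResourceType, ResourceTypeConfig> = {
  runtime: { arnResourceType: 'runtime', collectionKey: 'runtimes' }, // ARN segment assumed
  memory: { arnResourceType: 'memory', collectionKey: 'memories' }, // both values assumed
  evaluator: { arnResourceType: 'evaluator', collectionKey: 'evaluators' }, // key assumed
  'online-eval': { arnResourceType: 'online-evaluation-config', collectionKey: 'onlineEvalConfigs' }, // key assumed
};

// Replaces chains like `t === 'runtime' || t === 'memory' || ...` with one lookup.
function isImportableResourceType(value: string): value is ImportableResourceType {
  return (IMPORTABLE_RESOURCES as readonly string[]).includes(value);
}

// Checks the resource segment of an ARN shaped like
// arn:partition:service:region:account:resourceType/resourceId
function arnMatchesType(arn: string, type: ImportableResourceType): boolean {
  const resource = arn.split(':').slice(5).join(':');
  const arnType = resource.split('/')[0];
  return arnType === RESOURCE_TYPE_CONFIG[type].arnResourceType;
}
```

Keeping the ARN segment and the collection key in one record per resource type means that adding a fifth importable type is a single array entry plus a single record entry, and the compiler flags the record if the entry is forgotten.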
1 parent 380ac6e commit e266576

16 files changed, +2059 -595 lines changed

src/cli/aws/agentcore-control.ts

Lines changed: 234 additions & 51 deletions
Large diffs are not rendered by default.
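Since this file's diff is not rendered, here is a minimal sketch of what a `paginateAll` helper of the kind named in the commit message might look like. The `Page` shape, the `fetchPage` callback signature, the `fakePages` data, and the evaluator names are assumptions for illustration; the real helper in agentcore-control.ts may differ, and no AWS SDK call is made here.

```typescript
// One page of results plus the token for the next page, if any (assumed shape).
interface Page<T> {
  items: T[];
  nextToken?: string;
}

// Calls fetchPage until no nextToken comes back and concatenates every page's items.
async function paginateAll<T>(fetchPage: (nextToken?: string) => Promise<Page<T>>): Promise<T[]> {
  const all: T[] = [];
  let nextToken: string | undefined;
  do {
    const page = await fetchPage(nextToken);
    all.push(...page.items);
    nextToken = page.nextToken;
  } while (nextToken);
  return all;
}

// Stand-in for a paginated list call; the data is made up.
const fakePages: Record<string, Page<string>> = {
  start: { items: ['Builtin.Helpfulness', 'my_eval'], nextToken: 'p2' },
  p2: { items: ['other_eval'] },
};

async function listAllEvaluatorIds(): Promise<string[]> {
  const ids = await paginateAll((token) => Promise.resolve(fakePages[token ?? 'start']!));
  // Per the commit message, the evaluator variant still filters out Builtin.* entries.
  return ids.filter((id) => !id.startsWith('Builtin.'));
}
```

With a helper of this shape, each `listAll*` function only supplies its own page-fetching callback and any post-filter, which is the deduplication the commit message describes.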

src/cli/commands/import/__tests__/import-evaluator.test.ts

Lines changed: 437 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 368 additions & 0 deletions
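The tests in this file import `extractAgentName` and `toOnlineEvalConfigSpec` from `../import-online-eval`, whose own diff is not shown on this page. The following is a minimal sketch that is consistent with the expectations in those tests and with the commit message (last-dot stripping, `== null` for the sampling check, `enableOnCreate` only when `ENABLED`). It is not the actual implementation: the trimmed `OnlineEvalDetail` and `OnlineEvalConfigSpec` shapes, the `stripProjectPrefix` helper, and the wording of the error message before "has no sampling configuration" are assumptions.

```typescript
// Trimmed stand-in for GetOnlineEvalConfigResult; only the fields used here (assumed shape).
interface OnlineEvalDetail {
  configName: string;
  executionStatus: string;
  description?: string;
  samplingPercentage?: number;
}

// Assumed spec shape, inferred from the fields the tests assert on.
interface OnlineEvalConfigSpec {
  name: string;
  agent: string;
  evaluators: string[];
  samplingRate: number;
  description?: string;
  enableOnCreate?: boolean;
}

// Uses the first service name and drops everything from the last dot onward.
function extractAgentName(serviceNames: string[]): string | undefined {
  const first = serviceNames[0];
  if (first === undefined) return undefined;
  const lastDot = first.lastIndexOf('.');
  return lastDot === -1 ? first : first.slice(0, lastDot);
}

// Hypothetical helper for the "{projectName}_{agentName}" fast path from the commit message.
function stripProjectPrefix(name: string, projectName: string): string {
  const prefix = `${projectName}_`;
  return name.startsWith(prefix) ? name.slice(prefix.length) : name;
}

function toOnlineEvalConfigSpec(
  detail: OnlineEvalDetail,
  localName: string,
  agent: string,
  evaluators: string[]
): OnlineEvalConfigSpec {
  // `== null` catches undefined and null but lets a legitimate 0 through,
  // unlike a falsy check such as `!detail.samplingPercentage`.
  if (detail.samplingPercentage == null) {
    throw new Error(`Online eval config "${detail.configName}" has no sampling configuration`);
  }
  const spec: OnlineEvalConfigSpec = {
    name: localName,
    agent,
    evaluators,
    samplingRate: detail.samplingPercentage,
  };
  if (detail.description) spec.description = detail.description;
  if (detail.executionStatus === 'ENABLED') spec.enableOnCreate = true;
  return spec;
}
```

Note that `extractAgentName` deliberately leaves a project prefix in place, matching the 'extracts agent name with project prefix pattern' test; prefix stripping is a separate step in this sketch.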
@@ -0,0 +1,368 @@
+/**
+ * Import Online Eval Config Unit Tests
+ *
+ * Covers:
+ * - extractAgentName: service name parsing
+ * - toOnlineEvalConfigSpec conversion: happy path, missing sampling, enableOnCreate
+ * - Template logical ID lookup for online eval configs
+ * - Phase 2 import resource list construction for online eval configs
+ */
+import type { GetOnlineEvalConfigResult } from '../../../aws/agentcore-control';
+import { extractAgentName, toOnlineEvalConfigSpec } from '../import-online-eval';
+import { buildImportTemplate, findLogicalIdByProperty, findLogicalIdsByType } from '../template-utils';
+import type { CfnTemplate } from '../template-utils';
+import type { ResourceToImport } from '../types';
+import { describe, expect, it } from 'vitest';
+
+// ============================================================================
+// extractAgentName Tests
+// ============================================================================
+
+describe('extractAgentName', () => {
+  it('extracts agent name from service name with .DEFAULT suffix', () => {
+    expect(extractAgentName(['my_agent.DEFAULT'])).toBe('my_agent');
+  });
+
+  it('extracts agent name with project prefix pattern', () => {
+    expect(extractAgentName(['testproject_my_agent.DEFAULT'])).toBe('testproject_my_agent');
+  });
+
+  it('returns full string when no dot suffix', () => {
+    expect(extractAgentName(['my_agent'])).toBe('my_agent');
+  });
+
+  it('returns undefined for empty array', () => {
+    expect(extractAgentName([])).toBeUndefined();
+  });
+
+  it('uses first service name when multiple provided', () => {
+    expect(extractAgentName(['agent_one.DEFAULT', 'agent_two.DEFAULT'])).toBe('agent_one');
+  });
+
+  it('handles service name with multiple dots', () => {
+    expect(extractAgentName(['my.agent.DEFAULT'])).toBe('my.agent');
+  });
+});
+
+// ============================================================================
+// toOnlineEvalConfigSpec Conversion Tests
+// ============================================================================
+
+describe('toOnlineEvalConfigSpec', () => {
+  it('maps online eval config with all fields', () => {
+    const detail: GetOnlineEvalConfigResult = {
+      configId: 'oec-123',
+      configArn: 'arn:aws:bedrock-agentcore:us-west-2:123456789012:online-evaluation-config/oec-123',
+      configName: 'QualityMonitor',
+      status: 'ACTIVE',
+      executionStatus: 'ENABLED',
+      description: 'Monitor agent quality',
+      samplingPercentage: 50,
+      serviceNames: ['my_agent.DEFAULT'],
+      evaluatorIds: ['eval-456'],
+    };
+
+    const result = toOnlineEvalConfigSpec(detail, 'QualityMonitor', 'my_agent', ['my_evaluator']);
+
+    expect(result.name).toBe('QualityMonitor');
+    expect(result.agent).toBe('my_agent');
+    expect(result.evaluators).toEqual(['my_evaluator']);
+    expect(result.samplingRate).toBe(50);
+    expect(result.description).toBe('Monitor agent quality');
+    expect(result.enableOnCreate).toBe(true);
+  });
+
+  it('omits enableOnCreate when execution status is DISABLED', () => {
+    const detail: GetOnlineEvalConfigResult = {
+      configId: 'oec-456',
+      configArn: 'arn:aws:bedrock-agentcore:us-west-2:123456789012:online-evaluation-config/oec-456',
+      configName: 'DisabledConfig',
+      status: 'ACTIVE',
+      executionStatus: 'DISABLED',
+      samplingPercentage: 10,
+      serviceNames: ['agent.DEFAULT'],
+      evaluatorIds: ['eval-1'],
+    };
+
+    const result = toOnlineEvalConfigSpec(detail, 'DisabledConfig', 'agent', ['eval_one']);
+
+    expect(result.enableOnCreate).toBeUndefined();
+  });
+
+  it('omits description when not present', () => {
+    const detail: GetOnlineEvalConfigResult = {
+      configId: 'oec-789',
+      configArn: 'arn:aws:bedrock-agentcore:us-west-2:123456789012:online-evaluation-config/oec-789',
+      configName: 'NoDesc',
+      status: 'ACTIVE',
+      executionStatus: 'ENABLED',
+      samplingPercentage: 25,
+      serviceNames: ['agent.DEFAULT'],
+      evaluatorIds: ['eval-1'],
+    };
+
+    const result = toOnlineEvalConfigSpec(detail, 'NoDesc', 'agent', ['eval_one']);
+
+    expect(result.description).toBeUndefined();
+  });
+
+  it('throws when sampling percentage is missing', () => {
+    const detail: GetOnlineEvalConfigResult = {
+      configId: 'oec-no-sampling',
+      configArn: 'arn:aws:bedrock-agentcore:us-west-2:123456789012:online-evaluation-config/oec-no-sampling',
+      configName: 'NoSampling',
+      status: 'ACTIVE',
+      executionStatus: 'ENABLED',
+      serviceNames: ['agent.DEFAULT'],
+      evaluatorIds: ['eval-1'],
+    };
+
+    expect(() => toOnlineEvalConfigSpec(detail, 'NoSampling', 'agent', ['eval_one'])).toThrow(
+      'has no sampling configuration'
+    );
+  });
+
+  it('supports multiple evaluator references', () => {
+    const detail: GetOnlineEvalConfigResult = {
+      configId: 'oec-multi',
+      configArn: 'arn:aws:bedrock-agentcore:us-west-2:123456789012:online-evaluation-config/oec-multi',
+      configName: 'MultiEval',
+      status: 'ACTIVE',
+      executionStatus: 'ENABLED',
+      samplingPercentage: 75,
+      serviceNames: ['agent.DEFAULT'],
+      evaluatorIds: ['eval-1', 'eval-2'],
+    };
+
+    const result = toOnlineEvalConfigSpec(detail, 'MultiEval', 'agent', [
+      'local_eval',
+      'arn:aws:bedrock-agentcore:us-west-2:123456789012:evaluator/eval-2',
+    ]);
+
+    expect(result.evaluators).toHaveLength(2);
+    expect(result.evaluators[0]).toBe('local_eval');
+    expect(result.evaluators[1]).toMatch(/^arn:/);
+  });
+});
+
+// ============================================================================
+// Template Logical ID Lookup Tests for Online Eval Configs
+// ============================================================================
+
+describe('Template Logical ID Lookup for Online Eval Configs', () => {
+  const synthTemplate: CfnTemplate = {
+    Resources: {
+      MyOnlineEvalConfig: {
+        Type: 'AWS::BedrockAgentCore::OnlineEvaluationConfig',
+        Properties: {
+          OnlineEvaluationConfigName: 'QualityMonitor',
+        },
+      },
+      PrefixedOnlineEvalConfig: {
+        Type: 'AWS::BedrockAgentCore::OnlineEvaluationConfig',
+        Properties: {
+          OnlineEvaluationConfigName: 'TestProject_PrefixedConfig',
+        },
+      },
+      MyAgentRuntime: {
+        Type: 'AWS::BedrockAgentCore::Runtime',
+        Properties: {
+          AgentRuntimeName: 'TestProject_my_agent',
+        },
+      },
+      MyIAMRole: {
+        Type: 'AWS::IAM::Role',
+        Properties: {
+          RoleName: 'MyRole',
+        },
+      },
+    },
+  };
+
+  it('finds online eval config logical ID by OnlineEvaluationConfigName property', () => {
+    const logicalId = findLogicalIdByProperty(
+      synthTemplate,
+      'AWS::BedrockAgentCore::OnlineEvaluationConfig',
+      'OnlineEvaluationConfigName',
+      'QualityMonitor'
+    );
+    expect(logicalId).toBe('MyOnlineEvalConfig');
+  });
+
+  it('finds prefixed online eval config by full name', () => {
+    const logicalId = findLogicalIdByProperty(
+      synthTemplate,
+      'AWS::BedrockAgentCore::OnlineEvaluationConfig',
+      'OnlineEvaluationConfigName',
+      'TestProject_PrefixedConfig'
+    );
+    expect(logicalId).toBe('PrefixedOnlineEvalConfig');
+  });
+
+  it('finds all online eval config logical IDs by type', () => {
+    const logicalIds = findLogicalIdsByType(synthTemplate, 'AWS::BedrockAgentCore::OnlineEvaluationConfig');
+    expect(logicalIds).toHaveLength(2);
+    expect(logicalIds).toContain('MyOnlineEvalConfig');
+    expect(logicalIds).toContain('PrefixedOnlineEvalConfig');
+  });
+
+  it('returns undefined for non-existent config name', () => {
+    const logicalId = findLogicalIdByProperty(
+      synthTemplate,
+      'AWS::BedrockAgentCore::OnlineEvaluationConfig',
+      'OnlineEvaluationConfigName',
+      'nonexistent_config'
+    );
+    expect(logicalId).toBeUndefined();
+  });
+
+  it('falls back to single online eval config logical ID when name does not match', () => {
+    const singleConfigTemplate: CfnTemplate = {
+      Resources: {
+        OnlyConfig: {
+          Type: 'AWS::BedrockAgentCore::OnlineEvaluationConfig',
+          Properties: {
+            OnlineEvaluationConfigName: 'some_config',
+          },
+        },
+      },
+    };
+
+    let logicalId = findLogicalIdByProperty(
+      singleConfigTemplate,
+      'AWS::BedrockAgentCore::OnlineEvaluationConfig',
+      'OnlineEvaluationConfigName',
+      'different_name'
+    );
+
+    // Primary lookup fails
+    expect(logicalId).toBeUndefined();
+
+    // Fallback: if there's only one config resource, use it
+    if (!logicalId) {
+      const configLogicalIds = findLogicalIdsByType(
+        singleConfigTemplate,
+        'AWS::BedrockAgentCore::OnlineEvaluationConfig'
+      );
+      if (configLogicalIds.length === 1) {
+        logicalId = configLogicalIds[0];
+      }
+    }
+    expect(logicalId).toBe('OnlyConfig');
+  });
+});
+
+// ============================================================================
+// Phase 2 Resource Import List Construction for Online Eval Configs
+// ============================================================================
+
+describe('Phase 2: ResourceToImport List Construction for Online Eval Configs', () => {
+  const synthTemplate: CfnTemplate = {
+    Resources: {
+      OnlineEvalLogicalId: {
+        Type: 'AWS::BedrockAgentCore::OnlineEvaluationConfig',
+        Properties: {
+          OnlineEvaluationConfigName: 'QualityMonitor',
+        },
+      },
+      IAMRoleLogicalId: {
+        Type: 'AWS::IAM::Role',
+        Properties: {},
+      },
+    },
+  };
+
+  it('builds ResourceToImport list for online eval config', () => {
+    const configName = 'QualityMonitor';
+    const configId = 'oec-123';
+
+    const resourcesToImport: ResourceToImport[] = [];
+
+    const logicalId = findLogicalIdByProperty(
+      synthTemplate,
+      'AWS::BedrockAgentCore::OnlineEvaluationConfig',
+      'OnlineEvaluationConfigName',
+      configName
+    );
+
+    if (logicalId) {
+      resourcesToImport.push({
+        resourceType: 'AWS::BedrockAgentCore::OnlineEvaluationConfig',
+        logicalResourceId: logicalId,
+        resourceIdentifier: { OnlineEvaluationConfigId: configId },
+      });
+    }
+
+    expect(resourcesToImport).toHaveLength(1);
+    expect(resourcesToImport[0]!.resourceType).toBe('AWS::BedrockAgentCore::OnlineEvaluationConfig');
+    expect(resourcesToImport[0]!.logicalResourceId).toBe('OnlineEvalLogicalId');
+    expect(resourcesToImport[0]!.resourceIdentifier).toEqual({ OnlineEvaluationConfigId: 'oec-123' });
+  });
+
+  it('returns empty list when online eval config not found in template', () => {
+    const emptyTemplate: CfnTemplate = {
+      Resources: {
+        IAMRoleLogicalId: {
+          Type: 'AWS::IAM::Role',
+          Properties: {},
+        },
+      },
+    };
+
+    const logicalId = findLogicalIdByProperty(
+      emptyTemplate,
+      'AWS::BedrockAgentCore::OnlineEvaluationConfig',
+      'OnlineEvaluationConfigName',
+      'QualityMonitor'
+    );
+
+    expect(logicalId).toBeUndefined();
+  });
+});
+
+// ============================================================================
+// buildImportTemplate Tests for Online Eval Config Resources
+// ============================================================================
+
+describe('buildImportTemplate with Online Eval Config', () => {
+  it('adds online eval config resource to deployed template with Retain deletion policy', () => {
+    const deployedTemplate: CfnTemplate = {
+      Resources: {
+        ExistingIAMRole: {
+          Type: 'AWS::IAM::Role',
+          Properties: { RoleName: 'ExistingRole' },
+        },
+      },
+    };
+
+    const synthTemplate: CfnTemplate = {
+      Resources: {
+        ExistingIAMRole: {
+          Type: 'AWS::IAM::Role',
+          Properties: { RoleName: 'ExistingRole' },
+        },
+        OnlineEvalLogicalId: {
+          Type: 'AWS::BedrockAgentCore::OnlineEvaluationConfig',
+          Properties: {
+            OnlineEvaluationConfigName: 'QualityMonitor',
+          },
+          DependsOn: 'ExistingIAMRole',
+        },
+      },
+    };
+
+    const importTemplate = buildImportTemplate(deployedTemplate, synthTemplate, ['OnlineEvalLogicalId']);
+
+    // Verify online eval config resource was added
+    expect(importTemplate.Resources.OnlineEvalLogicalId).toBeDefined();
+    expect(importTemplate.Resources.OnlineEvalLogicalId!.Type).toBe('AWS::BedrockAgentCore::OnlineEvaluationConfig');
+    expect(importTemplate.Resources.OnlineEvalLogicalId!.DeletionPolicy).toBe('Retain');
+    expect(importTemplate.Resources.OnlineEvalLogicalId!.UpdateReplacePolicy).toBe('Retain');
+
+    // DependsOn should be removed for import
+    expect(importTemplate.Resources.OnlineEvalLogicalId!.DependsOn).toBeUndefined();
+
+    // Original resource should still be there
+    expect(importTemplate.Resources.ExistingIAMRole).toBeDefined();
+  });
+});
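The template tests above pin down the observable behavior of three helpers from `../template-utils`: lookup by type, lookup by property value, and building an import template whose imported resources carry `Retain` policies and no `DependsOn`. A minimal sketch that would satisfy those expectations follows. The `CfnResource` and `CfnTemplate` shapes are inferred from the test fixtures, and `resolveLogicalId` is a hypothetical name for the single-resource fallback that one test spells out inline; the real implementations may do more.

```typescript
// Minimal shapes inferred from the test fixtures; the real types live in '../template-utils'.
interface CfnResource {
  Type: string;
  Properties?: Record<string, unknown>;
  DependsOn?: string | string[];
  DeletionPolicy?: string;
  UpdateReplacePolicy?: string;
}

interface CfnTemplate {
  Resources: Record<string, CfnResource>;
}

// All logical IDs whose resource has the given CloudFormation type.
function findLogicalIdsByType(template: CfnTemplate, type: string): string[] {
  return Object.keys(template.Resources).filter((id) => template.Resources[id]?.Type === type);
}

// First logical ID of the given type whose property equals the given value.
function findLogicalIdByProperty(
  template: CfnTemplate,
  type: string,
  property: string,
  value: unknown
): string | undefined {
  return findLogicalIdsByType(template, type).find(
    (id) => template.Resources[id]?.Properties?.[property] === value
  );
}

// Hypothetical helper: name match first, then fall back to the only resource of that type.
function resolveLogicalId(
  template: CfnTemplate,
  type: string,
  property: string,
  value: unknown
): string | undefined {
  const byName = findLogicalIdByProperty(template, type, property, value);
  if (byName) return byName;
  const sameType = findLogicalIdsByType(template, type);
  return sameType.length === 1 ? sameType[0] : undefined;
}

// Deployed resources plus the selected synthesized ones, prepared for a CFN import.
function buildImportTemplate(deployed: CfnTemplate, synth: CfnTemplate, logicalIds: string[]): CfnTemplate {
  const resources: Record<string, CfnResource> = { ...deployed.Resources };
  for (const id of logicalIds) {
    const source = synth.Resources[id];
    if (!source) continue;
    const imported: CfnResource = { ...source, DeletionPolicy: 'Retain', UpdateReplacePolicy: 'Retain' };
    delete imported.DependsOn; // the tests expect DependsOn to be absent on imported resources
    resources[id] = imported;
  }
  return { Resources: resources };
}
```

The fallback in `resolveLogicalId` only fires when exactly one resource of the type exists, so a template with two online eval configs and no name match still yields `undefined` rather than a guess.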

src/cli/commands/import/command.ts

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,13 @@
 import { handleImport } from './actions';
+import { ANSI } from './constants';
+import { registerImportEvaluator } from './import-evaluator';
 import { registerImportMemory } from './import-memory';
+import { registerImportOnlineEval } from './import-online-eval';
 import { registerImportRuntime } from './import-runtime';
 import type { Command } from '@commander-js/extra-typings';
 import * as fs from 'node:fs';
 
-const green = '\x1b[32m';
-const yellow = '\x1b[33m';
-const cyan = '\x1b[36m';
-const dim = '\x1b[2m';
-const reset = '\x1b[0m';
+const { green, yellow, cyan, dim, reset } = ANSI;
 
 export const registerImport = (program: Command) => {
   const importCmd = program
@@ -151,4 +150,6 @@ export const registerImport = (program: Command) => {
   // Register subcommands for importing individual resource types from AWS
   registerImportRuntime(importCmd);
   registerImportMemory(importCmd);
+  registerImportEvaluator(importCmd);
+  registerImportOnlineEval(importCmd);
 };
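The diff above replaces five inline escape-code constants with a destructured `ANSI` object from `./constants`, a file whose diff is not shown on this page. A minimal sketch of what such a constants object could look like follows. The five existing codes are copied from the removed lines, `red` uses the standard SGR code 31, and the `colorize` helper is hypothetical and only there to show usage.

```typescript
// Sketch of a shared ANSI constants object; the real './constants' module may differ.
const ANSI = {
  red: '\x1b[31m', // standard SGR red; the commit message says `red` was added
  green: '\x1b[32m',
  yellow: '\x1b[33m',
  cyan: '\x1b[36m',
  dim: '\x1b[2m',
  reset: '\x1b[0m',
} as const;

// Callers destructure only what they need, as command.ts does in the diff above.
const { reset } = ANSI;

// Hypothetical helper: wraps text in a color and always resets afterwards.
function colorize(color: string, text: string): string {
  return `${color}${text}${reset}`;
}
```

Centralizing the codes means a handler that needs `red` for an error message imports it from one place instead of declaring another inline escape sequence.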
