Commit 677ab95

test: per-directory coverage thresholds + E2E cleanup hardening
Day 5: Coverage gate
- Add scripts/check-coverage.mjs with ratchet thresholds per source directory (schema 78%, operations 52%, lib 82%, aws 40%, tui/hooks 14%, tui/components 68%, commands 46%). Target thresholds documented in the report, ratchet set ~5% below current to prevent regression.
- Also check PR changed-line coverage (>=50%) and post a report comment.
- Switch davelosert/vitest-coverage-report-action to file-coverage-mode: changes so PR comments highlight uncovered new lines.

Day 6: E2E cleanup
- teardownE2EProject() now retries deploy up to 3x and throws on failure instead of swallowing silently. The remove step logs but does not retry.
- Tag CDK stacks with Environment=e2e-test / CreatedAt / CreatedBy when AGENTCORE_E2E_TEST=1 is set. The CLI's e2e helper sets this env var; user projects are unaffected.
- Shrink cleanupStaleCredentialProviders() window from 30min -> 5min to catch orphans faster.
- byo-custom-jwt.test.ts Cognito cleanup now retries 3x with 5s delay instead of swallowing errors on first try.
- Add docs/test-resource-sweeper-spec.md for the infra team to implement a scheduled sweeper (CFN stacks, credential providers, log groups, ECR).
1 parent 9f231d0 commit 677ab95

6 files changed

Lines changed: 503 additions & 30 deletions

.github/workflows/build-and-test.yml

Lines changed: 16 additions & 3 deletions
@@ -134,16 +134,29 @@ jobs:
 
     steps:
       - uses: actions/checkout@v6
+        with:
+          fetch-depth: 0
+      - uses: actions/setup-node@v6
+        with:
+          node-version: 20.x
+          cache: 'npm'
       - name: Download coverage artifact
         uses: actions/download-artifact@v8
         with:
           name: coverage-report
           path: coverage/
-      - name: Coverage Report
+      - name: Coverage Report (PR comment)
         uses: davelosert/vitest-coverage-report-action@v2
         with:
           json-summary-path: coverage/coverage-summary.json
           json-final-path: coverage/coverage-final.json
           vite-config-path: vitest.config.ts
-          file-coverage-mode: none
-          coverage-thresholds: '{ "lines": 50, "branches": 50, "functions": 50, "statements": 50 }'
+          file-coverage-mode: changes
+      - name: Per-directory coverage gate
+        env:
+          BASE_SHA: ${{ github.event.pull_request.base.sha }}
+          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_REPOSITORY: ${{ github.repository }}
+        run: node scripts/check-coverage.mjs
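
The per-directory gate invoked by this step can be pictured roughly as follows. This is a minimal sketch and not the contents of `scripts/check-coverage.mjs`, which is not shown in this view: the `DirCoverage` shape and the `checkThresholds` name are hypothetical, and only the threshold percentages come from the commit message.

```typescript
// Hypothetical sketch of a per-directory ratchet gate; NOT the actual scripts/check-coverage.mjs.
// Only the percentages below are taken from the commit message.
type DirCoverage = Record<string, { covered: number; total: number }>;

const RATCHET_THRESHOLDS: Record<string, number> = {
  schema: 78,
  operations: 52,
  lib: 82,
  aws: 40,
  'tui/hooks': 14,
  'tui/components': 68,
  commands: 46,
};

// Returns one message per directory whose line coverage falls below its ratchet.
function checkThresholds(coverage: DirCoverage, thresholds: Record<string, number>): string[] {
  const failures: string[] = [];
  for (const dir of Object.keys(thresholds)) {
    const min = thresholds[dir];
    const stats = coverage[dir];
    // Assumption: directories with no measured lines are skipped rather than failed.
    if (min === undefined || stats === undefined || stats.total === 0) continue;
    const pct = (stats.covered / stats.total) * 100;
    if (pct < min) failures.push(`${dir}: ${pct.toFixed(1)}% < ${min}%`);
  }
  return failures;
}

const sampleCoverage: DirCoverage = {
  schema: { covered: 80, total: 100 }, // above the 78% ratchet
  aws: { covered: 30, total: 100 },    // below the 40% ratchet
};
const gateFailures = checkThresholds(sampleCoverage, RATCHET_THRESHOLDS);
console.log(gateFailures);
```

A real gate would presumably exit non-zero when the returned list is non-empty, so that the workflow step fails.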

docs/test-resource-sweeper-spec.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+# E2E Test Resource Sweeper: Specification
+
+## Purpose
+
+The e2e test suite provisions real AWS resources: CloudFormation stacks, credential providers, CloudWatch log groups, ECR images, and S3 artifacts. When a test crashes, times out, or is cancelled mid-run, its `afterAll` teardown may not execute, leaving orphaned resources behind.
+
+Orphan cost estimates from the April 2026 audit put sustained leaks at $2.5k-$10k/week depending on which stacks escape (runtimes, memory, credential providers). Teardown is now fatal on failure (PR bundled with this spec), but a scheduled sweeper is still needed as a backstop to catch anything that slips through.
+
+This document specifies what a daily sweeper must do. The infra team owns implementation.
+
+## Scope
+
+The sweeper runs on a schedule (daily at 04:00 UTC recommended) in the same AWS account used by CI. It identifies and deletes resources that meet all three criteria:
+
+1. Match a well-known e2e naming or tagging pattern.
+2. Were created more than N hours ago (N=4 recommended, which is longer than the longest e2e run).
+3. Are not part of an active workflow run (check GitHub Actions API for running e2e jobs).
+
+Dry-run mode must print what would be deleted without acting. Default to dry-run for the first two weeks.
+
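
The three criteria and the dry-run behaviour in the Scope section could be combined along these lines. This is a sketch under stated assumptions, not the sweeper's implementation: the `Candidate` and `SweepOptions` shapes and all function names are hypothetical, and the "active workflow run" check is simplified to a single flag computed elsewhere.

```typescript
// Hypothetical sketch of the eligibility check described in the Scope section.
interface Candidate {
  name: string;
  tags: Record<string, string>;
  createdAt: Date;
}

interface SweepOptions {
  now: Date;
  minAgeHours: number;    // "N" in the spec; 4 is the recommended value
  e2eRunActive: boolean;  // assumption: one global flag from the GitHub Actions check
  namePrefixes: string[]; // e.g. ['AgentCore-E2e']
}

// Criterion 1: well-known e2e tag or name prefix.
function matchesE2ePattern(c: Candidate, prefixes: string[]): boolean {
  return c.tags['Environment'] === 'e2e-test' || prefixes.some(p => c.name.startsWith(p));
}

// All three criteria must hold.
function isSweepable(c: Candidate, opts: SweepOptions): boolean {
  const ageHours = (opts.now.getTime() - c.createdAt.getTime()) / (60 * 60 * 1000);
  return matchesE2ePattern(c, opts.namePrefixes) && ageHours > opts.minAgeHours && !opts.e2eRunActive;
}

// In dry-run mode, print what would be deleted without acting.
function planSweep(candidates: Candidate[], opts: SweepOptions, dryRun: boolean): string[] {
  const targets = candidates.filter(c => isSweepable(c, opts)).map(c => c.name);
  if (dryRun) targets.forEach(t => console.log(`[dry-run] would delete ${t}`));
  return targets;
}

const sweepNow = new Date('2026-04-10T12:00:00Z');
const hoursAgo = (h: number): Date => new Date(sweepNow.getTime() - h * 60 * 60 * 1000);
const sweepOpts: SweepOptions = {
  now: sweepNow,
  minAgeHours: 4,
  e2eRunActive: false,
  namePrefixes: ['AgentCore-E2e'],
};
const sweepCandidates: Candidate[] = [
  { name: 'AgentCore-E2e-old', tags: {}, createdAt: hoursAgo(6) },
  { name: 'tagged-old', tags: { Environment: 'e2e-test' }, createdAt: hoursAgo(6) },
  { name: 'AgentCore-E2e-young', tags: {}, createdAt: hoursAgo(1) },
  { name: 'prod-stack', tags: { Environment: 'prod' }, createdAt: hoursAgo(48) },
];
const plannedTargets = planSweep(sweepCandidates, sweepOpts, true);
```

Here only the two resources that are both old enough and e2e-marked are selected; the young e2e stack and the unrelated stack are left alone.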
+## Resources to Sweep
+
+### 1. CloudFormation Stacks
+
+**Identification:** Stacks tagged with `Environment=e2e-test` (this PR adds the tag via `AGENTCORE_E2E_TEST=1` env var in `src/assets/cdk/bin/cdk.ts`). Also match by stack name prefix `AgentCore-E2e*` as a fallback for older stacks.
+
+**Action:** `DeleteStack`. Watch for `DELETE_FAILED` and surface to the on-call rotation, because some failures (S3 buckets with objects, ENIs held by Lambda) need manual intervention.
+
+**API calls:**
+- `cloudformation:DescribeStacks` (paginate; filter by tag client-side, since the call has no tag filter)
+- `cloudformation:DeleteStack`
+- `cloudformation:DescribeStackEvents` (on DELETE_FAILED for context)
+
+### 2. API Key Credential Providers
+
+**Identification:** Providers named with the `E2e` prefix and `createdTime` older than N hours.
+
+**Action:** `DeleteApiKeyCredentialProvider`. Silent failures are acceptable because the CLI already logs these via `cleanupStaleCredentialProviders()`.
+
+**API calls:**
+- `bedrock-agentcore-control:ListApiKeyCredentialProviders` (paginate)
+- `bedrock-agentcore-control:DeleteApiKeyCredentialProvider`
+
+### 3. CloudWatch Log Groups
+
+**Identification:** Log groups under `/aws/bedrock-agentcore/runtimes/E2e*` and `/aws/codebuild/AgentCore-E2e*`.
+
+**Action:** `DeleteLogGroup` for groups older than N hours. Alternatively, set a retention policy of 3 days on all matched groups and let CloudWatch expire data. Retention is safer because deletion drops diagnostic context if someone is debugging a test failure.
+
+**API calls:**
+- `logs:DescribeLogGroups` (filter by prefix, paginate)
+- `logs:PutRetentionPolicy` (preferred) or `logs:DeleteLogGroup`
+
+### 4. ECR Repositories
+
+**Identification:** Repositories tagged with `Environment=e2e-test` or named with the `agentcore-e2e-*` prefix.
+
+**Action:** Delete images older than N hours. Keep the repository itself: CDK recreates images on every deploy, so repo deletion causes churn. Image cleanup is sufficient.
+
+**API calls:**
+- `ecr:DescribeRepositories` (filter by tag or name)
+- `ecr:ListImages` (or `ecr:DescribeImages`, which returns `imagePushedAt` for the age check)
+- `ecr:BatchDeleteImage`
+
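
The image age filter described for ECR might look roughly like this. This is a sketch, assuming the sweeper has already fetched image details whose `imageDigest` and `imagePushedAt` fields follow the ECR `DescribeImages` output. The `ImageDetail` interface and `staleImageIds` function are hypothetical names, and no AWS SDK calls are made here.

```typescript
// Hypothetical sketch: choose image digests older than N hours to pass to ecr:BatchDeleteImage.
// ImageDetail mirrors two fields of the DescribeImages output; it is not an SDK type.
interface ImageDetail {
  imageDigest: string;
  imagePushedAt: Date;
}

function staleImageIds(
  images: ImageDetail[],
  now: Date,
  minAgeHours: number
): { imageDigest: string }[] {
  const cutoffMs = now.getTime() - minAgeHours * 60 * 60 * 1000;
  return images
    .filter(img => img.imagePushedAt.getTime() < cutoffMs)
    .map(img => ({ imageDigest: img.imageDigest }));
}

const ecrNow = new Date('2026-04-10T12:00:00Z');
const ecrImages: ImageDetail[] = [
  { imageDigest: 'sha256:aaa', imagePushedAt: new Date('2026-04-10T02:00:00Z') }, // 10h old
  { imageDigest: 'sha256:bbb', imagePushedAt: new Date('2026-04-10T11:00:00Z') }, // 1h old
];
const staleIds = staleImageIds(ecrImages, ecrNow, 4);
console.log(staleIds);
```

Only the image pushed ten hours earlier is selected; the repository itself is never touched, in line with the spec.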
+### 5. S3 (CDK Bootstrap Bucket)
+
+**Identification:** The bootstrap bucket (`cdk-*-assets-*`) is shared across all deploys in the account. Don't delete the bucket or its tagged objects, because CDK uses content-hashed object keys and expects them to persist.
+
+**Recommendation:** Apply an S3 lifecycle policy to the bootstrap bucket: transition objects to Intelligent-Tiering after 30 days, expire non-current versions after 90 days. Do this once via Terraform/CLI, not via the sweeper.
+
+## Workflow Structure
+
+GitHub Actions workflow (`.github/workflows/e2e-sweeper.yml`) with:
+
+```yaml
+on:
+  schedule:
+    - cron: '0 4 * * *' # Daily at 04:00 UTC
+  workflow_dispatch:
+    inputs:
+      dry-run:
+        type: boolean
+        default: true
+```
+
+Permissions: use the same OIDC role that e2e tests use, but with delete permissions for the resources above. Store the role ARN in `AWS_E2E_SWEEPER_ROLE_ARN`.
+
+Steps:
+
+1. Configure AWS credentials (OIDC).
+2. Check for running e2e jobs via `gh api /repos/aws/agentcore-cli/actions/runs?status=in_progress`. If any e2e workflow is running, skip the sweep (or raise the age threshold to 24 hours).
+3. Run sweep script (Node or Python) against each resource category.
+4. Post a summary to a Slack channel (resource counts deleted per category, failures).
+5. On any resource with repeated delete failures (>3 runs in a row), open a GitHub issue.
+
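
Step 2 could be reduced to a small decision function over the API response. The sketch below assumes the `name` and `status` fields of the `workflow_runs` array returned by the GitHub REST "list workflow runs" endpoint. The workflow name `E2E Tests` and both function names are hypothetical, and fetching the runs is left out.

```typescript
// Hypothetical sketch of the "is an e2e run active?" decision in step 2.
// WorkflowRun models two fields of the GitHub REST `workflow_runs` entries.
interface WorkflowRun {
  name: string;
  status: string;
}

function hasActiveE2eRun(runs: WorkflowRun[], e2eWorkflowNames: string[]): boolean {
  return runs.some(run => run.status === 'in_progress' && e2eWorkflowNames.indexOf(run.name) !== -1);
}

// Alternative to skipping the sweep: use a 24-hour age threshold while a run is active.
function ageThresholdHours(e2eRunActive: boolean, defaultHours: number): number {
  return e2eRunActive ? 24 : defaultHours;
}

const observedRuns: WorkflowRun[] = [
  { name: 'E2E Tests', status: 'in_progress' },      // 'E2E Tests' is an assumed workflow name
  { name: 'Build and Test', status: 'completed' },
];
const e2eActive = hasActiveE2eRun(observedRuns, ['E2E Tests']);
console.log(e2eActive, ageThresholdHours(e2eActive, 4));
```

Skipping outright is the more conservative choice; the 24-hour variant still cleans up long-dead orphans while an e2e run is in flight.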
+## Safety Rails
+
+- **Hard age floor:** never delete anything younger than 2 hours, regardless of the configured age threshold N.
+- **Account allow-list:** the script must fail closed if `AWS_ACCOUNT_ID` is not in the expected list (CI account only).
+- **Kill switch:** check for a `SWEEPER_DISABLED` repo variable before running. On-call can flip this if the sweeper misbehaves.
+- **Rate limits:** cap deletes per category at 50 per run to avoid runaway behavior.
+
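
The four safety rails can be expressed as small guards that run before any delete. This is a sketch only: the `Rails` shape, the function names, and the choice to skip quietly on the kill switch but throw on an unknown account are assumptions, and the account ID is a placeholder.

```typescript
// Hypothetical sketch of the safety rails; names and shapes are illustrative only.
interface Rails {
  accountId: string | undefined; // value of AWS_ACCOUNT_ID
  allowedAccounts: string[];     // CI account(s) only
  sweeperDisabled: boolean;      // derived from the SWEEPER_DISABLED repo variable
}

const HARD_AGE_FLOOR_HOURS = 2;
const MAX_DELETES_PER_CATEGORY = 50;

// Kill switch: skip. Unknown or missing account: fail closed by throwing.
function preflight(rails: Rails): 'run' | 'skip' {
  if (rails.sweeperDisabled) return 'skip';
  if (rails.accountId === undefined || rails.allowedAccounts.indexOf(rails.accountId) === -1) {
    throw new Error('Account is not on the allow-list; refusing to sweep');
  }
  return 'run';
}

// Hard age floor: a requested threshold below 2 hours is raised to 2 hours.
function effectiveAgeHours(requestedHours: number): number {
  return Math.max(requestedHours, HARD_AGE_FLOOR_HOURS);
}

// Rate limit: never act on more than 50 targets per category in one run.
function capDeletes<T>(targets: T[]): T[] {
  return targets.slice(0, MAX_DELETES_PER_CATEGORY);
}

const ciRails: Rails = {
  accountId: '111122223333', // placeholder account ID
  allowedAccounts: ['111122223333'],
  sweeperDisabled: false,
};
console.log(preflight(ciRails), effectiveAgeHours(1), capDeletes(new Array<string>(120).fill('x')).length);
```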
+## Implementation Order
+
+1. Start with CloudFormation stack sweeping (highest $ impact). Run in dry-run for one week.
+2. Add credential provider sweeping (already scoped by prefix, low risk).
+3. Add log group retention policies (set-and-forget, no scheduled action needed).
+4. Add ECR image cleanup (low $ impact; deferrable).
+5. Enable live deletes after two weeks of clean dry-run output.
+
+## References
+
+- CDK stack tagging: `src/assets/cdk/bin/cdk.ts` (tags applied when `AGENTCORE_E2E_TEST=1`)
+- Credential provider cleanup: `e2e-tests/e2e-helper.ts#cleanupStaleCredentialProviders`
+- E2E teardown: `e2e-tests/e2e-helper.ts#teardownE2EProject` (throws on repeated failure as of this PR)

e2e-tests/byo-custom-jwt.test.ts

Lines changed: 39 additions & 16 deletions
@@ -48,7 +48,36 @@ const region = process.env.AWS_REGION ?? 'us-east-1';
  * Run the local CLI build without skipping install (needed for deploy).
  */
 function runLocalCLI(args: string[], cwd: string): Promise<RunResult> {
-  return runCLI(args, cwd, { skipInstall: false });
+  return runCLI(args, cwd, {
+    skipInstall: false,
+    env: {
+      AGENTCORE_E2E_TEST: '1',
+      AGENTCORE_E2E_CREATOR: process.env.AGENTCORE_E2E_CREATOR ?? 'github-actions',
+    },
+  });
+}
+
+async function deleteCognitoResourceWithRetry(
+  label: string,
+  op: () => Promise<unknown>,
+  attempts = 3,
+  delayMs = 5000
+): Promise<void> {
+  for (let attempt = 1; attempt <= attempts; attempt++) {
+    try {
+      await op();
+      return;
+    } catch (err) {
+      const name = (err as { name?: string }).name ?? 'Unknown';
+      const msg = (err as { message?: string }).message ?? String(err);
+      if (attempt === attempts) {
+        console.error(`[cognito-cleanup] ${label} failed after ${attempts} attempts: [${name}] ${msg}`);
+        return;
+      }
+      console.warn(`[cognito-cleanup] ${label} attempt ${attempt}/${attempts} failed: [${name}] — retrying`);
+      await new Promise(resolve => setTimeout(resolve, delayMs));
+    }
+  }
 }
 
 describe.sequential('e2e: BYO agent with CUSTOM_JWT auth', () => {
@@ -201,21 +230,15 @@ describe.sequential('e2e: BYO agent with CUSTOM_JWT auth', () => {
 
     // ── Delete Cognito resources ──
     if (userPoolId) {
-      try {
-        await cognitoClient.send(new DeleteResourceServerCommand({ UserPoolId: userPoolId, Identifier: 'agentcore' }));
-      } catch {
-        /* best-effort */
-      }
-      try {
-        await cognitoClient.send(new DeleteUserPoolDomainCommand({ UserPoolId: userPoolId, Domain: domainPrefix }));
-      } catch {
-        /* best-effort */
-      }
-      try {
-        await cognitoClient.send(new DeleteUserPoolCommand({ UserPoolId: userPoolId }));
-      } catch {
-        /* best-effort */
-      }
+      await deleteCognitoResourceWithRetry('DeleteResourceServer', () =>
+        cognitoClient.send(new DeleteResourceServerCommand({ UserPoolId: userPoolId, Identifier: 'agentcore' }))
+      );
+      await deleteCognitoResourceWithRetry('DeleteUserPoolDomain', () =>
+        cognitoClient.send(new DeleteUserPoolDomainCommand({ UserPoolId: userPoolId, Domain: domainPrefix }))
+      );
+      await deleteCognitoResourceWithRetry('DeleteUserPool', () =>
+        cognitoClient.send(new DeleteUserPoolCommand({ UserPoolId: userPoolId }))
+      );
     }
 
     // ── Clean up temp directory ──

e2e-tests/e2e-helper.ts

Lines changed: 38 additions & 7 deletions
@@ -308,7 +308,10 @@ export function createE2ESuite(cfg: E2EConfig) {
 export { hasAws, baseCanRun };
 
 export function runAgentCoreCLI(args: string[], cwd: string): Promise<RunResult> {
-  return spawnAndCollect('agentcore', args, cwd);
+  return spawnAndCollect('agentcore', args, cwd, {
+    AGENTCORE_E2E_TEST: '1',
+    AGENTCORE_E2E_CREATOR: process.env.AGENTCORE_E2E_CREATOR ?? 'github-actions',
+  });
 }
 
 // TODO: Replace with `agentcore add target` once the CLI command is re-introduced
@@ -347,7 +350,7 @@ async function deleteCredentialProvider(client: BedrockAgentCoreControlClient, n
  * Runs in beforeAll to prevent accumulation from previous runs that
  * crashed or timed out before their afterAll teardown could execute.
  */
-export async function cleanupStaleCredentialProviders(maxAgeMs: number = 30 * 60 * 1000): Promise<void> {
+export async function cleanupStaleCredentialProviders(maxAgeMs: number = 5 * 60 * 1000): Promise<void> {
   const region = process.env.AWS_REGION ?? 'us-east-1';
   const client = new BedrockAgentCoreControlClient({ region });
   const cutoff = new Date(Date.now() - maxAgeMs);
@@ -365,17 +368,45 @@ export async function cleanupStaleCredentialProviders(maxAgeMs: number = 30 * 60
 }
 
 export async function teardownE2EProject(projectPath: string, agentName: string, modelProvider: string): Promise<void> {
-  await spawnAndCollect('agentcore', ['remove', 'all', '--json'], projectPath);
-  const result = await spawnAndCollect('agentcore', ['deploy', '--yes', '--json'], projectPath);
-  if (result.exitCode !== 0) {
-    console.log('Teardown stdout:', result.stdout);
-    console.log('Teardown stderr:', result.stderr);
+  const failures: string[] = [];
+
+  const removeResult = await runAgentCoreCLI(['remove', 'all', '--json'], projectPath);
+  if (removeResult.exitCode !== 0) {
+    console.error(`[teardown] remove all failed (exit ${removeResult.exitCode})`);
+    console.error('[teardown] remove stdout:', removeResult.stdout);
+    console.error('[teardown] remove stderr:', removeResult.stderr);
+    failures.push(`remove all: exit ${removeResult.exitCode}`);
+  }
+
+  const MAX_DEPLOY_ATTEMPTS = 3;
+  let deploySucceeded = false;
+  for (let attempt = 1; attempt <= MAX_DEPLOY_ATTEMPTS; attempt++) {
+    const result = await runAgentCoreCLI(['deploy', '--yes', '--json'], projectPath);
+    if (result.exitCode === 0) {
+      deploySucceeded = true;
+      if (attempt > 1) console.error(`[teardown] deploy succeeded on attempt ${attempt}`);
+      break;
+    }
+    console.error(`[teardown] deploy attempt ${attempt}/${MAX_DEPLOY_ATTEMPTS} failed (exit ${result.exitCode})`);
+    console.error('[teardown] deploy stdout:', result.stdout);
+    console.error('[teardown] deploy stderr:', result.stderr);
+    if (attempt < MAX_DEPLOY_ATTEMPTS) {
+      await new Promise(resolve => setTimeout(resolve, 15000));
+    }
   }
+  if (!deploySucceeded) {
+    failures.push(`deploy teardown failed after ${MAX_DEPLOY_ATTEMPTS} attempts`);
+  }
+
   if (modelProvider !== 'Bedrock' && agentName) {
     const region = process.env.AWS_REGION ?? 'us-east-1';
     const client = new BedrockAgentCoreControlClient({ region });
     await deleteCredentialProvider(client, `${agentName}${modelProvider}`);
   }
+
+  if (failures.length > 0) {
+    throw new Error(`E2E teardown failed: ${failures.join('; ')}`);
+  }
 }
 
 export async function dumpImportDebugInfo(
