Skip to content

docs: add least-privilege deployment roles and deployment guide#46

Merged
scottschreckengaust merged 25 commits intomainfrom
least-privilege-deployment-documentation
Apr 26, 2026
Merged

docs: add least-privilege deployment roles and deployment guide#46
scottschreckengaust merged 25 commits intomainfrom
least-privilege-deployment-documentation

Conversation

@scottschreckengaust
Copy link
Copy Markdown
Contributor

@scottschreckengaust scottschreckengaust commented Apr 23, 2026

Summary

  • Validated least-privilege IAM policies for the CloudFormation execution role through end-to-end deployment testing on a clean AWS account (us-east-1). The original single monolithic policy was replaced with a 3-way split to stay under the IAM managed policy 6,144-character limit:
    • IaCRole-ABCA-Infrastructure — CloudFormation, IAM, VPC, Route 53 Resolver DNS Firewall
    • IaCRole-ABCA-Application — DynamoDB, Lambda, API Gateway, Cognito, WAFv2, EventBridge, Secrets Manager (+ optional ECS)
    • IaCRole-ABCA-Observability — Bedrock AgentCore, Bedrock Guardrails, CloudWatch, X-Ray, S3, KMS, ECR, SSM, STS
  • Fixed Quick Start deployment blockers found during end-to-end walkthrough:
    • X-Ray update-trace-segment-destination fails on fresh accounts without a CloudWatch Logs resource policy — added prerequisite aws logs put-resource-policy command
    • mise run build fails without AWS credentials (CDK synth does AZ lookups) — added note and common error entry
    • Added common error entries for non-TTY deploy approval and build credential issues
    • Added AWS_PROFILE guidance for multi-profile users
  • Fixed abca-plugin gaps discovered during deep review:
    • /setup skill Phase 3 was missing the logs put-resource-policy prerequisite (same X-Ray bug)
    • /deploy skill had no least-privilege guidance — added section with re-bootstrap command and reference to DEPLOYMENT_ROLES.md
    • CLAUDE.md had no reference to the plugin — added pointer so sessions discover guided workflows
  • Add docs/guides/DEPLOYMENT_GUIDE.md covering architecture, scale-to-zero analysis (~$140-150/month idle), and complete AWS services inventory
  • Update docs/design/COST_MODEL.md with corrected baseline, scale-to-zero section, and updated references
  • Add .gitignore entries for Claude Code plugin artifacts (.mcp.json, .remember/)
  • Add docs-sync pre-commit hook to auto-regenerate Starlight mirrors

Review feedback addressed

Changes made after code review:

# Issue Resolution
1 SecretsManager Resource: "*" Split GetRandomPassword into own statement; all other actions scoped to backgroundagent-*
2 Deploy SKILL.md references non-existent IaCRole-ABCA-Policy Updated to reference all three policy names
3 DEPLOYMENT_GUIDE.md has no Starlight mirror Added route mapping, mirror, and sidebar entry
4 iam:PassRole without conditions Added IAMPassRole statement with iam:PassedToService condition (7 services); AttachRolePolicy restriction added as iterative tightening item
5 aws-service-role/* allows any service-linked role Added iam:AWSServiceName as iterative tightening item
6 KMS kms:CreateGrant on Resource: "*" Added kms:ResourceAliases as iterative tightening item
7 X-Ray resource policy grants Resource: "*" Scoped to arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans
8 Inconsistent placeholder names Unified to ACCOUNT_ID with substitution note
9 ACCOUNT_ID captured but never used Now used by scoped X-Ray resource policy (item 7)
10 Session timeout "8 hours" vs "9 hours" Clarified: 8h = AgentCore service limit, 9h = orchestrator executionTimeout
11 VPC endpoint cost underestimated Corrected from ~$50/mo to ~$102/mo (7×2 AZs×$0.01/hr×730hrs); baseline updated to ~$140-150/mo
12 Region wildcards Kept as iterative tightening item #3 (users deploy to different regions)

Deployment validation test matrix

Four configurations are tested across the full stack lifecycle (create → task → update → destroy):

# IAM Policy ECS Compute Purpose
V1 AdministratorAccess (default bootstrap) Off (AgentCore only) Baseline — confirms stack works with default permissions
V2 AdministratorAccess On (ECS Fargate) Baseline with ECS — confirms ECS compute path works
V3 Least-privilege (3-way split) Off (AgentCore only) Primary validation — confirms scoped policies are sufficient
V4 Least-privilege (3-way split) On (ECS Fargate) Confirms ECS statement + scoped policies work together

Lifecycle steps per variation

Each variation runs through:

  1. Create — Fresh cdk deploy from clean state (no existing stack). Validates all resource creation permissions.
  2. Task — Submit a coding task via CLI, wait for agent to complete and open a PR. Validates runtime permissions (Secrets Manager read, DynamoDB, AgentCore/ECS invocation).
  3. Update — Modify a Blueprint parameter and redeploy. Validates stack update permissions (resource modifications, not just creation).
  4. Destroycdk destroy to tear down all resources. Validates deletion permissions.

Pre-test setup (once per account)

  • CDK bootstrap (default for V1/V2, re-bootstrap with --cloudformation-execution-policies for V3/V4)
  • X-Ray resource policy (aws logs put-resource-policy)
  • GitHub PAT stored in Secrets Manager (post first deploy)
  • Cognito user created (post first deploy)

Pass criteria

  • Create: Stack reaches CREATE_COMPLETE with no permission errors in CloudFormation events
  • Task: Task reaches COMPLETED status with a PR URL in the output
  • Update: Stack reaches UPDATE_COMPLETE
  • Destroy: Stack reaches DELETE_COMPLETE (AgentCore ENI timing retries are acceptable)

Progress is reported as individual PR comments (one per lifecycle step per variation, ~16 total).

Files changed

File What Changed
docs/design/DEPLOYMENT_ROLES.md 3-way policy split; IAMPassRole with PassedToService condition; split SecretsManager; iterative tightening items for AttachRolePolicy, CreateServiceLinkedRole, KMS; unified ACCOUNT_ID placeholder
docs/guides/DEPLOYMENT_GUIDE.md New: architecture, scale-to-zero (~$140-150/mo), AWS services inventory; corrected VPC endpoint cost; clarified session timeouts
docs/guides/QUICK_START.md X-Ray resource policy scoped to aws/spans; build credential note; AWS_PROFILE guidance; 4 new common errors
docs/design/COST_MODEL.md Corrected VPC endpoint cost (~$102/mo); baseline updated to ~$140-150/mo; clarified session timeouts; scale-to-zero section
docs/abca-plugin/skills/setup/SKILL.md Scoped X-Ray resource policy
docs/abca-plugin/skills/deploy/SKILL.md Updated to 3-way policy split
docs/scripts/sync-starlight.mjs Added DEPLOYMENT_GUIDE route mapping and mirror
docs/astro.config.mjs Added Deployment Guide sidebar entry
CLAUDE.md Added plugin reference
AGENTS.md Strengthened docs-sync instructions
.gitignore Added .mcp.json, .remember/
.pre-commit-config.yaml Added docs-sync pre-commit hook

Test plan

  • Deploy with AdministratorAccess on clean account — passed (initial validation)
  • CloudTrail analysis of all CFN execution role actions — 36 gaps identified
  • Deploy with scoped 3-way policy split — passed after 7 iterations (initial validation)
  • Stack update with scoped policies — passed (initial validation)
  • Smoke test (task submission, PR creation) — passed (initial validation)
  • Stack destroy with scoped policies — passed with AgentCore ENI retry (initial validation)
  • ECS statement fits in Application policy under size limit — verified (4,110 / 6,144 chars with ECS + split SM)
  • Verify all markdown links resolve between new and existing docs
  • Starlight mirrors regenerated and astro check passed
  • V1: Admin / No ECS — create → task → update → destroy
  • V2: Admin / ECS — create → task → update → destroy
  • V3: Least-priv / No ECS — create → task → update → destroy
  • V4: Least-priv / ECS — create → task → update → destroy

🤖 Generated with Claude Code

@scottschreckengaust scottschreckengaust requested a review from a team as a code owner April 23, 2026 15:17
@scottschreckengaust scottschreckengaust changed the base branch from main to feat/fargate-agent-stack April 23, 2026 15:30
Add DEPLOYMENT_ROLES.md with least-privilege IAM policy for the
CloudFormation execution role (IaCRole-ABCA), derived from analysis
of all CDK constructs and handler code in the current single-stack
architecture. Includes optional ECS statements when Fargate is enabled.

Add DEPLOYMENT_GUIDE.md covering compute backend choices (AgentCore
vs opt-in ECS Fargate via ComputeStrategy), scale-to-zero analysis,
and complete AWS services inventory.

Update COST_MODEL.md with scale-to-zero characteristics section,
corrected baseline to ~$85-95/month, and updated references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust scottschreckengaust force-pushed the least-privilege-deployment-documentation branch from f0c077f to 9babf85 Compare April 23, 2026 15:42
@scottschreckengaust scottschreckengaust changed the base branch from feat/fargate-agent-stack to main April 23, 2026 15:42
scottschreckengaust and others added 4 commits April 23, 2026 15:46
Append new references at the bottom instead of reordering the
existing list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The original had COMPUTE.md listed twice intentionally — once for
the network architecture section and once for compute billing. Restore
this pattern instead of merging into one entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single entry with anchor link to the network architecture section
instead of listing the same file twice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use AWS-native IAM Access Analyzer policy generation instead of
third-party tooling for iterative policy tightening.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread docs/design/COST_MODEL.md Outdated
scottschreckengaust and others added 3 commits April 23, 2026 09:45
The sync-starlight.mjs script generates mirror files under
docs/src/content/docs/ from source docs. These generated files were
missing from prior commits, causing the CI mutation check to fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The PR#46 build failed because Starlight mirror files under
docs/src/content/docs/ were not regenerated after editing source docs.
The pre-commit hooks had no step to catch this locally.

- Add `docs-sync` pre-commit hook that auto-runs sync-starlight.mjs and
  stages the generated mirrors when docs sources change
- Strengthen AGENTS.md boundary and common-mistakes sections to
  explicitly warn that CI rejects stale mirrors and name the exact
  command to regenerate them

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust scottschreckengaust marked this pull request as draft April 23, 2026 17:08
scottschreckengaust and others added 4 commits April 23, 2026 17:16
…ODEL

- Session timeout: 8 hours → 9 hours (matches task-orchestrator.ts:173)
- Concurrency limit: 2 → 3 (matches task-orchestrator.ts:163 default)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents local plugin state from the remember and MCP plugins from
being tracked in version control.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l notes

On a fresh AWS account, `aws xray update-trace-segment-destination`
fails with AccessDeniedException because X-Ray needs a CloudWatch Logs
resource policy before it can write spans. Added the prerequisite
`aws logs put-resource-policy` command to Quick Start Step 3.

Also documented that `mise run build` requires AWS credentials with
ec2:DescribeAvailabilityZones for CDK synthesis, and added common error
table entries for the X-Ray, build credential, and non-TTY deploy issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ref to /deploy

The /setup skill's Phase 3 only ran `aws xray update-trace-segment-destination`
which fails with AccessDeniedException on fresh accounts. Added the prerequisite
`aws logs put-resource-policy` command.

Added a "Least-Privilege Deployment" section to the /deploy skill linking to
DEPLOYMENT_ROLES.md with the re-bootstrap command for scoped execution policies.

Updated CLAUDE.md to reference the abca-plugin and its available skills so
Claude Code sessions discover the guided workflows without requiring
--plugin-dir.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust scottschreckengaust force-pushed the least-privilege-deployment-documentation branch from 903fa5f to 544cbf2 Compare April 23, 2026 20:36
scottschreckengaust and others added 3 commits April 23, 2026 23:03
Replace the single monolithic IAM policy (which exceeded the 6,144-char
IAM managed policy limit) with three validated policies:
- IaCRole-ABCA-Infrastructure (CFN, IAM, VPC, DNS Firewall)
- IaCRole-ABCA-Application (DDB, Lambda, APIGW, Cognito, WAF, EB, SM)
- IaCRole-ABCA-Observability (Bedrock, CW, X-Ray, S3, ECR, KMS, SSM, STS)

All three policies were validated against a live deployment in us-east-1
(create, update, task execution, and destroy). CloudTrail analysis found
36 additional actions beyond the initial code review, and 7 deployment
iterations refined the policies. Key additions:
- KMS (entirely missing from original)
- lambda:InvokeFunction for AwsCustomResource
- bedrock-agentcore:* (CFN handler uses internal action names)
- Legacy CW Logs delivery actions for Route53 Resolver
- Various Describe/List/Get actions for read-only CFN operations

Updated the origin disclaimer, Resource-level permission constraints
table, and ECS section to reference the Application policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarify in the ECS section that adding the ECS statement to
IaCRole-ABCA-Application keeps the combined policy under the
6,144-character IAM managed policy limit (4,212 of 6,144 chars).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust scottschreckengaust marked this pull request as ready for review April 24, 2026 19:29
@krokoko
Copy link
Copy Markdown
Contributor

krokoko commented Apr 24, 2026

Overall this is a high-quality, deployment-validated PR that replaces AdministratorAccess with scoped IAM policies — a significant security improvement. However, all three reviewers flagged issues
that should be addressed before merge.


Critical (must fix)

  1. SecretsManager Resource includes bare "*" — grants full account access to all secrets

File: docs/design/DEPLOYMENT_ROLES.md (and Starlight mirror)

The SecretsManager statement lists "*" as a second resource alongside the scoped ARN, completely negating least-privilege. Actions like GetSecretValue, DeleteSecret, and PutSecretValue become
account-wide.

Fix: Split GetRandomPassword (the only action that requires "") into its own statement:
{
"Sid": "SecretsManager",
"Action": ["secretsmanager:CreateSecret", "...all others except GetRandomPassword..."],
"Resource": "arn:aws:secretsmanager:
:㊙️backgroundagent-"
},
{
"Sid": "SecretsManagerAccountLevel",
"Action": "secretsmanager:GetRandomPassword",
"Resource": "*"
}

  1. Deploy SKILL.md references non-existent policy name IaCRole-ABCA-Policy

File: docs/abca-plugin/skills/deploy/SKILL.md

The skill tells users to bootstrap with a single IaCRole-ABCA-Policy, but DEPLOYMENT_ROLES.md defines three policies (-Infrastructure, -Application, -Observability). Users following this skill will
get a deployment failure.

Fix: Update to reference all three --cloudformation-execution-policies flags, matching DEPLOYMENT_ROLES.md.

  1. DEPLOYMENT_GUIDE.md has no Starlight mirror — broken link on docs site

File: docs/src/content/docs/architecture/Cost-model.md

The Starlight mirror of COST_MODEL.md contains Deployment guide as a raw relative path because the sync script has no route mapping for it. This is a broken link on
the deployed Starlight site.

Fix: Either add DEPLOYMENT_GUIDE to explicitGuideRoutes in sync-starlight.mjs and create a mirror, or remove the link from content that gets mirrored.


High (strongly recommend fixing)

  1. iam:PassRole and iam:AttachRolePolicy without conditions — privilege escalation path

File: docs/design/DEPLOYMENT_ROLES.md, IAMRolesAndPolicies statement

Without an iam:PassedToService condition on PassRole and no policy restriction on AttachRolePolicy, the execution role could create a backgroundagent-dev-* role, attach AdministratorAccess to it, pass
it to Lambda, and invoke it — full account compromise.

Fix: Add iam:PassedToService condition limiting to lambda.amazonaws.com, ecs-tasks.amazonaws.com, apigateway.amazonaws.com, logs.amazonaws.com, bedrock.amazonaws.com.

  1. aws-service-role/* allows creating any service-linked role

File: docs/design/DEPLOYMENT_ROLES.md, IAMRolesAndPolicies statement

Combined with iam:CreateServiceLinkedRole, the resource arn:aws:iam:::role/aws-service-role/ allows creating service-linked roles for any AWS service.

Fix: Add an iam:AWSServiceName condition scoped to only the services ABCA actually uses.

  1. KMS kms:CreateGrant on Resource: "*" — can delegate key access across account

The CDK bootstrap key alias (alias/cdk-hnb659fds-*) is deterministic. Consider adding a kms:ResourceAliases condition to scope this.

  1. X-Ray resource policy in QUICK_START.md grants Resource: "*" to xray.amazonaws.com

Fix: Scope to arn:aws:logs::ACCOUNT_ID:log-group:aws/spans and arn:aws:logs::ACCOUNT_ID:log-group:aws/spans:*.

  1. Inconsistent placeholder names (ACCOUNT vs ACCOUNT_ID)

The bootstrap command uses ACCOUNT, the trust policy uses ACCOUNT_ID, and the IAM ARNs use * for the account field. Should be unified with a single ACCOUNT_ID placeholder and a note at the top
explaining substitution.


Medium (should fix)

  1. ACCOUNT_ID variable captured but never used

Files: QUICK_START.md, setup/SKILL.md (and Starlight mirror)

ACCOUNT_ID=$(aws sts get-caller-identity ...) is set but never referenced. Either remove it or use it to scope the resource policy ARN.

  1. Session timeout inconsistency: "8 hours" vs "9 hours"
  • DEPLOYMENT_GUIDE.md says "8 hours (AgentCore session)"
  • COST_MODEL.md (updated in this PR) says "9 hours"
  • Code: executionTimeout: Duration.hours(9) at task-orchestrator.ts:173
  • Pre-existing COST_MODEL.md line 46 still says "8-hour max session timeout" (not updated)

These are actually two different limits (AgentCore service limit vs. orchestrator timeout), but this needs explicit clarification.

  1. VPC endpoint cost may be underestimated

The PR claims ~$50/month for 7 endpoints across 2 AZs. AWS pricing: 7 x 2 AZs x $0.01/hr x 730 hrs = ~$102/month. This would push the total baseline to ~$135-145/month. (Pre-existing issue in
COST_MODEL.md, perpetuated in the new DEPLOYMENT_GUIDE.md.)

  1. Region wildcards throughout all policies

All ARNs use * for region despite ABCA deploying to a single region. The "Iterative tightening" section mentions this but it could be a stronger recommendation.

scottschreckengaust and others added 3 commits April 24, 2026 20:46
…constraints table

GetRandomPassword is an account-level API with no secret ARN, so it
requires Resource:"*". Document this in the Resource-level permission
constraints table alongside other services that require "*".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The skill referenced a non-existent IaCRole-ABCA-Policy. Update to
the three actual policy names (Infrastructure, Application, Observability)
matching DEPLOYMENT_ROLES.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add explicit route mapping, mirrorMarkdownFile call, and sidebar entry
so the Deployment Guide renders on the docs site and cross-doc links
from COST_MODEL.md resolve correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

1. SecretsManager Resource includes bare "*"

The "*" is intentional — secretsmanager:GetRandomPassword is an account-level API that does not support resource-level permissions (it generates a random string without referencing any secret). Splitting it into its own statement would push the Application policy over the IAM 6,144-character limit (the 3-way split was specifically sized to stay under this). Added an entry to the Resource-level permission constraints table documenting this, with a pointer to the Iterative tightening section for post-deployment refinement.

2. Deploy SKILL.md references non-existent policy name IaCRole-ABCA-Policy

Fixed — updated to reference all three policies (IaCRole-ABCA-Infrastructure, -Application, -Observability) with the three --cloudformation-execution-policies flags matching DEPLOYMENT_ROLES.md.

3. DEPLOYMENT_GUIDE.md has no Starlight mirror — broken link on docs site

Fixed — added DEPLOYMENT_GUIDE to explicitGuideRoutes in sync-starlight.mjs, added a mirrorMarkdownFile() call to generate the mirror at getting-started/Deployment-guide.md, and added a sidebar entry in astro.config.mjs. The COST_MODEL.md links now rewrite to /getting-started/deployment-guide instead of the raw relative path.

Isolate the account-level GetRandomPassword action (which requires
Resource:*) from the scoped SecretsManager statement. With ECS the
Application policy is still only ~4K of the 6,144-char IAM limit,
leaving ~2K headroom for future services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V1: Admin / No ECS — Task

Passed — Task 01KQ0VB0GSYS245NH2Q4ZGN41K completed in 124s, $0.14.

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V1: Admin / No ECS — Update

Passed — Stack reached UPDATE_COMPLETE. Added systemPromptOverrides to Blueprint and redeployed successfully.

@krokoko krokoko disabled auto-merge April 24, 2026 23:20
krokoko
krokoko previously approved these changes Apr 24, 2026
@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V1: Admin / No ECS — Destroy

In progress (retry) — First delete attempt hit the known AgentCore ENI cleanup timing issue (security group + 2 subnets retained by ela-attach managed ENIs). Retrying delete — ENIs typically release within 10-30 min after runtime deletion. This is a known operational note, not a permissions issue.

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V1: Admin / No ECS — Destroy

Passed (with known ENI retry) — Stack deleted. AgentCore-managed ENIs (ela-attach) held references to the security group and subnets, requiring --retain-resources on the VPC, 2 subnets, and 1 SG. These orphaned resources will be cleaned up once ENIs release (~10-30 min). This is a known operational characteristic, not a permissions issue.

V1 Summary: All 4 lifecycle steps passed (Create ✅ → Task ✅ → Update ✅ → Destroy ✅).

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V2: Admin / ECS — Create

Passed — Stack reached CREATE_COMPLETE with ECS Fargate cluster, task definition, and Docker image asset. Submitting smoke test task now.

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V2: Admin / ECS — Task

Passed — Task completed in 130s, $0.16.

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V2: Admin / ECS — Update

Passed — Stack reached UPDATE_COMPLETE. Removed systemPromptOverrides from Blueprint and redeployed with ECS cluster intact.

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V2: Admin / ECS — Destroy

Passed (with known ENI retry) — Same AgentCore ENI timing pattern as V1. Used --retain-resources on VPC/subnets/SG, stack deleted successfully. Orphaned resources cleaned up separately.

V2 Summary: All 4 lifecycle steps passed (Create ✅ → Task ✅ → Update ✅ → Destroy ✅).

Now setting up V3 (least-privilege / no ECS) — creating IAM policies and re-bootstrapping.

CDK generates the GitHub token secret with construct ID hash
(GitHubTokenSecret09BC4210-*), not the backgroundagent- prefix.
Add this pattern to the SecretsManager statement Resource list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
V3 least-privilege deploy found two missing services in the
iam:PassedToService condition: vpc-flow-logs.amazonaws.com (VPC
Flow Log role) and bedrock-agentcore.amazonaws.com (AgentMemory
service role).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V3: Least-priv / No ECS — Create

Passed (after 2 policy fixes) — Stack reached CREATE_COMPLETE with scoped 3-way IAM policies.

Two permission gaps found and fixed during V3 validation:

  1. SecretsManager resource scope — CDK generates the GitHub token secret as GitHubTokenSecret09BC4210-*, not backgroundagent-*. Added GitHubTokenSecret* to the resource list. (commit 3adb6e2)
  2. PassedToService missing servicesvpc-flow-logs.amazonaws.com (VPC Flow Log role) and bedrock-agentcore.amazonaws.com (AgentMemory service role) were missing from the condition. Added both. (commit 6cd306b)

Submitting smoke test task now.

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V3: Least-priv / No ECS — Task

Passed — Task completed in 124s, $0.14.

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V3: Least-priv / No ECS — Update

Passed — Stack reached UPDATE_COMPLETE under scoped IAM policies.

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V3: Least-priv / No ECS — Destroy

Passed (with X-Ray internal error retry) — Most resources deleted cleanly. AWS::XRay::ResourcePolicy delete hit an AWS internal error (InternalFailure) — not a permissions issue. Retried with --retain-resources and manually cleaned up the orphaned X-Ray policy.

V3 Summary: All 4 lifecycle steps passed (Create ✅ → Task ✅ → Update ✅ → Destroy ✅).

Two policy fixes discovered and committed during V3:

  • GitHubTokenSecret* added to SecretsManager resource scope
  • vpc-flow-logs.amazonaws.com and bedrock-agentcore.amazonaws.com added to PassedToService

Starting V4 (least-priv / ECS) now.

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V4: Least-priv / ECS — Create

Passed — Stack reached CREATE_COMPLETE with ECS cluster + scoped 3-way IAM policies (Application policy v3 includes ECS statement, 4,167 / 6,144 chars). Submitting smoke test task now.

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V4: Least-priv / ECS — Task

Passed — Task completed in 129s, $0.16.

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V4: Least-priv / ECS — Update

Passed — Stack reached UPDATE_COMPLETE under scoped IAM policies + ECS.

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

V4: Least-priv / ECS — Destroy

Passed (with X-Ray internal error + ENI retry) — Same patterns as V3. X-Ray ResourcePolicy delete hit AWS InternalFailure, and AgentCore ENIs held VPC resources. Both are AWS service timing issues, not permissions. Used --retain-resources and cleaned up manually.

V4 Summary: All 4 lifecycle steps passed (Create ✅ → Task ✅ → Update ✅ → Destroy ✅).


Full Test Matrix Results

# Policy ECS Create Task Update Destroy
V1 Admin Off ✅ PR #3, $0.14, 124s ✅ (ENI retry)
V2 Admin On ✅ PR #4, $0.16, 130s ✅ (ENI retry)
V3 Least-priv Off ✅ (2 policy fixes) ✅ PR #5, $0.14, 124s ✅ (X-Ray internal)
V4 Least-priv On ✅ PR #6, $0.16, 129s ✅ (X-Ray + ENI)

16/16 test points passed.

Policy fixes discovered during validation

  1. SecretsManager resource scope — Added GitHubTokenSecret* pattern (CDK-generated secret name doesn't use backgroundagent- prefix) — commit 3adb6e2
  2. PassedToService missing services — Added vpc-flow-logs.amazonaws.com and bedrock-agentcore.amazonaws.com — commit 6cd306b

Known destroy patterns (not permissions-related)

  • AgentCore ENIs (ela-attach) hold SG/subnet references for 10-30 min after runtime deletion → use --retain-resources then clean up
  • X-Ray ResourcePolicy delete intermittently returns InternalFailure → retain and delete manually via aws xray delete-resource-policy

Account cleanup completed

  • Orphaned VPC resources (subnets, SGs, VPCs) cleaned up after ENI release
  • Orphaned X-Ray resource policies deleted
  • CDK bootstrap stack retained (reusable)

@scottschreckengaust
Copy link
Copy Markdown
Contributor Author

@krokoko - ready for review, all four matrix tests are complete

@krokoko krokoko self-requested a review April 26, 2026 15:59
@scottschreckengaust scottschreckengaust added this pull request to the merge queue Apr 26, 2026
Merged via the queue into main with commit 3da9751 Apr 26, 2026
6 checks passed
@krokoko krokoko deleted the least-privilege-deployment-documentation branch April 26, 2026 20:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants