Commit 3da9751

Authored by scottschreckengaust, claude, and krokoko
docs: add least-privilege deployment roles and deployment guide (#46)
* docs: add least-privilege deployment roles and deployment guide

  Add DEPLOYMENT_ROLES.md with least-privilege IAM policy for the CloudFormation execution role (IaCRole-ABCA), derived from analysis of all CDK constructs and handler code in the current single-stack architecture. Includes optional ECS statements when Fargate is enabled.

  Add DEPLOYMENT_GUIDE.md covering compute backend choices (AgentCore vs opt-in ECS Fargate via ComputeStrategy), scale-to-zero analysis, and complete AWS services inventory.

  Update COST_MODEL.md with scale-to-zero characteristics section, corrected baseline to ~$85-95/month, and updated references.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): preserve original reference order in COST_MODEL.md

  Append new references at the bottom instead of reordering the existing list.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): restore dual COMPUTE.md references in COST_MODEL.md

  The original had COMPUTE.md listed twice intentionally — once for the network architecture section and once for compute billing. Restore this pattern instead of merging into one entry.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): consolidate COMPUTE.md references with section anchor

  Single entry with anchor link to the network architecture section instead of listing the same file twice.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): replace iamlive with IAM Access Analyzer recommendation

  Use AWS-native IAM Access Analyzer policy generation instead of third-party tooling for iterative policy tightening.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove sub section

* fix(docs): add generated Starlight mirrors for new and modified docs

  The sync-starlight.mjs script generates mirror files under docs/src/content/docs/ from source docs. These generated files were missing from prior commits, causing the CI mutation check to fail.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add docs-sync pre-commit hook and strengthen agentic instructions

  The PR#46 build failed because Starlight mirror files under docs/src/content/docs/ were not regenerated after editing source docs. The pre-commit hooks had no step to catch this locally.

  - Add `docs-sync` pre-commit hook that auto-runs sync-starlight.mjs and stages the generated mirrors when docs sources change
  - Strengthen AGENTS.md boundary and common-mistakes sections to explicitly warn that CI rejects stale mirrors and name the exact command to regenerate them

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): correct session timeout and concurrency defaults in COST_MODEL

  - Session timeout: 8 hours → 9 hours (matches task-orchestrator.ts:173)
  - Concurrency limit: 2 → 3 (matches task-orchestrator.ts:163 default)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: gitignore Claude Code plugin artifacts (.mcp.json, .remember/)

  Prevents local plugin state from the remember and MCP plugins from being tracked in version control.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): add X-Ray resource policy prerequisite and build credential notes

  On a fresh AWS account, `aws xray update-trace-segment-destination` fails with AccessDeniedException because X-Ray needs a CloudWatch Logs resource policy before it can write spans. Added the prerequisite `aws logs put-resource-policy` command to Quick Start Step 3.

  Also documented that `mise run build` requires AWS credentials with ec2:DescribeAvailabilityZones for CDK synthesis, and added common error table entries for the X-Ray, build credential, and non-TTY deploy issues.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(plugin): add X-Ray resource policy to /setup and least-privilege ref to /deploy

  The /setup skill's Phase 3 only ran `aws xray update-trace-segment-destination` which fails with AccessDeniedException on fresh accounts. Added the prerequisite `aws logs put-resource-policy` command.

  Added a "Least-Privilege Deployment" section to the /deploy skill linking to DEPLOYMENT_ROLES.md with the re-bootstrap command for scoped execution policies.

  Updated CLAUDE.md to reference the abca-plugin and its available skills so Claude Code sessions discover the guided workflows without requiring --plugin-dir.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): replace IaCRole-ABCA with validated 3-way policy split

  Replace the single monolithic IAM policy (which exceeded the 6,144-char IAM managed policy limit) with three validated policies:

  - IaCRole-ABCA-Infrastructure (CFN, IAM, VPC, DNS Firewall)
  - IaCRole-ABCA-Application (DDB, Lambda, APIGW, Cognito, WAF, EB, SM)
  - IaCRole-ABCA-Observability (Bedrock, CW, X-Ray, S3, ECR, KMS, SSM, STS)

  All three policies were validated against a live deployment in us-east-1 (create, update, task execution, and destroy). CloudTrail analysis found 36 additional actions beyond the initial code review, and 7 deployment iterations refined the policies.

  Key additions:

  - KMS (entirely missing from original)
  - lambda:InvokeFunction for AwsCustomResource
  - bedrock-agentcore:* (CFN handler uses internal action names)
  - Legacy CW Logs delivery actions for Route53 Resolver
  - Various Describe/List/Get actions for read-only CFN operations

  Updated the origin disclaimer, Resource-level permission constraints table, and ECS section to reference the Application policy.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): note ECS policy fits under IAM size limit

  Clarify in the ECS section that adding the ECS statement to IaCRole-ABCA-Application keeps the combined policy under the 6,144-character IAM managed policy limit (4,212 of 6,144 chars).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): document SecretsManager GetRandomPassword Resource:"*" in constraints table

  GetRandomPassword is an account-level API with no secret ARN, so it requires Resource:"*". Document this in the Resource-level permission constraints table alongside other services that require "*".

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(plugin): update /deploy skill to reference 3-way policy split

  The skill referenced a non-existent IaCRole-ABCA-Policy. Update to the three actual policy names (Infrastructure, Application, Observability) matching DEPLOYMENT_ROLES.md.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): add DEPLOYMENT_GUIDE.md Starlight mirror and sidebar entry

  Add explicit route mapping, mirrorMarkdownFile call, and sidebar entry so the Deployment Guide renders on the docs site and cross-doc links from COST_MODEL.md resolve correctly.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): split SecretsManager GetRandomPassword into own statement

  Isolate the account-level GetRandomPassword action (which requires Resource:*) from the scoped SecretsManager statement. With ECS the Application policy is still only ~4K of the 6,144-char IAM limit, leaving ~2K headroom for future services.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): add PassedToService condition to PassRole and tightening notes

  Separate iam:PassRole into its own statement with iam:PassedToService condition limiting to the 7 services ABCA passes roles to. Add iterative tightening items for AttachRolePolicy (iam:PolicyARN) and CreateServiceLinkedRole (iam:AWSServiceName) conditions.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): scope X-Ray resource policy, add KMS tightening, unify placeholders

  - Scope X-Ray resource policy Resource from * to arn:aws:logs:*:ACCOUNT_ID:log-group:aws/spans in QUICK_START.md and setup SKILL.md (item 7)
  - Add KMS kms:ResourceAliases tightening recommendation (item 6)
  - Unify placeholder to ACCOUNT_ID everywhere with substitution note (item 8)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): correct VPC endpoint cost to ~$102/mo and clarify session timeouts

  VPC endpoint cost was ~$50/mo (1 AZ math), actual is ~$102/mo (7 endpoints x 2 AZs x $0.01/hr x 730 hrs). Update baseline totals from ~$85-95 to ~$140-150 in COST_MODEL.md and DEPLOYMENT_GUIDE.md.

  Clarify the two distinct timeout limits: AgentCore 8-hour service limit vs orchestrator 9-hour executionTimeout.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): add GitHubTokenSecret to SecretsManager resource scope

  CDK generates the GitHub token secret with construct ID hash (GitHubTokenSecret09BC4210-*), not the backgroundagent- prefix. Add this pattern to the SecretsManager statement Resource list.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): add vpc-flow-logs and bedrock-agentcore to PassedToService

  V3 least-privilege deploy found two missing services in the iam:PassedToService condition: vpc-flow-logs.amazonaws.com (VPC Flow Log role) and bedrock-agentcore.amazonaws.com (AgentMemory service role).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Alain Krok <alkrok@amazon.com>
1 parent 808b6a0 commit 3da9751

16 files changed

Lines changed: 1768 additions & 34 deletions

.gitignore

Lines changed: 6 additions & 0 deletions
Diff not rendered.

.pre-commit-config.yaml

Lines changed: 8 additions & 0 deletions
@@ -54,6 +54,14 @@ repos:
   files: ^agent/.*\.py$
   stages: [pre-commit]
 
+- id: docs-sync
+  name: sync docs → Starlight mirrors
+  entry: bash -lc 'cd "$(git rev-parse --show-toplevel)/docs" && node scripts/sync-starlight.mjs && git add src/content/docs/'
+  language: system
+  pass_filenames: false
+  files: ^(docs/(design|guides)/.*\.md$|CONTRIBUTING\.md$)
+  stages: [pre-commit]
+
 - id: docs-astro-check
   name: astro check (docs)
   entry: bash -lc 'cd "$(git rev-parse --show-toplevel)/docs" && ./node_modules/.bin/astro check'
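
To make the hook's trigger condition concrete, the sketch below tests a few paths against the `files:` pattern from the hunk above. It is only an approximation: pre-commit evaluates `files` as a Python regular expression, and `grep -E` is used here as a stand-in that should behave the same for this particular pattern. The function name and the non-matching sample paths are hypothetical.

```bash
#!/usr/bin/env bash
# Pattern copied from the docs-sync hook's `files:` key.
DOCS_SYNC_PATTERN='^(docs/(design|guides)/.*\.md$|CONTRIBUTING\.md$)'

# Hypothetical helper: succeeds when a repo-relative path would trigger the hook.
triggers_docs_sync() {
  printf '%s\n' "$1" | grep -Eq "$DOCS_SYNC_PATTERN"
}

# Sample paths; the last two are illustrative paths that should not match.
for path in docs/design/COST_MODEL.md docs/guides/DEPLOYMENT_GUIDE.md CONTRIBUTING.md \
            docs/src/content/docs/example.md cdk/README.md; do
  if triggers_docs_sync "$path"; then
    echo "sync: $path"
  else
    echo "skip: $path"
  fi
done
```

Edits under `docs/src/content/docs/` do not match, which is consistent with those files being generated output rather than sources.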

AGENTS.md

Lines changed: 3 additions & 2 deletions
@@ -38,6 +38,7 @@ Handler entry tests: `cdk/test/handlers/orchestrate-task.test.ts`, `create-task.
 ### Common mistakes
 
 - Editing **`docs/src/content/docs/`** instead of **`docs/guides/`** or **`docs/design/`** — content is generated; sync from sources.
+- Adding or editing files in **`docs/design/`** or **`docs/guides/`** without running **`cd docs && node scripts/sync-starlight.mjs`** — CI will reject ("Fail build on mutation") because the Starlight mirror files in `docs/src/content/docs/` are stale. Always commit the regenerated mirrors alongside source changes.
 - Changing **`cdk/.../types.ts`** without updating **`cli/src/types.ts`** — CLI and API drift.
 - Running raw **`jest`/`tsc`/`cdk`** from muscle memory — prefer **`mise //cdk:test`**, **`mise //cdk:compile`**, **`mise //cdk:synth`** (see [Commands you can use](#commands-you-can-use)).
 - **`MISE_EXPERIMENTAL=1`** — required for namespaced tasks like **`mise //cdk:build`** (see [CONTRIBUTING.md](./CONTRIBUTING.md)).
@@ -120,7 +121,7 @@ To build or test only the CLI subproject:
 
 ## Boundaries
 
-- **Generated docs** If you change docs sources (`docs/guides/`, `docs/design/`, `CONTRIBUTING.md`), run `mise //docs:sync` or `mise //docs:build`.
+- **Generated docs (CI will reject if stale)** Editing files in `docs/guides/`, `docs/design/`, or `CONTRIBUTING.md` requires regenerating Starlight mirrors under `docs/src/content/docs/`. Run **`cd docs && node scripts/sync-starlight.mjs`** (fast, <1 s) or **`mise //docs:sync`**, then commit the updated mirrors alongside your source changes. The pre-commit hook `docs-sync` does this automatically when prek hooks are installed, but if you bypass hooks (e.g. `--no-verify`), CI's "Fail build on mutation" step will catch it.
 - **Dependencies** — Add dependencies to the owning package `package.json` (`cdk/`, `cli/`, or `docs/`), then install via workspace/root install.
-- **Build before commit** — Run a full build (`mise run build`) when done so tests/synth/docs/security checks stay in sync.
+- **Build before commit** — Run a full build (`mise run build`) when done so tests/synth/docs/security checks stay in sync. This is especially critical for docs changes — the build includes `//docs:sync` which regenerates Starlight mirrors, and CI will fail if the committed mirrors don't match what the build produces.
 - **Major changes** — Before modifying existing files in a major way (large refactors, new stacks, changing the agent contract), ask first.

CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -1 +1,3 @@
 @AGENTS.md
+
+See also [README.md](./README.md) for the Claude Code plugin (`docs/abca-plugin/`), which provides interactive guided workflows for setup, deployment, repository onboarding, task submission, and troubleshooting via `/setup`, `/deploy`, `/onboard-repo`, `/submit-task`, `/status`, and `/troubleshoot` skills. Run Claude Code with `claude --plugin-dir docs/abca-plugin` to activate it.

docs/abca-plugin/skills/deploy/SKILL.md

Lines changed: 13 additions & 0 deletions
@@ -81,3 +81,16 @@ After a successful deploy, remind the user to:
 - Store/update the GitHub PAT in Secrets Manager if this is a fresh deployment
 - Onboard repositories via Blueprint constructs if needed
 - Run a smoke test: `curl -s -H "Authorization: $TOKEN" $API_URL/tasks`
+
+## Least-Privilege Deployment
+
+By default, CDK bootstrap grants `AdministratorAccess` to the CloudFormation execution role. For production or security-sensitive accounts, re-bootstrap with a scoped execution policy:
+
+```bash
+cdk bootstrap aws://ACCOUNT/REGION \
+  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Infrastructure" \
+  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Application" \
+  --cloudformation-execution-policies "arn:aws:iam::ACCOUNT:policy/IaCRole-ABCA-Observability"
+```
+
+See `docs/design/DEPLOYMENT_ROLES.md` in the repo root for the complete least-privilege IAM policies, trust policy, runtime role inventory, and iterative tightening recommendations.

docs/abca-plugin/skills/setup/SKILL.md

Lines changed: 7 additions & 1 deletion
@@ -52,11 +52,17 @@ If `mise run install` fails with "yarn: command not found", Corepack wasn't acti
 
 ## Phase 3: One-Time AWS Setup
 
+On a fresh AWS account, X-Ray needs a CloudWatch Logs resource policy before it can write spans. Run both commands — the first creates the policy, the second sets the destination:
+
 ```bash
+ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
+aws logs put-resource-policy \
+  --policy-name xray-spans-policy \
+  --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Sid\":\"XRaySpansAccess\",\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"xray.amazonaws.com\"},\"Action\":[\"logs:PutLogEvents\",\"logs:CreateLogGroup\",\"logs:CreateLogStream\"],\"Resource\":[\"arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans\",\"arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans:*\"]}]}"
 aws xray update-trace-segment-destination --destination CloudWatchLogs
 ```
 
-This must be run once per AWS account before first deployment.
+These must be run once per AWS account before first deployment. If the `put-resource-policy` step is skipped, the `update-trace-segment-destination` command fails with `AccessDeniedException`.
 
 ## Phase 4: First Deployment

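The escaped one-line policy document in the Phase 3 hunk above is hard to review by eye. As a reading aid, here is a minimal sketch that prints the same statement from a heredoc. The helper name `xray_spans_policy` and the placeholder account ID are assumptions for illustration and are not part of the commit; the block only prints JSON and makes no AWS calls.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the X-Ray spans resource policy for one account ID.
# The statement mirrors the escaped --policy-document string in the hunk above.
xray_spans_policy() {
  local account_id="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "XRaySpansAccess",
      "Effect": "Allow",
      "Principal": { "Service": "xray.amazonaws.com" },
      "Action": [
        "logs:PutLogEvents",
        "logs:CreateLogGroup",
        "logs:CreateLogStream"
      ],
      "Resource": [
        "arn:aws:logs:*:${account_id}:log-group:aws/spans",
        "arn:aws:logs:*:${account_id}:log-group:aws/spans:*"
      ]
    }
  ]
}
EOF
}

# Placeholder account ID, for illustration only.
xray_spans_policy 123456789012
```

Under these assumptions, the output could be passed to the existing command as `--policy-document "$(xray_spans_policy "$ACCOUNT_ID")"`, which keeps the quoting out of the command line itself.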
docs/astro.config.mjs

Lines changed: 4 additions & 1 deletion
Diff not rendered.

docs/design/COST_MODEL.md

Lines changed: 16 additions & 11 deletions
@@ -11,12 +11,16 @@ These costs are incurred regardless of task volume:
 | Component | Estimated cost | Notes |
 |---|---|---|
 | NAT Gateway (1×) | ~$32/month | Fixed hourly cost + data processing. Single AZ (see [COMPUTE.md - Network architecture](./COMPUTE.md)). |
-| VPC Interface Endpoints (7×) | ~$50/month | $0.01/hr per endpoint per AZ. |
+| VPC Interface Endpoints (7×, 2 AZs) | ~$102/month | $0.01/hr × 7 endpoints × 2 AZs × 730 hrs. |
 | VPC Flow Logs | ~$3/month | CloudWatch ingestion. |
 | DynamoDB (on-demand, idle) | ~$0/month | Pay-per-request; no cost when idle. |
 | CloudWatch Logs retention | ~$1–5/month | Depends on log volume. 90-day retention. |
 | API Gateway (idle) | ~$0/month | Pay-per-request. |
-| **Total baseline** | **~$85–90/month** | |
+| **Total baseline** | **~$140–150/month** | |
+
+### Scale-to-zero characteristics
+
+Most platform components are fully serverless and incur zero cost when idle: DynamoDB (PAY_PER_REQUEST), Lambda, API Gateway, ECS Fargate (cluster is free, when enabled), AgentCore Runtime (per-session), Bedrock (per-token), and Cognito (free tier). The always-on cost floor (~$140–150/month) is dominated by VPC networking infrastructure (NAT Gateway + 7 interface endpoints across 2 AZs) which is required for private subnet connectivity to AWS services and GitHub. See the [Deployment guide](../guides/DEPLOYMENT_GUIDE.md) for the full scale-to-zero breakdown.
 
 ## Per-task variable costs
 
@@ -43,16 +47,16 @@ Assuming a typical task: 1–2 hours, Claude Sonnet, ~100K input tokens, ~20K ou
 | Model choice | 5–10× between Haiku and Opus | Default to Claude Sonnet; allow per-repo override. |
 | Turn count | Linear with turns | `max_turns` cap (default 100, configurable 1–500). |
 | Cost budget | Hard stop at budget | `max_budget_usd` cap (configurable $0.01–$100). Agent stops when budget is reached regardless of remaining turns. |
-| Task duration | Sub-linear (compute is cheap; tokens dominate) | 8-hour max session timeout. |
+| Task duration | Sub-linear (compute is cheap; tokens dominate) | AgentCore: 8-hour service limit; orchestrator: 9-hour `executionTimeout`. |
 | Prompt caching | 50–90% token cost reduction | Enable by default; cache system prompts and repo context. |
 | Concurrency | Linear with parallel tasks | Per-user and system-wide concurrency limits. |
 
 ## Cost at scale
 
 | Scale | Tasks/month | Estimated monthly cost (infra + tasks) |
 |---|---|---|
-| Low (1 developer) | 30–60 | $150–500 |
-| Medium (small team) | 200–500 | $500–3,000 |
+| Low (1 developer) | 30–60 | $200–550 |
+| Medium (small team) | 200–500 | $550–3,000 |
 | High (org-wide) | 2,000–5,000 | $5,000–30,000 |
 
 These estimates assume Claude Sonnet with prompt caching enabled and average task complexity.
@@ -72,8 +76,8 @@ For multi-user deployments, cost should be attributable to individual users and
 |---|---|---|
 | Turn limit | `max_turns` per task | 100 |
 | Cost budget | `max_budget_usd` per task | None (unlimited) |
-| Session timeout | Orchestrator timeout | 8 hours |
-| Concurrency limit | Per-user atomic counter | 2 concurrent tasks |
+| Session timeout | Orchestrator timeout | 9 hours |
+| Concurrency limit | Per-user atomic counter | 3 concurrent tasks |
 | System concurrency | System-wide counter | Account-level AgentCore quota |
 
 ## Additional guardrails
@@ -85,7 +89,8 @@ For multi-user deployments, cost should be attributable to individual users and
 
 ## Reference
 
-- [COMPUTE.md - Network architecture](./COMPUTE.md) - VPC infrastructure cost breakdown.
-- [ORCHESTRATOR.md](./ORCHESTRATOR.md) - Polling cost analysis.
-- [COMPUTE.md](./COMPUTE.md) - Compute option billing models.
-- [OBSERVABILITY.md](./OBSERVABILITY.md) - Cost-related metrics (`agent.cost_usd`, token usage).
+- [COMPUTE.md](./COMPUTE.md) -- Compute option billing models and network architecture.
+- [ORCHESTRATOR.md](./ORCHESTRATOR.md) -- Polling cost analysis.
+- [OBSERVABILITY.md](./OBSERVABILITY.md) -- Cost-related metrics (`agent.cost_usd`, token usage).
+- [Deployment guide](../guides/DEPLOYMENT_GUIDE.md) -- Deployment choices, scale-to-zero analysis, AWS services inventory.
+- [DEPLOYMENT_ROLES.md](./DEPLOYMENT_ROLES.md) -- Least-privilege IAM policies for deployment.
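
The corrected VPC endpoint figure in the COST_MODEL.md hunk follows directly from the formula in the table row ($0.01/hr × endpoints × AZs × 730 hrs). The sketch below reproduces that arithmetic for both the 2-AZ value and the earlier single-AZ value the commit message refers to. The function name is hypothetical, and the rate and hours are the ones quoted in the diff, not values looked up from AWS pricing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: monthly cost of VPC interface endpoints.
# Arguments: endpoint count, AZ count, hourly rate per endpoint per AZ (default 0.01),
# hours per month (default 730). Defaults are the figures quoted in the diff.
endpoint_monthly_cost() {
  local endpoints="$1" azs="$2" rate_per_hour="${3:-0.01}" hours="${4:-730}"
  awk -v e="$endpoints" -v a="$azs" -v r="$rate_per_hour" -v h="$hours" \
    'BEGIN { printf "%.2f\n", e * a * r * h }'
}

endpoint_monthly_cost 7 2   # 7 endpoints across 2 AZs: prints 102.20
endpoint_monthly_cost 7 1   # single-AZ calculation:    prints 51.10
```

The single-AZ result matches the old "~$50/month" row, and the 2-AZ result matches the new "~$102/month" row.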
