Skip to content

refactor(registry/coder/modules/claude-code): slim to install-and-configure only#859

Closed
matifali wants to merge 23 commits intomainfrom
matifali/claude-code-v5-cleanup
Closed

refactor(registry/coder/modules/claude-code): slim to install-and-configure only#859
matifali wants to merge 23 commits intomainfrom
matifali/claude-code-v5-cleanup

Conversation

@matifali
Copy link
Copy Markdown
Member

@matifali matifali commented Apr 22, 2026

Summary

Strip the claude-code module down to what its name says: install Claude Code and export environment variables. Everything opinionated, Coder-specific, or duplicative of an upstream Claude Code env var is gone. The module becomes composable with future claude-code-tasks, agentapi, and boundary modules.

Changes

New surface — 9 variables, 1 required

Variable Purpose
agent_id (required) Coder agent ID
env map(string) that fans out via for_each into one coder_env per entry. Use for any Claude Code env var (ANTHROPIC_API_KEY, CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_BASE_URL, ANTHROPIC_MODEL, CLAUDE_CODE_USE_BEDROCK, etc.) or any custom var your pre/post scripts consume. Sensitive via caller variable declarations.
claude_code_version Pin a version; forwarded to the official installer.
install_claude_code Gate the install step.
claude_binary_path Escape hatch when Claude is baked into the image.
mcp, mcp_config_remote_path MCP servers applied at user scope via claude mcp add-json --scope user.
pre_install_script, post_install_script Extension points, delegated to coder-utils.

No outputs. Zero Coder-specific env var names.

Scripts produced

Exactly one coder_script on the agent by default: Claude Code: Install Script. Pre/post scripts appear only when their respective variables are set. No start script is produced in any configuration.

Removed

  • All AgentAPI, Coder Tasks, Boundary, web-app, CLI-app plumbing.
  • task_app_id output.
  • workdir — MCP applies at user scope; project config belongs in the repo.
  • claude_api_key — renamed conceptually to env["ANTHROPIC_API_KEY"]. The old module emitted CLAUDE_API_KEY, which Claude Code does not read (verified against CLI v2.1.117 and the official env-vars docs).
  • claude_code_oauth_token, model, disable_autoupdater, claude_md_path as dedicated variables — set them via env.
  • enable_aibridge — compose with ANTHROPIC_BASE_URL + ANTHROPIC_AUTH_TOKEN in env; README has an explicit AI Gateway example (the feature formerly known as AI Bridge).
  • install_via_npm — official installer only (npm path already deprecated upstream).
  • allowed_tools / disallowed_tools — broken today (calls coder binary, not claude); write ~/.claude/settings.json permissions.allow/permissions.deny arrays via pre_install_script.
  • scripts/start.sh — no Tasks, no AgentAPI, nothing to start.

Depends on

Validation

  • 6 terraform tests pass (terraform test).
  • 10 bun tests pass against a Docker-backed container (bun test).
  • prettier, terraform fmt, shellcheck clean.
  • 9-scenario live e2e matrix passes on dev.coder.com (table below).
E2E matrix — 9 live workspaces on dev.coder.com

One workspace per scenario, each probed via the Coder API and inside the container. All workspaces and the matrix template torn down after the run.

# Scenario Result Evidence
1 minimal PASS Exactly one script: Claude Code: Install Script
2 env passthrough PASS ANTHROPIC_API_KEY, ANTHROPIC_MODEL, CUSTOM_VAR all reached the container
3 AI Gateway PASS ANTHROPIC_BASE_URL + non-empty ANTHROPIC_AUTH_TOKEN in container
4 version pin (2.0.62) PASS claude --version reports 2.0.62
5 pre-installed binary PASS Stub claude from pre_install_script used; installer skipped
6 MCP inline PASS ~/.claude.json has github MCP server at user scope
7 MCP remote PASS ~/.claude.json has filesystem MCP fetched from raw GitHub URL
8 MCP without claude PASS Guard fires; install log shows the clear error; start_error state
9 pre + post install scripts PASS Branded display names render; E2E-PRE-MARKER-*/E2E-POST-MARKER-* in logs

Notes:

  • The icon argument on coder_script is stored on the paired WorkspaceAgentLogSource (keyed by log_source_id) rather than on the script struct itself. The UI reads it from agent.log_sources[].icon and renders it next to each script tab. Verified /icon/claude.svg reached all three scripts in the API response.
  • AI Gateway docs live at docs/ai-coder/ai-gateway. Server-side endpoints and env vars still use the aibridge prefix; only the product name changed.

Migration notes

Upgrading from v4.x to v5.0.0:

  • claude_api_key = "sk-..."env = { ANTHROPIC_API_KEY = "sk-..." }. Module now emits the variable Claude Code actually reads.
  • claude_code_oauth_token = "..."env = { CLAUDE_CODE_OAUTH_TOKEN = "..." }.
  • model = "opus"env = { ANTHROPIC_MODEL = "opus" }.
  • disable_autoupdater = trueenv = { DISABLE_AUTOUPDATER = "1" }.
  • enable_aibridge = trueenv = { ANTHROPIC_BASE_URL = "${data.coder_workspace.me.access_url}/api/v2/aibridge/anthropic", ANTHROPIC_AUTH_TOKEN = data.coder_workspace_owner.me.session_token } (see the AI Gateway example in the README).
  • Using Tasks, AgentAPI, or Boundary? Switch to the upcoming dedicated modules.
  • Set workdir, install_via_npm, allowed_tools, disallowed_tools, report_tasks, ai_prompt, permission_mode, continue, resume_session_id, dangerously_skip_permissions, system_prompt, web_app, cli_app, *_display_name, icon, order, group, subdomain, install_agentapi, agentapi_version, enable_state_persistence? Remove.
  • Consumed module.claude-code.task_app_id? Read it from the Tasks module when it lands.
Implementation plan and decision log

Goals

  1. Install Claude Code via the official installer.
  2. Export env vars the caller chooses — zero opinions.
  3. Optionally apply user-scope MCP server config.

The module should not: start Claude, create a web app, orchestrate Tasks, install AgentAPI, run boundary, pre-accept onboarding, or write ~/.claude.json. Those are separate concerns with separate modules.

Key design choices

  • env as the sole env-var surface. for_each = nonsensitive(toset(keys(var.env))) lifts the key sensitivity taint so Terraform accepts them as resource instance addresses; var.env[each.key] preserves value sensitivity so secrets stay out of plan output when callers mark their variable sensitive = true.
  • No ~/.claude.json pre-write. Interactive users see the theme picker and per-folder trust dialog once — Claude Code's intended UX. Headless/Tasks pre-acceptance lives in the dedicated Tasks module.
  • MCP at user scope. claude mcp add-json --scope user writes to ~/.claude.json. Works on a clean $HOME (verified); no preset needed.
  • Script orchestration via coder-utils. One consistent $HOME/.claude-module/{install,pre_install,post_install}.log layout. Branded display names via the new display_name_prefix variable.
  • Dependency: pinned to feat(registry/coder/modules/coder-utils): make install_script and start_script optional #842 branch via git source. Switch to the 1.1.0 registry tag once published.

Verified against the real Claude CLI (v2.1.117)

  • claude mcp add-json help: -s, --scope <scope> accepts local (default), user, project.
  • User settings live in ~/.claude/settings.json; permissions are permissions.allow/permissions.deny arrays, not scalar allowedTools/disallowedTools.
  • ~/.claude.json holds user-scope MCP servers and onboarding flags; settings.json rejects those keys per official docs.

🤖 This PR was created with the help of Coder Agents, and needs a human review. 🧑‍💻

matifali added 11 commits April 22, 2026 05:38
Strip the claude-code module down to what its name says. Removes the
AgentAPI child module, Coder Tasks orchestration, process-level network
boundary, web and CLI coder_apps, the task_app_id output, and every
variable that only existed to support those paths.

The module now does three things: install Claude Code via the official
installer, wire up authentication env vars, and optionally apply
user-scope MCP server configuration. Script orchestration is delegated
to coder-utils (v1.1.0) so pre_install/install/post_install hooks have
a single, consistent layout.

Breaking changes:
- Rename claude_api_key -> anthropic_api_key. Now emits ANTHROPIC_API_KEY
  (the variable Claude Code actually reads), not CLAUDE_API_KEY.
- AI Bridge now sets ANTHROPIC_AUTH_TOKEN + ANTHROPIC_BASE_URL, matching
  the dogfood template.
- Remove workdir; MCP applies at user scope.
- Remove install_via_npm; official installer only.
- Remove allowed_tools, disallowed_tools; write ~/.claude/settings.json
  permission rules via pre_install_script instead.
- Remove task_app_id output and every Tasks/AgentAPI/Boundary variable.

coder-utils is pinned to the PR #842 branch via a git source until the
v1.1.0 tag ships on the registry.
…d wiring

Follow-up to the v5 cleanup. The module now only touches variables and
env vars that Claude Code reads natively. Everything Coder-specific is
gone.

Removed:
- enable_aibridge variable and the ANTHROPIC_AUTH_TOKEN / ANTHROPIC_BASE_URL
  coder_env resources. Template authors who want AI Bridge, Bedrock,
  Vertex, or any other custom endpoint set ANTHROPIC_BASE_URL and
  ANTHROPIC_AUTH_TOKEN themselves via coder_env, exactly as the Claude
  Code docs describe.
- claude_md_path variable and the CODER_MCP_CLAUDE_MD_PATH coder_env.
  That env var was only consumed by 'coder exp mcp configure claude-code'
  in Tasks mode. Claude Code itself discovers ~/.claude/CLAUDE.md from
  user scope automatically.
- Associated tftest.hcl cases and the aibridge-env-vars bun test.

Also fix a bug in the bun test runModuleScripts helper that produced a
malformed bash command when the env map was an empty object.
Expose a generic `env = { KEY = VALUE }` map that fans out via
`for_each` into one `coder_env` resource per entry. Template authors
can set any Claude Code env var (ANTHROPIC_BASE_URL, ANTHROPIC_MODEL,
DISABLE_AUTOUPDATER, CLAUDE_CODE_USE_BEDROCK, etc.) or any custom var
their pre/post scripts consume, without the module needing to know
about each one.

Remove the now-redundant `model`, `disable_autoupdater`, and
`claude_md_path` variables. `claude_md_path` was a Coder-specific
holdover; the other two have canonical env vars users can set through
`env` directly. The module no longer invents names it doesn't need to.

`anthropic_api_key` and `claude_code_oauth_token` stay as dedicated
sensitive variables (sensitive values can't be used as `for_each`
keys), and `env` rejects those two keys via validation to avoid
double-resource collisions.

Add a local extractCoderEnvVars helper in the bun tests that iterates
every instance of every coder_env resource. The upstream helper in
agentapi/test-util only reads instances[0], which misses every
for_each entry past the first.
Remove `anthropic_api_key` and `claude_code_oauth_token` as dedicated
sensitive variables. Template authors set them through `env` like
every other Claude Code env var. The module's job is to export what
the caller asks for. It has no opinion about which env vars matter.

`env` uses `nonsensitive(toset(keys(var.env)))` so sensitive values
can pass through without tainting the for_each keys. Callers declare
their own variables with `sensitive = true` and pipe them in.

Down to 9 variables from 11 (1 required, 8 optional).
Promote AI Bridge from a footnote in the custom-endpoints section to
its own example with the exact env block, including the data source
references for access_url and session_token. Keep the generic
custom-endpoints example below for Bedrock/Vertex/LiteLLM/proxies.
The module already covered the happy paths (install_claude_code=true
with or without Claude already present, install_claude_code=false with
Claude on PATH). It silently no-oped MCP configuration when
install_claude_code=false and Claude was nowhere to be found, which
is a confusing failure mode for template authors who forget to either
install Claude or set install_claude_code=true.

Add an explicit guard after install_claude_code_cli:
- If Claude is absent and MCP was requested: fail loudly with a
  clear message pointing at the three ways to fix it
  (install_claude_code=true, install via pre_install_script, or
  point claude_binary_path at a pre-installed binary).
- If Claude is absent and no MCP was requested: log a note and
  exit 0. The script has nothing else to do.

Two new bun tests cover these cases.
Drop the /tmp/install.sh wrapper. Previously main.tf rendered a small
bash script that base64-decoded the real install.sh into /tmp, then
ran it with ARG_* exports. coder-utils was then wrapping that wrapper
with its own base64 round-trip and mkdir+sync plumbing.

Now main.tf prepends the ARG_* exports directly in front of the
install.sh body with a local join(), and passes the combined string
straight to coder-utils. coder-utils writes it once to
$HOME/.claude-module/install.sh, runs it, logs to install.log.

One file on disk instead of two, easier to debug by cat-ing the
install.sh that actually ran.
Pass display_name_prefix = "Claude Code" and icon = "/icon/claude.svg"
through to the coder-utils module so the workspace shows
"Claude Code: Install Script" with the Claude icon instead of the
generic defaults.

Relies on the two new variables added to coder-utils on PR #842's
branch.
… defaults

The Premium feature formerly known as AI Bridge is now called AI Gateway.
Server-side endpoints and env vars still use the 'aibridge' prefix; only
the product name changed. Rework the README section accordingly:

- Section heading: 'Coder AI Bridge' -> 'Coder AI Gateway'.
- Link: docs/ai-coder/ai-bridge -> docs/ai-coder/ai-gateway.
- Add a note calling out the rename and that the endpoint prefix is
  unchanged.
- Upgrading section mentions the rename so readers coming from v4.x
  find the new example.

Also document the script surface: this module creates exactly one
coder_script by default (Claude Code: Install Script). Pre- and
post-install scripts appear only when the caller opts in. No start
script is produced in any configuration.
…t unset

Locks in the invariant that coder-utils does not create optional
coder_script resources when the caller does not pass pre_install_script
or post_install_script. A regression here would leak empty scripts into
the agent's script list in the Coder UI.
Enables pointing mcp_config_remote_path at an in-repo raw URL during
end-to-end verification. Kept minimal: one filesystem server.
@matifali matifali requested a review from 35C4n0r April 22, 2026 09:02
…og leaks

Address deep review findings on the install-script assembly layer and
the shell script itself.

- Base64-encode every ARG_* value in local.install_script instead of only
  ARG_MCP and ARG_MCP_CONFIG_REMOTE_PATH. Previously, claude_code_version
  and claude_binary_path flowed directly into a single-quoted shell
  literal; a value containing a closing quote would break out and inject
  arbitrary shell. Decoding happens inside install.sh; the encoded wire
  form is [A-Za-z0-9+/=] only.
- Reject non-https URLs in mcp_config_remote_path at plan time. Plain
  http allowed MITM on credentialed MCP configs and made SSRF to
  plaintext internal services easier.
- Stop logging ARG_MCP and ARG_MCP_CONFIG_REMOTE_PATH contents in the
  install log. Inline MCP JSON can embed credentials for MCP servers;
  log only presence and size.
- Track add-json successes and failures. If every MCP server fails to
  register, exit non-zero instead of silently passing.
- Replace 'for url in $(jq -r ...)' with a while-read loop so URLs
  with whitespace don't word-split into multiple loop iterations.
- Switch grep -q to grep -qF so paths with regex metacharacters don't
  cause false negatives in shell profile detection.
Tighten the test surface based on deep review findings.

- Rename 'happy-path' to 'install-script-runs-with-mock' and assert the
  'Skipping Claude Code installation' log line. The previous name
  implied a full install run, but setup() defaults to
  install_claude_code=false with a mock binary. Test name now matches
  what it actually checks.
- Pass coderEnvVars through runModuleScripts in env-map-passthrough so
  the script execution context sees the Terraform-declared values, not
  just the Terraform state.
- Add tftest cases asserting coder-utils script_names output: default
  surface creates only install, pre/post appear only when set, start
  never exists. Covers the claim documented in the README.
- Add tftest case asserting http:// in mcp_config_remote_path is
  rejected by the new validation.
- Extract a ResourceAttributes type alias and getStringAttr helper to
  replace three identical Record<string, unknown> casts.
- Drop the duplicate TerraformState import (already imported on line 16).
- Switch the failing MCP remote URL to an https 127.0.0.1:19999 variant
  so it still fails the fetch but passes the https-only validation.
Add a CAUTION admonition to the Upgrade section. Users who depend on
report_tasks, ai_prompt, continue, resume_session_id, web_app, cli_app,
or install_agentapi lose that surface entirely in v5.0.0. The previous
wording ('switch to the upcoming dedicated modules') was too soft for
users whose templates are live today.
Add a `scripts` output that lists the `coder exp sync` names for every
`coder_script` this module actually creates, in run order. Absent
scripts (pre/post when their inputs are unset; start is never created
by claude-code) are filtered out, so downstream consumers can use
`${join(" ", module.claude-code.scripts)}` with `coder exp sync want`
to serialize their own scripts behind Claude Code's install without
having to know which optional scripts are present.

The README's Outputs section shows the composition pattern. Future
`claude-code-tasks` or `boundary` modules can declare a dependency on
this list to run after Claude is installed.

No app_id output: claude-code v5 does not create any `coder_app`
resource. That surface belongs on the dedicated Tasks module when it
ships.
@matifali matifali self-assigned this Apr 22, 2026
The most common template-admin use case is 'make Claude just work for
an agent or headless workspace without human interaction.' The existing
README shows envs, MCP, and a one-liner pre_install_script, but never
walks through how to skip the first-run wizard and the bypass-mode
consent banner.

Add a dedicated section documenting:

- settings.json with permissions.defaultMode = bypassPermissions,
  permissions.deny allowlist, and skipDangerousModePermissionPrompt
  (verified live against Claude Code CLI v2.1.117 on a workspace).
- ~/.claude.json hasCompletedOnboarding merged via jq so installer
  keys (userID, firstStartTime, installMethod, autoUpdates,
  migrationVersion) are preserved.
- A runtime-flag alternative for one-off 'claude -p' runs.

Verified the complete example end-to-end on dev.coder.com: fresh
workspace lands with the expected settings.json keys, onboarding
skipped, and installer-managed state intact.
Review feedback on the README:

- AI Gateway description drops 'MCP policy enforcement' because it is not
  shipping yet; keeps the auditing and token usage claims that are live.
- Add a first-class AWS Bedrock example using the env map with either a
  bearer token (AWS_BEARER_TOKEN_BEDROCK) or access key pair. Mirrors what
  v4 had but composed via env, not dedicated variables.
- Add a first-class Google Vertex AI example. Requires a pre_install_script
  to drop the SA JSON and point GOOGLE_APPLICATION_CREDENTIALS at it; keep
  gcloud installation as the template author's choice.
- Clarify 'Using a pre-installed binary': claude_binary_path is only
  consulted when install_claude_code = false; the official installer
  drops the binary at $HOME/.local/bin and does not accept a destination
  override.
- Drop the 'Scripts produced' section. It restated an implementation
  detail that duplicates the Outputs section and the pre/post-install
  extension docs.
- Simplify the Unattended mode section: keep the example and runtime-flag
  alternative, drop the keys-verified table and the human-user note.
  Point at upstream Claude Code docs for canonical key definitions.
- Drop the Outputs table; keep the composition example. The type and
  description already live in the module's output block.
Replace the local `for name in [...] : name if name != ""` filter with
a one-line passthrough of `module.coder-utils.scripts`, the new
upstream output that returns the run-ordered, filtered list of
`coder exp sync` names.

The filtering logic is generic to any module that wraps coder-utils, so
it lives better there than in every consumer. claude-code just forwards
it under its own `output "scripts"` so downstream templates keep the
same surface.

Requires PR #842 on coder/registry (feat/coder-utils-optional-install-start).
…claude-code

Follow the shared convention proposed in #782 for
stable per-module persistent storage. coder-utils threads this value
through into the generated pre_install, install, and post_install
wrapper scripts, so scripts and logs now live at
$HOME/.coder-modules/claude-code/ instead of $HOME/.claude-module/.

Namespacing under $HOME/.coder-modules/ keeps $HOME from getting
polluted with one dotdir per module when a workspace uses several,
and matches what boundary, tasks, and agentapi are standardizing on.

Updates the README troubleshooting block and the 8 log-path
expectations in the bun test suite to match.

Related to #782.
Drop three items from the intro that were noise rather than signal:

- "Exports environment variables to the Coder agent." The `env` map
  has its own dedicated section a few lines down; the bullet restated
  the obvious in agent-speak that users don't care about.
- "It does not start Claude, create a web app, or orchestrate
  Tasks..." The "Upgrading from v4.x" section already explains
  the v5 scope change with concrete variable names, which is what
  returning v4 users need.
- "Declare your Terraform variable with `sensitive = true`..."
  Generic Terraform hygiene, not claude-code specific.
Rewrite phrases that leaked Terraform lifecycle or Coder internals into
plain English. The README is read by template authors who don't need
to know that each env map pair becomes a `coder_env` resource or that
validation happens "at plan time" rather than before the workspace
deploys.

Also replaces two em dashes around the "Unattended mode" intro with
parentheses, matching the no-emdash convention used in coder/coder.
…er inputs

Add four dedicated inputs that cover the most common Claude Code
configurations:

- model: sets ANTHROPIC_MODEL. Replaces writing it manually in env.
- claude_code_oauth_token: sets CLAUDE_CODE_OAUTH_TOKEN for Claude.ai
  subscription users. Marked sensitive.
- enable_ai_gateway: wires ANTHROPIC_BASE_URL to the workspace access
  URL + /api/v2/aibridge/anthropic and ANTHROPIC_AUTH_TOKEN to the
  workspace owner's session token, collapsing the 10-line manual
  setup to a single flag.
- disable_auto_updater: sets DISABLE_AUTOUPDATER=1.

The env map stays as the escape hatch for any Claude Code env var
(ANTHROPIC_API_KEY, ANTHROPIC_BASE_URL for custom proxies,
CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX, and any user-defined
vars pre/post scripts consume).

Collisions between a convenience input and the env map fail at plan
time via cross-variable validation blocks on var.env. Silent
precedence was tempting but would hide misconfigurations; a hard-fail
surfaces the duplicate so the template author decides which route
wins.

Implementation notes:

- Unconditional data.coder_workspace.me and data.coder_workspace_owner.me
  so enable_ai_gateway can read access_url + session_token without
  count-indexed access. Both are cheap metadata reads.
- locals.merged_env deterministically merges each convenience input's
  derived map with var.env. Merge order puts var.env last so a future
  change that relaxes validation falls back to user-wins rather than
  silent-convenience-wins.
- 10 new tftest runs: 4 convenience happy paths, 1 merge-with-env,
  5 collision expect_failures (one per validation block).
- README rewritten: convenience inputs lead the env-map section, the
  Claude.ai / AI Gateway / Bedrock / Vertex examples switch to the new
  inputs, and the "Upgrading from v4.x" section reflects which v4
  variables came back and which stayed out.
…ater

Match the shape of the DISABLE_AUTOUPDATER env var and keep parity
with the v4 input name. Mechanical rename across main.tf, tftest, and
README, plus the internal locals.autoupdater_env identifier. Drops
the v4 rename bullet from the upgrade notes since it is now a no-op.
@matifali
Copy link
Copy Markdown
Member Author

closing in favor of #861

@matifali matifali closed this Apr 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant