
feat(cli): af install/run for agent nodes — encrypted secrets, env prompting, node-to-node deps#692

Open
AbirAbbas wants to merge 7 commits into main from feat/af-node-install

@AbirAbbas

Contributor

Summary

The af install / af run agent-node scaffolding has existed since the very first commit but was effectively unusable for real nodes: it required a top-level main.py (real nodes start via python -m pkg.app), hardcoded python main.py at launch, exported AGENTFIELD_SERVER_URL while the SDK reads AGENTFIELD_SERVER, and stored secrets as plaintext .env. This PR makes the flow actually work end-to-end and adds the pieces needed for day-to-day use.

What's new

  • Real-node install/run. A node is now defined by its agentfield-package.yaml manifest with an entrypoint.start (e.g. python -m pr_af.app) — no main.py required. The runner launches via the manifest entrypoint, honors the manifest healthcheck path, and exports AGENTFIELD_SERVER (+ legacy AGENTFIELD_SERVER_URL).
  • Encrypted secret store. Secrets live encrypted at rest (AES-256-GCM) under ~/.agentfield/secrets/ with a random 32-byte key in ~/.agentfield/keyring/master.key (0600). They are decrypted only into the child process' environment at start time — never written back to disk in plaintext. Global scope is shared across nodes; node scope overrides it.
  • Env prompting. On af run, required variables resolve in order: process env → node store → global store → manifest default → prompt (hidden for type: secret), persisting prompted secrets encrypted. Missing required vars in a non-interactive session produce a clean error instead of hanging.
  • af secrets command. set / ls (values masked) / rm, with --node scoping.
  • Node-to-node dependencies. A manifest may declare dependencies.nodes, either as a registry reference (e.g. af://registry/swe-planner, which resolves to github.com/Agent-Field/swe-planner) or as a git URL. af install pulls them in recursively (skipping already-installed nodes, which breaks cycles); af run starts a node's dependencies first, in order, before allocating its own port, and leaves already-running dependencies untouched.
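
The env-prompting order described above can be sketched as follows. This is a minimal illustration, not the code in env_resolver.go: the `envSpec` and `sources` types, the `resolve` function, and the origin labels are all hypothetical names, and the stores are modelled as plain maps rather than the encrypted files.

```go
package main

import (
	"errors"
	"fmt"
)

// envSpec is a hypothetical, simplified view of one declared variable.
type envSpec struct {
	Name     string
	Default  string // manifest default; empty means "no default"
	Required bool
}

// sources bundles the lookup layers. Every name here is illustrative.
type sources struct {
	ProcessEnv  map[string]string
	NodeStore   map[string]string
	GlobalStore map[string]string
	Prompt      func(name string) (string, error) // nil in a non-interactive session
}

var errMissingRequired = errors.New("missing required variable")

// resolve returns the value and the layer it came from, walking
// process env -> node store -> global store -> manifest default -> prompt.
func resolve(spec envSpec, src sources) (string, string, error) {
	layers := []struct {
		origin string
		values map[string]string
	}{
		{"process-env", src.ProcessEnv},
		{"node-store", src.NodeStore},
		{"global-store", src.GlobalStore},
	}
	for _, l := range layers {
		if v, ok := l.values[spec.Name]; ok {
			return v, l.origin, nil
		}
	}
	if spec.Default != "" {
		return spec.Default, "manifest-default", nil
	}
	if !spec.Required {
		return "", "unset", nil
	}
	if src.Prompt == nil {
		// Fail fast instead of blocking on a prompt nobody can answer.
		return "", "", fmt.Errorf("%w: %s", errMissingRequired, spec.Name)
	}
	v, err := src.Prompt(spec.Name)
	if err != nil {
		return "", "", err
	}
	return v, "prompt", nil
}

func main() {
	src := sources{
		NodeStore:   map[string]string{"API_TOKEN": "node-value"},
		GlobalStore: map[string]string{"API_TOKEN": "global-value"},
	}
	v, origin, err := resolve(envSpec{Name: "API_TOKEN", Required: true}, src)
	fmt.Println(v, origin, err)

	_, _, err = resolve(envSpec{Name: "MISSING", Required: true}, src)
	fmt.Println(err)
}
```

Keeping the layers in one ordered slice makes the precedence explicit, so node scope overriding global scope falls out of the iteration order rather than from special-casing.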

Notable fixes uncovered while verifying locally

  • The CLI's install/run path runs through internal/core/services, a duplicate of the internal/packages logic — fixes are now applied there (the two layers share one ValidatePackage / ParsePackageMetadata / ShouldSkipCopy).
  • Dependencies were started after the parent allocated its port, causing a port collision; dependency startup now happens before port allocation.

Verification

Verified end-to-end against a live local control plane with two no-LLM nodes built on the real SDK:

  • af install an entrypoint-only node (no main.py) → registers → af call node.reasoner returns a real result.
  • A missing required secret errors cleanly; after af secrets set, af run injects AGENTFIELD_SERVER + the stored secret + manifest defaults into the process (confirmed via the node's own env dump; no plaintext on disk, files 0600).
  • Multi-agent: af run greeter-node auto-starts its dependency echo-node first (distinct ports), both register and execute; an already-running dependency is not restarted.

Docs

  • docs/installing-agent-nodes.md — full guide to install/run, the manifest schema, the encrypted secrets model, and af secrets.
  • cli-toolkit.md reference updated (+ embedded skill copy synced).

Test plan

  • go build ./... clean
  • full go test ./... for control-plane green (47 packages)
  • new unit tests for the secret store, env resolver, manifest validation, and node-dependency resolution / ordered-start helpers
  • manual end-to-end (single node + multi-agent auto-start) against a live control plane

Follow-ups (not in this PR)

  • Author agentfield-package.yaml manifests in the public node repos (SWE-AF, pr-af, sec-af, cloudsecurity-af, af-template).
  • af dev (a third copy of the launch logic) still hardcodes main.py; collapsing the duplicated install/run implementations into one is a good follow-up.

🤖 Generated with Claude Code

AbirAbbas and others added 6 commits June 26, 2026 10:57
…n for agent nodes

Adds the foundation for making 'af install'/'af run' usable for real agent
nodes (which start via 'python -m pkg.app' and have no top-level main.py):

- internal/packages/secrets.go: encrypted at-rest secret store. KeyfileProvider
  keeps a random 32-byte key at ~/.agentfield/keyring/master.key (0600);
  SecretStore encrypts global.enc + <node>.enc via AES-256-GCM, with node scope
  overriding global so shared keys (API tokens) are entered once.
- internal/packages/env_resolver.go: resolves declared env vars in order
  process-env -> node store -> global store -> manifest default -> prompt
  (hidden for type:secret), persisting prompted secrets encrypted. Injected only
  into the child process; never written to disk in plaintext.
- installer.go: manifest gains entrypoint{start,healthcheck}, dependencies.nodes,
  and per-var scope. Validation accepts entrypoint.start instead of requiring
  main.py; package copy excludes .git/venv/.env/__pycache__.
- runner.go: launches via manifest entrypoint, exports AGENTFIELD_SERVER (the
  var the SDK actually reads) alongside legacy AGENTFIELD_SERVER_URL, honors the
  manifest healthcheck path, and resolves env via the secret store.
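
For readers unfamiliar with AES-256-GCM at-rest encryption, the approach behind the secret store can be sketched with the Go standard library. This is an assumption-laden illustration, not the contents of secrets.go: the `seal` and `unseal` helpers and the nonce-prefixed blob layout are hypothetical choices, and the on-disk format used by SecretStore is not shown in this PR description.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and returns nonce||ciphertext (hypothetical layout).
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst appends the ciphertext after the nonce.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// unseal reverses seal and fails if the blob was tampered with.
func unseal(key, blob []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("ciphertext shorter than nonce")
	}
	nonce, ct := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // stand-in for the random master key
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	blob, err := seal(key, []byte("API_TOKEN=example"))
	if err != nil {
		panic(err)
	}
	plain, err := unseal(key, blob)
	fmt.Println(string(plain), err)
}
```

GCM is authenticated, so a modified blob fails to decrypt instead of yielding garbage, which is useful for a store whose files may be copied between machines.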

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- af secrets set/ls/rm manages the encrypted store (hidden input for set,
  masked listing, global + --node scopes).
- install resolves dependencies.nodes recursively (af://registry/<name>
  -> github.com/Agent-Field/<name>, or git URLs), skipping already-installed
  nodes to break cycles.
- af run brings up a node's installed node-dependencies first, in dependency
  order, with cycle protection.
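
The reference mapping and the skip-already-installed recursion can be sketched as below. This is a simplified model under stated assumptions: `resolveRef` and `installAll` are hypothetical names, the dependency graph is an in-memory map standing in for manifests read from disk, and no cloning or copying is performed.

```go
package main

import (
	"fmt"
	"strings"
)

const registryPrefix = "af://registry/"

// resolveRef maps af://registry/<name> to github.com/Agent-Field/<name>;
// anything else is assumed to be a git URL and passes through unchanged.
func resolveRef(ref string) string {
	if strings.HasPrefix(ref, registryPrefix) {
		return "github.com/Agent-Field/" + strings.TrimPrefix(ref, registryPrefix)
	}
	return ref
}

// installAll installs name and then its dependencies, skipping anything
// already installed. Marking a node installed before descending is what
// makes a dependency cycle terminate.
func installAll(name string, deps map[string][]string, installed map[string]bool, order *[]string) {
	if installed[name] {
		return
	}
	installed[name] = true
	*order = append(*order, name)
	for _, d := range deps[name] {
		installAll(d, deps, installed, order)
	}
}

func main() {
	fmt.Println(resolveRef("af://registry/swe-planner"))

	// a and b depend on each other; the recursion still finishes.
	deps := map[string][]string{"a": {"b"}, "b": {"a"}}
	var order []string
	installAll("a", deps, map[string]bool{}, &order)
	fmt.Println(order)
}
```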

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- docs/installing-agent-nodes.md: full guide to af install/run, the
  agentfield-package.yaml manifest (entrypoint, node deps, user_environment),
  the encrypted runtime-only secrets model, and af secrets.
- cli-toolkit.md reference: document af install, af run, af secrets (+ embedded
  skill_data copy synced).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Local end-to-end verification revealed the CLI's install/run path goes through
internal/core/services (DefaultPackageService/DefaultAgentService), which
duplicated — and so bypassed — the fixes previously made in internal/packages.
'af install' on an entrypoint-only node still failed with 'main.py not found',
and 'af run' still exported only AGENTFIELD_SERVER_URL and loaded plaintext .env.

- package_service: validate/parse/copy now delegate to the shared
  packages.ValidatePackage / ParsePackageMetadata / ShouldSkipCopy (entrypoint
  accepted, junk excluded). Install guidance points at 'af secrets set'.
- agent_service: buildProcessConfig launches via the manifest entrypoint,
  exports AGENTFIELD_SERVER, resolves env via the encrypted secret store
  (prompting for missing required), honors the manifest healthcheck path, and
  drops the plaintext .env loader. RunAgent starts node deps first with a
  threaded cycle guard.
- packages: export ValidatePackage + ShouldSkipCopy as the single source of truth.
- tests updated to the new contract (entrypoint validation, store-based env
  injection instead of .env).
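
Launching via the manifest entrypoint, with both server variables exported, might look roughly like this. It is a sketch, not buildProcessConfig itself: the `manifest` struct, the `buildCommand` helper, and the server URL are hypothetical, and the whitespace split via strings.Fields does not handle quoted arguments.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// manifest is a hypothetical, trimmed-down view of agentfield-package.yaml.
type manifest struct {
	EntrypointStart string // e.g. "python -m pr_af.app"
}

// buildCommand prepares (but does not start) the child process.
func buildCommand(m manifest, serverURL string, resolved map[string]string) (*exec.Cmd, error) {
	argv := strings.Fields(m.EntrypointStart)
	if len(argv) == 0 {
		return nil, errors.New("manifest has no entrypoint.start")
	}
	cmd := exec.Command(argv[0], argv[1:]...)
	// Export the variable the SDK reads plus the legacy name.
	cmd.Env = append(os.Environ(),
		"AGENTFIELD_SERVER="+serverURL,
		"AGENTFIELD_SERVER_URL="+serverURL,
	)
	// Resolved secrets go only into the child's environment, never to disk.
	for k, v := range resolved {
		cmd.Env = append(cmd.Env, k+"="+v)
	}
	return cmd, nil
}

func main() {
	cmd, err := buildCommand(
		manifest{EntrypointStart: "python -m pr_af.app"},
		"http://localhost:8080", // illustrative control-plane address
		map[string]string{"API_TOKEN": "example"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(cmd.Args)
}
```

Appending to os.Environ() means a value set here wins over an inherited one, because os/exec keeps the last entry for a duplicated key.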

Verified end-to-end: install entrypoint-only node -> missing-secret errors
cleanly -> af secrets set -> af run injects AGENTFIELD_SERVER + the stored
secret + manifest default into the process (confirmed via the node's env dump).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Local multi-agent verification showed a port collision: dependencies were
started after the parent allocated its port, so the parent's port (not yet
bound) was handed out again to a dependency, which then failed to bind. Move
dependency startup ahead of port allocation so each dependency fully binds its
own port first.
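
The ordering bug can be reproduced in a toy model. The sketch below assumes a "first free port" allocator that only knows about ports that are already bound; the `ports` type, the function names, and the starting port are illustrative and are not taken from the runner.

```go
package main

import "fmt"

// ports is a toy stand-in for the OS view of which ports are bound.
type ports struct{ bound map[int]bool }

// firstFree returns the lowest port >= start that nothing has bound yet.
func (p *ports) firstFree(start int) int {
	for port := start; ; port++ {
		if !p.bound[port] {
			return port
		}
	}
}

func (p *ports) bind(port int) { p.bound[port] = true }

// parentFirst models the old order: the parent picks a port but has not
// bound it when the dependency asks for one, so both get the same answer.
func parentFirst(start int) (parent, dep int) {
	p := &ports{bound: map[int]bool{}}
	parent = p.firstFree(start)
	dep = p.firstFree(start)
	return parent, dep
}

// depsFirst models the fix: the dependency picks and binds its port
// before the parent allocates.
func depsFirst(start int) (parent, dep int) {
	p := &ports{bound: map[int]bool{}}
	dep = p.firstFree(start)
	p.bind(dep)
	parent = p.firstFree(start)
	return parent, dep
}

func main() {
	parent, dep := parentFirst(8002)
	fmt.Println("old order collides:", parent == dep)
	parent, dep = depsFirst(8002)
	fmt.Println("new order collides:", parent == dep)
}
```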

Verified end-to-end against a live local control plane: 'af run greeter-node'
auto-starts its dependency echo-node (distinct ports 8002/8003), both register,
both reasoners execute through the control plane, and an already-running
dependency is left untouched (same PID).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Unit tests for resolveNodeRef, installedNames, installNodeDependencies
(skip-already-installed), and startNodeDependencies (not-installed warning +
already-running skip) in both the service and packages layers — covering the
new patch lines and pinning the behaviors verified end-to-end.
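
The start-time behaviors these tests pin (warn on a dependency that is not installed, skip one that is already running, start dependencies before the parent) can be modelled in a few lines. This is a hypothetical sketch, not startNodeDependencies: the `node` and `startLog` types and the in-memory registry are invented for illustration.

```go
package main

import "fmt"

// node is a hypothetical in-memory record of an installed node.
type node struct {
	Deps    []string
	Running bool
}

// startLog collects what the walk did, in order.
type startLog struct {
	Started  []string
	Warnings []string
}

// startWithDeps starts dependencies before the node itself. The visiting
// set is the cycle guard; running nodes are left untouched.
func startWithDeps(name string, nodes map[string]*node, visiting map[string]bool, out *startLog) {
	n, ok := nodes[name]
	if !ok {
		out.Warnings = append(out.Warnings, name+" is not installed; skipping")
		return
	}
	if n.Running || visiting[name] {
		return
	}
	visiting[name] = true
	for _, dep := range n.Deps {
		startWithDeps(dep, nodes, visiting, out)
	}
	n.Running = true
	out.Started = append(out.Started, name)
}

func main() {
	nodes := map[string]*node{
		"greeter-node": {Deps: []string{"echo-node"}},
		"echo-node":    {},
	}
	out := &startLog{}
	startWithDeps("greeter-node", nodes, map[string]bool{}, out)
	fmt.Println(out.Started, out.Warnings)
}
```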

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

AbirAbbas requested a review from a team as a code owner June 26, 2026 18:03
@github-actions

github-actions Bot commented Jun 26, 2026

Contributor

📊 Coverage gate

Thresholds from .coverage-gate.toml: per-surface ≥ 84%, aggregate ≥ 85%, max per-surface regression ≤ 1.0 pp, max aggregate regression ≤ 0.50 pp.

| Surface | Current | Baseline | Δ |
|---|---|---|---|
| control-plane | 86.70% | 87.40% | ↓ -0.70 pp 🟡 |
| sdk-go | 91.80% | 92.00% | ↓ -0.20 pp 🟢 |
| sdk-python | 93.87% | 93.73% | ↑ +0.14 pp 🟢 |
| sdk-typescript | 90.09% | 90.42% | ↓ -0.33 pp 🟢 |
| web-ui | 84.83% | 84.79% | ↑ +0.04 pp 🟡 |
| aggregate | 85.52% | 85.75% | ↓ -0.23 pp 🟡 |

✅ Gate passed

No surface regressed past the allowed threshold and the aggregate stayed above the floor.
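
As a worked check of the thresholds quoted above, the gate logic can be approximated as follows. The `surface` and `thresholds` types and the `checkGate` function are hypothetical; the actual gate lives in the repository's scripts and may differ in detail.

```go
package main

import "fmt"

// surface holds one row of the coverage table, in percent.
type surface struct {
	Name              string
	Current, Baseline float64
}

// thresholds mirrors the limits quoted from .coverage-gate.toml.
type thresholds struct {
	MinSurface, MinAggregate         float64
	MaxSurfaceDrop, MaxAggregateDrop float64 // percentage points
}

// checkGate returns one message per violated rule; empty means "passed".
func checkGate(surfaces []surface, aggregate surface, t thresholds) []string {
	var failures []string
	for _, s := range surfaces {
		if s.Current < t.MinSurface {
			failures = append(failures, s.Name+": below per-surface floor")
		}
		if s.Baseline-s.Current > t.MaxSurfaceDrop {
			failures = append(failures, s.Name+": regressed too far")
		}
	}
	if aggregate.Current < t.MinAggregate {
		failures = append(failures, "aggregate: below floor")
	}
	if aggregate.Baseline-aggregate.Current > t.MaxAggregateDrop {
		failures = append(failures, "aggregate: regressed too far")
	}
	return failures
}

func main() {
	t := thresholds{MinSurface: 84, MinAggregate: 85, MaxSurfaceDrop: 1.0, MaxAggregateDrop: 0.5}
	surfaces := []surface{
		{"control-plane", 86.70, 87.40},
		{"sdk-go", 91.80, 92.00},
		{"sdk-python", 93.87, 93.73},
		{"sdk-typescript", 90.09, 90.42},
		{"web-ui", 84.83, 84.79},
	}
	failures := checkGate(surfaces, surface{"aggregate", 85.52, 85.75}, t)
	fmt.Println("failures:", len(failures))
}
```

With the table's numbers, the largest per-surface drop is 0.70 pp (control-plane) and the aggregate drop is 0.23 pp, both inside the allowed regression.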

@github-actions

github-actions Bot commented Jun 26, 2026

Contributor

📐 Patch coverage gate

Threshold: 80% on lines this PR touches vs origin/main (from .coverage-gate.toml:thresholds.min_patch).

| Surface | Touched lines | Patch coverage | Status |
|---|---|---|---|
| control-plane | 720 | 67.00% | |
| sdk-go | 0 | ➖ | no changes |
| sdk-python | 0 | ➖ | no changes |
| sdk-typescript | 0 | ➖ | no changes |
| web-ui | 0 | ➖ | no changes |

❌ Patch gate failed

control-plane — 67.00% on 720 touched lines (231 uncovered):

| File | Patch coverage | Missing lines |
|---|---|---|
| control-plane/internal/cli/secrets.go | 37.6% | 34, 35, 36, 45, 46, 47, 48, 49, 50, 51 …(+63 more) |
| control-plane/internal/packages/env_resolver.go | 54.6% | 53, 64, 65, 73, 74, 76, 77, 78, 80, 81 …(+34 more) |
| control-plane/internal/core/services/package_service.go | 71.4% | 73, 74, 87, 88, 92, 96, 103, 104, 105, 106 …(+8 more) |
| control-plane/internal/packages/secrets.go | 73.4% | 61, 62, 65, 66, 70, 71, 74, 75, 78, 79 …(+31 more) |
| control-plane/internal/packages/runner.go | 76.0% | 78, 79, 150, 151, 153, 203, 207, 208, 214, 215 …(+13 more) |

How to fix

  1. For each file listed above, add tests that exercise the missing line numbers in this same PR.
  2. Re-run locally: ./scripts/coverage-summary.sh && ./scripts/patch-coverage-gate.sh.
  3. Do not lower min_patch in .coverage-gate.toml to silence this — the floor is the contract.

End-to-end install testing against the published node repos surfaced two gaps:

1. The git and GitHub install paths (git.go/github.go findPackageRoot) were a
   third and fourth copy of the 'main.py required' check, so 'af install
   <github-url>' failed for entrypoint-only nodes (no top-level main.py) such as
   SWE-AF and cloudsecurity-af. Both now delegate to the shared ValidatePackage
   (accepts a manifest entrypoint.start).
2. Dependency install only ran for requirements.txt projects, so pyproject-only
   nodes (pr-af, sec-af, cloudsecurity-af) installed with no venv and no deps.
   Dependency install is now a single shared InstallPythonDependencies that also
   runs 'pip install .' for pyproject.toml/setup.py projects.
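
The second fix amounts to choosing pip commands from the project files that are present. A minimal sketch of that decision, assuming a hypothetical `planPipInstalls` helper (the real InstallPythonDependencies also creates the venv and runs the commands, which is omitted here):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// planPipInstalls returns the pip invocations to run, given a predicate
// that reports whether a file exists in the package root.
func planPipInstalls(exists func(name string) bool) [][]string {
	var cmds [][]string
	if exists("requirements.txt") {
		cmds = append(cmds, []string{"pip", "install", "-r", "requirements.txt"})
	}
	// pyproject-only nodes previously fell through with nothing installed.
	if exists("pyproject.toml") || exists("setup.py") {
		cmds = append(cmds, []string{"pip", "install", "."})
	}
	return cmds
}

func main() {
	dir := "." // package root; illustrative
	onDisk := func(name string) bool {
		_, err := os.Stat(filepath.Join(dir, name))
		return err == nil
	}
	for _, c := range planPipInstalls(onDisk) {
		fmt.Println(c)
	}
}
```

Taking the existence check as a function keeps the decision testable without touching the filesystem.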

Verified: all five published node repos now install from their GitHub URLs; a
pyproject node (sec-af) builds its venv and 'pip install .' succeeds, with
sec_af + agentfield importable from the node's venv. (Nodes that declare
requires-python >=3.11 need a matching interpreter on PATH — pip reports this
clearly.) Tests updated for the new validation contract; new unit tests cover
the pyproject branch and entrypoint-accepting findPackageRoot.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@santoshkumarradha

Member

@AbirAbbas before this goes in, can you test it with our swe/pr-af etc.?

