Commit 9a902f0

merge: PR #610 — kubelet-owned volume permissions + paid-smoke hardening
2 parents 56b178d + aef2799 commit 9a902f0

32 files changed

Lines changed: 929 additions & 916 deletions

.agents/skills/obol-stack-dev/SKILL.md

Lines changed: 21 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
name: obol-stack-dev
3-
description: Obol Stack development and QA runbook. Use when working on obol-stack flows, x402 seller/buyer tests, live Base Sepolia OBOL smoke, Anvil fork regressions, ERC-8004 registration, LiteLLM paid routing, release-smoke, cloudflared, Renovate image bumps, or remote QA worktrees.
3+
description: CLI-first Obol Stack development and QA runbook. Use when working on obol-stack lifecycle, obol CLI surfaces, x402 seller/buyer tests, live Base Sepolia OBOL smoke, Anvil fork regressions, ERC-8004 registration, LiteLLM paid routing, release-smoke, cloudflared, Renovate image bumps, or remote QA worktrees.
44
metadata:
55
version: "3.1.0"
66
domain: infrastructure
@@ -12,6 +12,19 @@ metadata:
1212

1313
Operational router. Load only the reference for the task. **Do not delegate understanding** — read the relevant reference yourself; subagents lose context the next reference would have given them.
1414

15+
## CLI-First Rule
16+
17+
Prefer the supported `obol` CLI surface for every stack operation: `obol stack`,
18+
`obol model`, `obol agent`, `obol sell`, `obol buy`, `obol network`, `obol
19+
tunnel`, and `obol kubectl` for inspection. Do not create ad-hoc shell scripts
20+
or verifier scripts. Do not bypass CLI behavior with direct script execution,
21+
raw `kubectl` mutation, or ConfigMap surgery when an `obol` command exists.
22+
23+
Existing repository flow scripts are release-gate artifacts, not the default QA
24+
interface. Use them only when the user explicitly asks for release-smoke/full
25+
flow validation or when no CLI equivalent exists; otherwise decompose checks
26+
into CLI commands plus `obol kubectl` evidence.
27+
1528
## Reference Router
1629

1730
| Need | Read |
@@ -28,11 +41,11 @@ Operational router. Load only the reference for the task. **Do not delegate unde
2841
## First Actions on Any Task
2942

3043
1. Read existing files before changing anything.
31-
2. Use repo flows/helpers; don't invent ad-hoc scripts.
44+
2. Use `obol` CLI and `obol kubectl` first; do not invent or run ad-hoc scripts.
3245
3. New worktree per remote QA run.
3346
4. Never write hostnames, personal paths, passwords, private keys, or raw tokens into skill files, PR text, or commit messages.
3447
5. Validate with the narrowest command set that covers the change.
35-
6. On a dev branch (anything not at the latest release tag), set `OBOL_DEVELOPMENT=true` for `obolup.sh` and `obol stack up`. Without it, `obolup.sh` downloads the released binary and your branch changes never run. Replace the `go run` wrapper with a real binary before running flows (`go build -o .workspace/bin/obol ./cmd/obol`) — backgrounded port-forwards in flows false-FAIL if the wrapper is recompiling.
48+
6. On a dev branch (anything not at the latest release tag), set `OBOL_DEVELOPMENT=true` for `obolup.sh` and `obol stack up`. Without it, `obolup.sh` downloads the released binary and your branch changes never run. Replace the `go run` wrapper with a real binary before long QA (`go build -o .workspace/bin/obol ./cmd/obol`) so every CLI call uses the same branch build.
3649

3750
## Critical Invariants
3851

@@ -42,11 +55,11 @@ OBOL_TOKEN_BASE_SEPOLIA=0x0a09371a8b011d5110656ceBCc70603e53FD2c78
4255
# Source of truth: ObolNetwork/obol-stack#447
4356
```
4457

45-
**Buyer wallet (Bob)**: deterministic 2nd-derived key from `.env REMOTE_SIGNER_PRIVATE_KEY`. Flows 11/13/14 must pre-seed Bob's remote-signer before Bob's `stack up`, then assert `bobSigner == BOB_WALLET`. **Do not** transfer funds to a generated signer to make the test pass.
58+
**Buyer wallet (Bob)**: deterministic 2nd-derived key from `.env REMOTE_SIGNER_PRIVATE_KEY`. Flows 11/13/14 must seed Bob's remote-signer through `obol wallet import` after Bob's stack and LLM route are up, then assert `bobSigner == BOB_WALLET`. **Do not** transfer funds to a generated signer to make the test pass.
4659

4760
**Token/auth**: use `obol agent auth --runtime <runtime> obol-agent`. **Never** `obol hermes token obol-agent` — it can print CLI usage text and poison the Bearer token.
4861

49-
**Payment assertion**: don't bypass the agent buy step with a direct script exec. If the agent times out, diagnose Hermes/LiteLLM/model routing — don't relax the assertion. Required evidence: `PurchaseRequest Ready=True` + paid HTTP 200 + on-chain `Transfer` + exact balance deltas.
62+
**Payment assertion**: don't bypass the CLI/agent buy path with direct script exec. Prefer `obol buy inference`, `obol sell ...`, `obol agent ...`, and `obol kubectl` status checks. If the agent times out, diagnose Hermes/LiteLLM/model routing — don't relax the assertion. Required evidence: `PurchaseRequest Ready=True` + paid HTTP 200 + on-chain `Transfer` + exact balance deltas.
5063

5164
**QA LLM**: full seller/buyer QA must route Alice and Bob through `OBOL_LLM_ENDPOINT` (OpenAI-compatible vLLM or llama.cpp on the QA host). Default `OBOL_LLM_MODEL=qwen36-deep` (27B-class). The smaller `qwen36-fast` (~4B) was the previous default but flakes on the long single-shot agent-buy prompt at flow-13/14 step 46 — see the retry-wrapper rationale in `flows/lib-dual-stack.sh::agent_buy_with_retry`. Sequence: `obol model setup custom` → `obol model prefer` → one `obol model sync`. Local Ollama and cloud-fallback are **not** acceptable green substitutes for full-flow QA.
5265

@@ -70,7 +83,7 @@ When the smoke gate goes red, check these first — each was a multi-hour debug:
7083
| First request after fresh verifier deploy returns empty body | Traefik HTTPRoute is wired but verifier's serviceoffer-source watcher hasn't loaded the route yet. | `flows/flow-07-sell-verify.sh` + `flows/flow-08-buy.sh` — wrap 402-body fetch in 12×5s retry loop. |
7184
| facilitator arm64 image runs amd64 binary | Was an `ObolNetwork/x402-rs` prom-overlay arm64 manifest packaging bug. | **Fixed upstream**: `ObolNetwork/x402-rs#3` (merged 2026-05-13, `668b7bb`) dropped the redundant `--platform=$BUILDPLATFORM` pin from the prom-overlay builder stage. Registry image republished; arm64 manifest now ships an aarch64 ELF (digest `sha256:b209345c…`). The `X402_FACILITATOR_SKIP_PULL` knob has been removed from `flows/lib.sh`. |
7285

73-
**Diagnosis pattern**: a 503 from the verifier or 404 from a paid route almost never means the verifier is bad — it usually means the deployed image isn't what you think it is, the chain id form mismatched, or the upstream wasn't reachable. Confirm the running image first (`kubectl get deploy -n x402 x402-verifier -o jsonpath='{.spec.template.spec.containers[*].image}'`) before diving into x402 logic.
86+
**Diagnosis pattern**: a 503 from the verifier or 404 from a paid route almost never means the verifier is bad — it usually means the deployed image isn't what you think it is, the chain id form mismatched, or the upstream wasn't reachable. Confirm the running image first (`obol kubectl get deploy -n x402 x402-verifier -o jsonpath='{.spec.template.spec.containers[*].image}'`) before diving into x402 logic.
7487

7588
## Force a Fresh Local Image Build
7689

@@ -123,7 +136,7 @@ Examples:
123136
## Pre-Push Local Checks
124137

125138
```bash
126-
bash -n flows/*.sh # shell syntax
139+
bash -n flows/*.sh # only when flow scripts changed
127140
git diff --check # whitespace/conflict markers
128141
jq empty renovate.json # JSON valid
129142
helm lint internal/embed/infrastructure/cloudflared
@@ -135,6 +148,6 @@ go test ./cmd/obol ./internal/x402/... ./internal/defaults/... -count=1 # touc
135148

136149
## Editing This Skill
137150

138-
Do: keep `SKILL.md` short and operational; one fact lives in one place; references one hop from `SKILL.md`. Inline shell snippets directly in the markdown — don't ship parallel implementations of logic that already lives in `flows/lib.sh` or `internal/...`.
151+
Do: keep `SKILL.md` short and operational; one fact lives in one place; references one hop from `SKILL.md`. Prefer CLI examples. Do not add bundled scripts or parallel implementations of logic that belongs in `obol` CLI, `flows/lib.sh`, or `internal/...`.
139152

140153
Don't: README-style prose; duplicate the same procedure in `SKILL.md` and references; bury safety constraints below examples; copy host-specific names, credentials, or logs.
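
The diagnosis pattern in this file (confirm the running image before debugging x402 logic) can be sketched as a small shell function. This is a minimal sketch and not part of the commit: the function name `verifier_image_matches` and its expected-image argument are hypothetical; only the `obol kubectl get deploy ... -o jsonpath=...` command comes from the runbook text.

```shell
# Hypothetical helper: compare the image the x402-verifier Deployment is
# actually running against the image you expect to be deployed.
# Only the `obol kubectl get deploy` command is taken from the runbook.
verifier_image_matches() {
  expected="$1"
  # With several containers, this jsonpath prints a space-separated list.
  running="$(obol kubectl get deploy -n x402 x402-verifier \
    -o jsonpath='{.spec.template.spec.containers[*].image}')" || return 2
  if [ "$running" = "$expected" ]; then
    echo "ok: running $running"
  else
    echo "mismatch: running '$running', expected '$expected'" >&2
    return 1
  fi
}
```

Call it with the image reference you believe was deployed. A non-zero status suggests checking the deployed image before reading verifier logs.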

.agents/skills/obol-stack-dev/references/dev.md

Lines changed: 12 additions & 1 deletion
@@ -1,5 +1,13 @@
11
# Dev Environment & CLI
22

3+
## Operating Mode
4+
5+
Use the product surface first: `obol ...` commands for lifecycle and mutations,
6+
`obol kubectl ...` for Kubernetes evidence, and `curl` only for endpoint probes.
7+
Do not write custom `.sh` helpers for stack checks. Existing `flows/*.sh` are
8+
release-gate artifacts; reach for them only when the user asks for full
9+
release-smoke or a named flow regression.
10+
311
## Bootstrap
412

513
Dev mode uses `.workspace/` instead of XDG dirs. Without `OBOL_DEVELOPMENT=true`, `obolup.sh` downloads the released binary and your branch changes never run.
@@ -24,7 +32,7 @@ go build -o .workspace/bin/obol ./cmd/obol
2432
.workspace/bin/obol version
2533
```
2634

27-
**Always replace the wrapper before running flows.** `obolup.sh` with `OBOL_DEVELOPMENT=true` installs a `go run -a` wrapper at `.workspace/bin/obol`. It recompiles on every invocation; backgrounded port-forwards (e.g. `flow-06` step 15) false-FAIL because the listener isn't ready in 5–8 seconds.
35+
**Always replace the wrapper before long QA.** `obolup.sh` with `OBOL_DEVELOPMENT=true` installs a `go run -a` wrapper at `.workspace/bin/obol`. It recompiles on every invocation and can make repeated CLI calls or port-forwards look flaky.
2836

2937
```bash
3038
mv .workspace/bin/obol .workspace/bin/obol.wrapper
@@ -75,6 +83,9 @@ go build -o .workspace/bin/obol ./cmd/obol # rebuild after every code chang
7583
| passthrough | `kubectl` `helm` `helmfile` `k9s` |
7684
| meta | `update` `upgrade` `version` |
7785

86+
Use passthrough tools through `obol` (`obol kubectl ...`, `obol helm ...`) so
87+
the active stack kubeconfig and dev paths are selected consistently.
88+
7889
## `obol sell http` — easy-to-misuse flag set
7990

8091
Common mistakes: `--model`, `--pay-to`, and `--network` do **not** exist on `sell http`.
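
The wrapper replacement described in this file can be guarded so it only runs when `.workspace/bin/obol` is still the wrapper. A minimal sketch, under a loudly stated assumption: the runbook says the dev install is a `go run -a` wrapper but not how it is written, so the check below assumes a shebang script whose text contains `go run`. Both function names are hypothetical; the `mv` and `go build` commands are the ones shown in the runbook.

```shell
# Hypothetical check: is the file at "$1" a script wrapper around `go run`?
# Assumption: the dev wrapper is a shebang script containing "go run".
is_go_run_wrapper() {
  [ -f "$1" ] || return 1
  [ "$(head -c 2 "$1" 2>/dev/null)" = "#!" ] || return 1
  grep -q 'go run' "$1"
}

# Swap the wrapper for a real build, using the commands from the runbook.
# Does nothing if the file is already a compiled binary.
replace_wrapper() {
  bin=".workspace/bin/obol"
  if is_go_run_wrapper "$bin"; then
    mv "$bin" "$bin.wrapper" &&
      go build -o "$bin" ./cmd/obol
  fi
}
```

Run `replace_wrapper` from the repository root, then `.workspace/bin/obol version` to confirm the branch build is in place.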

.agents/skills/obol-stack-dev/references/integration-testing.md

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ go test -tags integration -v -run TestBDDIntegration -timeout 10m ./internal/x40
6161
|---|---|---|
6262
| `obol openclaw sync` (first deploy) | 5–15 s | 60 s |
6363
| `obol openclaw sync` (re-deploy, no changes) | 2–5 s | 60 s |
64-
| Pod startup | 10–60 s | 180 s (`kubectl wait`) |
64+
| Pod startup | 10–60 s | 180 s (`obol kubectl wait`) |
6565
| Port-forward ready | 1–10 s | 30 s |
6666
| Chat completion (Ollama) | 1–30 s | 90 s |
6767
| Chat completion (cloud) | 2–10 s | 90 s |
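
The pod-startup row above pairs `obol kubectl wait` with a 180 s ceiling. One way to apply it, as a sketch only: the function name and the label-selector argument are hypothetical, and the `kubectl wait` flags are standard kubectl passed through `obol`.

```shell
# Hypothetical helper: wait for pods matching a label selector to become Ready,
# defaulting to the 180 s ceiling from the timeout table.
wait_pods_ready() {
  ns="$1"; selector="$2"; timeout="${3:-180s}"
  obol kubectl wait --for=condition=Ready pod \
    -l "$selector" -n "$ns" --timeout="$timeout"
}
```

For example, `wait_pods_ready llm app=litellm` would wait on LiteLLM pods, assuming they carry an `app=litellm` label (not confirmed by this document).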

.agents/skills/obol-stack-dev/references/llm-routing.md

Lines changed: 9 additions & 7 deletions
@@ -18,14 +18,14 @@ LiteLLM lives in the `llm` namespace, port 4000. Agents and skills in other name
1818

1919
## Reaching Services from the Mac Host
2020

21-
Only routes published through Traefik are reachable at `http://obol.stack:8080/`. Everything else needs `kubectl port-forward`:
21+
Only routes published through Traefik are reachable at `http://obol.stack:8080/`. Everything else needs `obol kubectl port-forward`:
2222

2323
| Service | Access |
2424
|---|---|
2525
| Traefik ingress (frontend, eRPC, x402 routes) | `http://obol.stack:8080/...` |
26-
| LiteLLM | `kubectl port-forward svc/litellm 14000:4000 -n llm` then `http://127.0.0.1:14000` |
27-
| x402-buyer sidecar (no Service — pod only) | `kubectl port-forward -n llm <litellm-pod> 18402:8402` then `http://127.0.0.1:18402` |
28-
| OpenClaw instance | `kubectl port-forward -n openclaw-<id> svc/openclaw 18789:18789` |
26+
| LiteLLM | `obol kubectl port-forward svc/litellm 14000:4000 -n llm` then `http://127.0.0.1:14000` |
27+
| x402-buyer sidecar (no Service — pod only) | `obol kubectl port-forward -n llm <litellm-pod> 18402:8402` then `http://127.0.0.1:18402` |
28+
| OpenClaw instance | `obol kubectl port-forward -n openclaw-<id> svc/openclaw 18789:18789` |
2929

3030
`http://obol.stack:8080/v1/...` does **not** hit LiteLLM — Traefik has no `/v1` route and returns the frontend 404. The `x402-buyer` sidecar is **distroless** — no `wget`/`curl`/shell. Always port-forward, never `kubectl exec`.
3131

@@ -63,7 +63,9 @@ obol model list # confirm the custom entry is the only local model
6363
obol model status # provider state
6464
```
6565

66-
The flow scripts (`flows/lib.sh::route_llm_via_obol_cli`) wrap this exact sequence behind `OBOL_LLM_ENDPOINT` / `OBOL_LLM_MODEL` / `OBOL_LLM_API_KEY` env vars so smoke tests target a GPU host without burning host CPU on local Ollama.
66+
Release flow internals wrap this sequence behind `OBOL_LLM_ENDPOINT` /
67+
`OBOL_LLM_MODEL` / `OBOL_LLM_API_KEY`. For manual QA, run the `obol model ...`
68+
commands directly.
6769

6870
## Paid Routing (`paid/<remote-model>`)
6971

@@ -97,7 +99,7 @@ maintain # alias for `process --all`
9799

98100
### Endpoint URLs inside pods vs the Mac host
99101

100-
`obol.stack:8080` only resolves on the Mac host (via the DNS resolver). From inside any pod (buy.py, kubectl exec, anything), use the Traefik cluster-internal address:
102+
`obol.stack:8080` only resolves on the Mac host (via the DNS resolver). From inside any pod (release flow internals, `obol kubectl exec`, anything), use the Traefik cluster-internal address:
101103

102104
- Host: `http://obol.stack:8080/services/<name>/...`
103105
- In-pod: `http://traefik.traefik.svc.cluster.local/services/<name>/...`
@@ -108,4 +110,4 @@ maintain # alias for `process --all`
108110

109111
## When LiteLLM Restart is Needed (Fallback Only)
110112

111-
The validated happy path is `buy.py buy` / `process --all` / same-name top-up **without** a manual LiteLLM restart. The hot-add/hot-delete plus buyer reload normally makes `paid/<model>` appear/disappear in place. Restart only as a fallback investigation step if the route doesn't appear after the controller reconciled and the buyer reports the upstream.
113+
The validated user path is `obol buy inference` / same-name top-up **without** a manual LiteLLM restart. The embedded `buy.py` path is for release flow internals or skill debugging. The hot-add/hot-delete plus buyer reload normally makes `paid/<model>` appear/disappear in place. Restart only as a fallback investigation step if the route doesn't appear after the controller reconciled and the buyer reports the upstream.
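
The host versus in-pod URL rule in this file can be captured in one function. A minimal sketch: the function name is hypothetical, and using `KUBERNETES_SERVICE_HOST` (which Kubernetes sets inside pods) to detect "running in a pod" is an assumption, not something the runbook prescribes. The two base URLs are the ones listed in the runbook.

```shell
# Hypothetical helper: print the base URL for a Traefik-published service.
# Assumption: KUBERNETES_SERVICE_HOST being set means we are inside a pod.
service_base_url() {
  name="$1"
  if [ -n "${KUBERNETES_SERVICE_HOST:-}" ]; then
    # In-pod: obol.stack does not resolve here, use the cluster address.
    echo "http://traefik.traefik.svc.cluster.local/services/$name"
  else
    # Mac host: resolved through the local DNS resolver.
    echo "http://obol.stack:8080/services/$name"
  fi
}
```

A probe then becomes `curl "$(service_base_url <name>)/..."` with the same command on the host and in a pod.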

.agents/skills/obol-stack-dev/references/paid-flows.md

Lines changed: 16 additions & 9 deletions
@@ -1,6 +1,10 @@
11
# Paid Flows (Live OBOL + Anvil Fork)
22

3-
Use this for `flow-11`, `flow-13`, `flow-14`, release-smoke OBOL gating, and demo validation.
3+
Use this for release-smoke OBOL gating, named flow regressions, and demo
4+
validation. For ordinary QA, prefer CLI-first checks with `obol sell`, `obol
5+
buy`, `obol model`, and `obol kubectl`. Use `flows/*.sh` only when the user asks
6+
for release-smoke/full-flow validation or the regression is specifically inside
7+
a named flow.
48

59
## Flow Selection
610

@@ -23,7 +27,7 @@ Keep these explicit in the run command. Don't hide live/fork behind one selector
2327

2428
## Wallet Invariant
2529

26-
Both Alice (seller/register) and Bob (buyer) derive from a single `.env REMOTE_SIGNER_PRIVATE_KEY`. Bob is the deterministic 2nd-derived key. The flow pre-seeds Bob's remote-signer with this key **before** Bob's `stack up` and asserts `bobSigner == BOB_WALLET`.
30+
Both Alice (seller/register) and Bob (buyer) derive from a single `.env REMOTE_SIGNER_PRIVATE_KEY`. Bob is the deterministic 2nd-derived key. The flow seeds Bob's remote-signer with this key through `obol wallet import` after Bob's stack and LLM route are up, then asserts `bobSigner == BOB_WALLET`.
2731

2832
**Do not** transfer funds to a generated signer to make the test pass. Keep live OBOL funding on the deterministic Bob address. Don't infer the canonical pair from balances on older duplicate token deployments.
2933

@@ -85,15 +89,18 @@ Do **not** treat raw `X-PAYMENT` through Traefik ForwardAuth as a supported prod
8589
Fixed automatically: `obol stack up` calls `x402verifier.PopulateCABundle` after infra deploy; `obol sell http` calls it before creating the ServiceOffer. Manual repopulate:
8690

8791
```bash
88-
kubectl create configmap ca-certificates -n x402 \
92+
obol kubectl create configmap ca-certificates -n x402 \
8993
--from-file=ca-certificates.crt=/etc/ssl/cert.pem \
90-
--dry-run=client -o yaml | kubectl replace -f -
94+
--dry-run=client -o yaml | obol kubectl replace -f -
9195
```
9296

9397
## Quick Full-Cycle Smoke
9498

95-
1. **Unpaid gate**: POST seller route without `X-PAYMENT` → expect 402 + accepts requirements.
96-
2. **Buy auths**: `buy.py buy <name> --endpoint <url> --model <id> --count N` → expect PurchaseRequest `Ready` and sidecar `/status` shows `remaining > 0`.
97-
3. **Paid call**: LiteLLM request with model `paid/<remote-model>` → expect 200.
98-
4. **Spend proof**: sidecar `/status` shows `remaining −1`, `spent +1`.
99-
5. **Auto-refill**: create with `--auto-refill ...`, run `buy.py process --all`, confirm the loop only signs when live `/status` is at or below threshold.
99+
1. **Configure model**: `obol model setup custom --endpoint <url>/v1 --model <id>`; then `obol model prefer <id>` and `obol model sync`.
100+
2. **Sell**: use `obol sell inference` or `obol sell demo <type>`; wait for `obol sell status <name> -n <ns>` to show `Ready=True`.
101+
3. **Unpaid gate**: `obol sell test <name> -n <ns>`; expect HTTP 402 + accepts requirements.
102+
4. **Buy**: use `obol buy inference <seller-url> --yes --count <N>` when testing buyer flow through the CLI. Confirm `PurchaseRequest Ready=True` with `obol kubectl get purchaserequest -A`.
103+
5. **Paid call and spend proof**: call LiteLLM with `paid/<model>` and verify HTTP 200, sidecar `/status` spend counters, and on-chain balance deltas when the test is live-settlement gated.
104+
105+
Direct `buy.py` execution is reserved for existing release flow internals or
106+
debugging a bug in that embedded skill itself.
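
The CLI steps of the Quick Full-Cycle Smoke can be chained as below. This is a sketch under stated assumptions, not the release flow: the function name and all six arguments are hypothetical placeholders, each `obol` command is assumed to exit non-zero on failure, and the offer is assumed to exist already because the `obol sell inference` flags are not spelled out here. The paid call and spend proof (step 5) are left to the operator.

```shell
# Hypothetical wrapper around the CLI steps in "Quick Full-Cycle Smoke".
# Assumption: every command below exits non-zero when its step fails.
quick_smoke() {
  llm_url="$1"; model="$2"; offer="$3"; ns="$4"; seller_url="$5"; count="$6"

  # Step 1: configure the model route.
  obol model setup custom --endpoint "$llm_url/v1" --model "$model" || return 1
  obol model prefer "$model" || return 1
  obol model sync || return 1

  # Step 2 (status only): print offer status; the operator checks Ready=True.
  obol sell status "$offer" -n "$ns" || return 1

  # Step 3: unpaid gate; the runbook expects HTTP 402 + accepts requirements.
  obol sell test "$offer" -n "$ns" || return 1

  # Step 4: buy, then list PurchaseRequests as Ready=True evidence.
  obol buy inference "$seller_url" --yes --count "$count" || return 1
  obol kubectl get purchaserequest -A
}
```

The function only sequences commands and stops at the first failure. It does not parse output, so the evidence listed in the runbook still has to be read from what each command prints.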
