Commit 56b178d

merge: PR #614 — agent purchase type within hermes (rc13 fixes included)
2 parents 04bebbc + c07c441 commit 56b178d

32 files changed

Lines changed: 1018 additions & 702 deletions

.agents/skills/obol-stack-dev/references/release-smoke-debugging.md

Lines changed: 15 additions & 0 deletions
@@ -107,6 +107,21 @@ When buying from external x402 sellers (sellers running outside our k3d cluster
 - **Fix in repo**: `c2dddc1` — added module-level `USER_AGENT = os.environ.get("OBOL_BUYER_USER_AGENT", "obol-buy-x402/1.0 (+https://github.com/ObolNetwork/obol-stack)")` to `internal/embed/skills/buy-x402/scripts/buy.py`, applied in `_probe_endpoint` (kind=http), `_probe_endpoint` (kind=inference), and the paid `X-PAYMENT` request in `buy_paid_oneshot`. Tested four UAs against v1337 (`curl/*`, generic `Mozilla/*`, `Chrome/*`, custom `obol-buy-x402/*`) — all four returned 402 cleanly. The fix is "send anything that isn't `Python-urllib`", not "send a specific browser UA". Operator override: `OBOL_BUYER_USER_AGENT`.
 - **Follow-up (not yet confirmed)**: the same WAF block likely affects the Go-side controller probe at `internal/serviceoffercontroller/purchase.go:183`, since Go's `http.Client` defaults to `User-Agent: Go-http-client/1.1`. Verify against v1337 and apply the same UA override on the Go side if reproduced.
 
+### 11. "0 spent / N remaining" from the sidecar is NOT proof no debit happened
+
+The buyer sidecar's `/status` (and `PurchaseRequest.status`, and verifier logs) all report the **same local view** — they will agree with each other even when the chain disagrees with all three. rc13 mainnet OBOL self-test (2026-06-09) caught a 0.001 OBOL on-chain debit from a request that returned `HTTP 503 "Payment settlement failed"` while every signal the stack produced said "nothing was paid." The facilitator submitted the Permit2 settle tx, it mined successfully (`0xb5122d818a058e8bf529380260fa2584ba3d50bfc800f1e906faca34d3932307`), and **then** the facilitator's post-submit step returned 500.
+
+- **Fix in repo (this branch)**:
+  - Verifier preserves the facilitator's `transaction` field via `X-PAYMENT-RESPONSE` even on a non-200 `/settle` (`internal/x402/forwardauth.go` + `TestForwardAuth_SettleErrorPreservesTxHashInHeader`).
+  - Buyer sidecar treats any error response (>= 400) with `X-PAYMENT-RESPONSE.transaction != ""` as **spent on-chain**: `ConfirmSpend` the held auth, fire `OnPaymentUnsettled`, log the hash (`internal/x402/buyer/proxy.go` + `TestProxy_UpstreamErrorWithTxHash_PersistsConsume`).
+  - `buy.py` `_print_paid_request_failure` prints `⚠️ SETTLEMENT MAY HAVE COMPLETED ON-CHAIN` with the tx hash + the exact `balance --chain <X>` command when a paid call fails (>= 400) with a settle header.
+- **Not yet fixed (follow-up PRs)**:
+  - Verifier doing a receipt lookup against eRPC before returning 200 vs 5xx (would let the verifier serve the upstream response if settle landed on-chain).
+  - Settle idempotency on retry (today guarded only by Permit2 nonce reuse reverting on-chain, which burns gas).
+  - Facilitator-side: why does mainnet OBOL `/settle` return 500 *after* a successful submit? That's the hosted service (`x402.gcp.obol.tech`), not in this repo.
+- **Operator debugging recipe** (when a buyer-reported "0 spent" disagrees with a suspected debit): see `docs/observability.md` § "Verify settlement against the chain, never the sidecar snapshot" — has the exact `eth_getLogs` curl to confirm.
+- **Rule of thumb**: chain is canonical, sidecar status is a derived snapshot. The CRD itself documents this (`PurchaseRequest.status` is the controller's last reconciled snapshot, not a live counter — `CLAUDE.md` "Quick full-cycle smoke test"). For real-time auth pool state, always query the sidecar `/status`; for real-money truth, always query the chain.
+
 ## Diagnostic Patterns
 
 - **Don't confuse 503 with "verifier broken"** — almost always one of #1, #2, #5, #6, or a missing CA bundle (`paid-flows.md`).

docs/observability.md

Lines changed: 71 additions & 0 deletions
@@ -181,6 +181,77 @@ Prometheus to answer a lifetime question, you've picked the wrong tool.
 
 ---
 
+## Verify settlement against the chain, never the sidecar snapshot
+
+The same "chain is canonical, metrics are derived" rule applies to **live
+debugging**, not just lifetime aggregates. When a paid request errors and
+the buyer reports `remaining=N, spent=0`, that is **not** evidence that no
+money moved — it is evidence that the sidecar's local counter, the
+`PurchaseRequest.status` snapshot, and the verifier's logs all agree with each
+other. The on-chain transfer event can still tell you otherwise.
+
+**This is not theoretical.** The rc13 mainnet OBOL self-test (2026-06-09)
+recorded a 0.001 OBOL on-chain debit from a request that 503'd with
+`"Payment settlement failed"`, while the buyer sidecar reported `0 spent / 2
+remaining` and the verifier logged `facilitator settle failed (500)`. The
+failure happened in the facilitator's post-submit step — the Permit2 settle
+tx had already mined. By every signal the stack gave the operator, nothing
+happened. The chain disagreed.
+
+The defenses that landed (this PR):
+
+- **Verifier**: when `facilitatorSettle` returns non-200 with a parseable body
+  that includes a `transaction` field, the tx hash is surfaced via
+  `X-PAYMENT-RESPONSE` *before* the 503 is written. Without this the on-chain
+  hash is invisible to the buyer. See `internal/x402/forwardauth.go` and
+  `TestForwardAuth_SettleErrorPreservesTxHashInHeader`.
+- **Buyer sidecar**: any error response (>= 400) with `X-PAYMENT-RESPONSE` carrying a tx hash is
+  treated as "spent on-chain" — the held auth is `ConfirmSpend`-ed (not
+  released back to the pool), `OnPaymentUnsettled` fires, and the operator
+  warning logs the hash. See `internal/x402/buyer/proxy.go` and
+  `TestProxy_UpstreamErrorWithTxHash_PersistsConsume`.
+- **buy.py CLI**: `_print_paid_request_failure` decodes the settle header on
+  any failed paid call (>= 400) and prints a loud `⚠️ SETTLEMENT MAY HAVE COMPLETED ON-CHAIN` warning
+  with the exact balance-check command.
+
+The defenses that are **deferred** (and worth flagging in any future debugging
+session):
+
+- Full receipt verification (verifier queries an RPC for the receipt status
+  before deciding 200 vs 503). The forensic fix surfaces enough for an
+  operator to reconcile manually; programmatic reconciliation is a bigger
+  plumbing change.
+- Settle idempotency on retry (today guarded only by Permit2 nonce reuse
+  reverting on-chain — that surfaces as cascading 503s, but burns gas).
+- Facilitator-side fix for the 500-after-on-chain-submit failure mode on
+  mainnet OBOL specifically. That's a hosted-service bug, not in this repo.
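
To make the first deferred item concrete, here is a hedged Go sketch of what interpreting a receipt could look like. Nothing like this exists in the repo yet; `receiptOutcome` is a hypothetical name, and the sketch only relies on the standard `eth_getTransactionReceipt` response shape, where `result` is `null` for an unmined transaction and `status` is `"0x1"` on success.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// receiptResponse models the part of an eth_getTransactionReceipt JSON-RPC
// response that this sketch inspects.
type receiptResponse struct {
	Result *struct {
		Status string `json:"status"`
	} `json:"result"`
}

// receiptOutcome (hypothetical helper) reports whether the transaction was
// mined and, if so, whether it succeeded. A null result means the node has
// no receipt yet, so the caller cannot conclude anything about the debit.
func receiptOutcome(body []byte) (mined bool, success bool, err error) {
	var r receiptResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return false, false, err
	}
	if r.Result == nil {
		return false, false, nil
	}
	return true, r.Result.Status == "0x1", nil
}

func main() {
	mined, ok, err := receiptOutcome([]byte(`{"jsonrpc":"2.0","id":1,"result":{"status":"0x1"}}`))
	fmt.Println(mined, ok, err)
}
```

A verifier built on this idea would treat "mined and successful" as paid even when the facilitator returned 500, which is the reconciliation the deferred item describes.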
+
+**Debugging checklist**, when a buyer-reported "0 spent" disagrees with a
+suspected debit:
+
+```bash
+RPC=https://ethereum-rpc.publicnode.com
+blk=$(curl -s -X POST $RPC -H 'content-type: application/json' \
+  -d '{"jsonrpc":"2.0","id":1,"method":"eth_blockNumber","params":[]}' \
+  | python3 -c "import json,sys;print(int(json.load(sys.stdin)['result'],16))")
+from=$(printf '0x%x' $((blk-50000)))
+# Topic 0 = ERC-20 Transfer(address,address,uint256)
+# Topic 2 = recipient (32-byte left-padded)
+PAD=000000000000000000000000<RECIPIENT_HEX_NO_0x>
+curl -s -X POST $RPC -H 'content-type: application/json' -d "{
+  \"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"eth_getLogs\",\"params\":[{
+    \"fromBlock\":\"$from\",\"toBlock\":\"latest\",
+    \"address\":\"<TOKEN_CONTRACT>\",
+    \"topics\":[\"0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef\",null,\"0x$PAD\"]
+  }]}"
+```
+
+A `Transfer` to the expected recipient that exists while the buyer reports
+`0 spent` is the bug. Reference: rc13 mainnet OBOL tx
+[`0xb5122d818a058e8bf529380260fa2584ba3d50bfc800f1e906faca34d3932307`](https://etherscan.io/tx/0xb5122d818a058e8bf529380260fa2584ba3d50bfc800f1e906faca34d3932307).
+
+---
+
 ## Recording rule conventions
 
 Naming follows the standard Prometheus pattern:

flows/flow-16-sell-agent.sh

Lines changed: 67 additions & 10 deletions
@@ -50,6 +50,20 @@ else
   fail "Skills mismatch: got=$got want=$expected"
 fi
 
+# §1.1.1: .no-bundled-skills marker landed on the host PVC path.
+# SeedHostFiles writes it; the cluster mounts HostHomePath into the pod via
+# the data PVC. Without the marker, Hermes' sync_skills() seeds ~24 stock
+# categories (~1 MB of SKILL.md text) into /data/.hermes/skills on every
+# launch — see internal/agentcrd/agent_contract_integration_test.go for the
+# v2026.5.28 → v2026.6.5 root cause.
+step ".no-bundled-skills marker present on host PVC"
+marker_file="$host_root/.no-bundled-skills"
+if [ -f "$marker_file" ]; then
+  pass ".no-bundled-skills marker present at $marker_file"
+else
+  fail ".no-bundled-skills marker missing at $marker_file — Hermes will re-seed bundled skills on every launch"
+fi
+
 # §1.2: Agent CR observable
 step "kubectl get agent $AGENT_NAME -n $AGENT_NS"
 phase=$("$OBOL" kubectl get agent "$AGENT_NAME" -n "$AGENT_NS" \
@@ -69,19 +83,62 @@ case "$phase" in
     ;;
 esac
 
-# §1.3: Hermes pod check
-step "Hermes pod running in $AGENT_NS"
-pod_phase=""
+# §1.3: Hermes pod check — gate on the Ready condition, not phase. Phase
+# flips to Running while containers are still booting; §1.3.1 execs into the
+# pod and would false-pass on an empty skills dir mid-boot.
+step "Hermes pod ready in $AGENT_NS"
+pod_ready=""
 for i in $(seq 1 30); do
-  pod_phase=$("$OBOL" kubectl get pods -n "$AGENT_NS" -l app.kubernetes.io/name=hermes \
-    -o jsonpath='{.items[0].status.phase}' 2>/dev/null || true)
-  [ "$pod_phase" = "Running" ] && break
+  pod_ready=$("$OBOL" kubectl get pods -n "$AGENT_NS" -l app.kubernetes.io/name=hermes \
+    -o jsonpath='{.items[0].status.conditions[?(@.type=="Ready")].status}' 2>/dev/null || true)
+  [ "$pod_ready" = "True" ] && break
   sleep 4
 done
-if [ "$pod_phase" = "Running" ]; then
-  pass "Hermes pod Running"
+if [ "$pod_ready" = "True" ]; then
+  pass "Hermes pod Ready"
+else
+  fail "Hermes pod did not become Ready within 120s (ready=$pod_ready)"
+fi
+
+# §1.3.1: Bundled-skills contract — the marker on the host PVC was honored
+# inside the pod. This is the load-bearing CI check that should have caught
+# the v2026.5.28 bundled-skills bloat (~1 MB of SKILL.md text re-seeded on every
+# launch by sync_skills(), which ignored the marker before v2026.6.5 / commit
+# 2ed96372a "blank-slate skills"). We assert the same two halves the Go
+# integration test does (internal/agentcrd/agent_contract_integration_test.go):
+#   (a) /data/.hermes/obol-skills is populated with the operator-chosen subset
+#       — proof our seeding path worked.
+#   (b) /data/.hermes/skills (the native bundled-skills dir) is absent or empty
+#       — proof Hermes honored the marker and did NOT re-seed bundled skills.
+HERMES_POD=$("$OBOL" kubectl get pods -n "$AGENT_NS" -l app.kubernetes.io/name=hermes \
+  -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)
+
+step "Pod-side obol-skills external_dirs populated"
+if [ -n "$HERMES_POD" ]; then
+  ext_out=$("$OBOL" kubectl exec -n "$AGENT_NS" "$HERMES_POD" -c hermes -- \
+    ls -A /data/.hermes/obol-skills 2>/dev/null || true)
+  if [ -n "$(echo "$ext_out" | tr -d '[:space:]')" ]; then
+    pass "obol-skills dir in pod = $(echo "$ext_out" | tr '\n' ',' | sed 's/,$//')"
+  else
+    fail "obol-skills external_dirs is empty in pod — operator subset did not land"
+  fi
+else
+  fail "Hermes pod name not resolvable in $AGENT_NS — cannot assert pod-side skills"
+fi
+
+step "Pod-side bundled-skills dir is absent or empty"
+if [ -n "$HERMES_POD" ]; then
+  # Tolerate "No such file or directory" — that's the strongest signal the
+  # marker was honored. Any non-empty listing is a contract violation.
+  bundled_out=$("$OBOL" kubectl exec -n "$AGENT_NS" "$HERMES_POD" -c hermes -- \
+    sh -c 'ls -A /data/.hermes/skills 2>/dev/null || true' 2>/dev/null || true)
+  if [ -z "$(echo "$bundled_out" | tr -d '[:space:]')" ]; then
+    pass "/data/.hermes/skills absent/empty — bundled-skill seeding skipped"
+  else
+    fail "/data/.hermes/skills is non-empty (Hermes re-seeded bundled skills despite the marker); contents: $(echo "$bundled_out" | head -c 200)"
+  fi
 else
-  fail "Hermes pod did not reach Running within 120s (phase=$pod_phase)"
+  fail "Hermes pod name not resolvable in $AGENT_NS — cannot assert bundled-skills empty"
 fi
 
 # §1.4: Remote-signer pod (only when wallet was requested)
@@ -189,4 +246,4 @@ if [ "${FLOW_CLEANUP:-0}" = "1" ]; then
   "$OBOL" agent delete --force "$AGENT_NAME" >/dev/null 2>&1 || true
 fi
 
-summary
+emit_metrics

flows/release-smoke.sh

Lines changed: 7 additions & 0 deletions
@@ -222,6 +222,13 @@ main() {
     "$SCRIPT_DIR/flow-10-anvil-facilitator.sh"
     "$SCRIPT_DIR/flow-08-buy.sh"
     "$SCRIPT_DIR/flow-09-lifecycle.sh"
+    # flow-16 declares a CRD sub-agent and gates it via `obol sell agent`,
+    # asserting (among other things) that the `.no-bundled-skills` marker
+    # is honored inside the Hermes pod — the contract that v2026.5.28
+    # silently broke (~100k tokens of bundled-skill bloat re-seeded on
+    # every launch, undetected because this flow wasn't in CI). Adding it
+    # here is follow-up #2 from the rc13 report PR.
+    "$SCRIPT_DIR/flow-16-sell-agent.sh"
   )
 
   for flow in "${flows[@]}"; do

internal/agentcrd/agent_contract_integration_test.go

Lines changed: 14 additions & 2 deletions
@@ -21,8 +21,20 @@ import (
 // The unit tests in agent_test.go and serviceoffercontroller/agent_render_test.go
 // only prove that we *render* the `.no-bundled-skills` marker and the capped
 // hermes-config keys. They do NOT prove the Hermes image
-// (nousresearch/hermes-agent:v2026.5.28) actually honors them. This test closes
-// that gap end-to-end against a LIVE cluster:
+// (nousresearch/hermes-agent:v2026.6.5) actually honors them. v2026.5.28
+// shipped the marker check on the install/CLI path only; the per-launch
+// sync_skills() call ignored it and re-seeded ~24 categories from the
+// image-baked /opt/hermes/skills source on every boot, regardless of the
+// marker. v2026.6.5 (commit 2ed96372a, "blank-slate skills") added the marker
+// check to sync_skills() itself (tools/skills_sync.py:467), so the in-cluster
+// contract is honored end-to-end. This test closes that gap on a LIVE cluster.
+//
+// CI coverage: flow-16-sell-agent.sh in release-smoke asserts the same
+// pod-side bundled-skills-empty + marker-present invariants in bash. The
+// two are belt-and-braces — flow-16 catches regressions in CI, this Go test
+// is the developer-runnable white-box version. A Hermes image bump should
+// re-run both before merging (the Renovate package rule for
+// nousresearch/hermes-agent in renovate.json reminds the reviewer):
 //
 // (1) the .no-bundled-skills marker exists on the agent's host PVC path
 // (agentcrd.HostNoBundledSkillsMarkerPath), so Hermes' installer/sync

internal/buy/discover.go

Lines changed: 37 additions & 2 deletions
@@ -222,10 +222,13 @@ func PickCatalogEntry(entries []CatalogEntry, sellerURL string) (*CatalogEntry,
 		return nil, err
 	}
 
-	var inferenceOnly []CatalogEntry
+	var inferenceOnly, agentOnly []CatalogEntry
 	for _, e := range entries {
-		if strings.EqualFold(strings.TrimSpace(e.Type), "inference") {
+		switch {
+		case strings.EqualFold(strings.TrimSpace(e.Type), "inference"):
 			inferenceOnly = append(inferenceOnly, e)
+		case strings.EqualFold(strings.TrimSpace(e.Type), "agent"):
+			agentOnly = append(agentOnly, e)
 		}
 	}
 
@@ -234,6 +237,12 @@ func PickCatalogEntry(entries []CatalogEntry, sellerURL string) (*CatalogEntry,
 		for i, e := range entries {
 			if endpointPath(e.Endpoint) == wantPath {
 				if !strings.EqualFold(strings.TrimSpace(e.Type), "inference") {
+					if strings.EqualFold(strings.TrimSpace(e.Type), "agent") {
+						return nil, fmt.Errorf(
+							"seller offer %q has type=agent; `obol buy inference` only supports type=inference.\n%s",
+							e.Name, payAgentHint(e.Endpoint),
+						)
+					}
 					return nil, fmt.Errorf("seller offer %q has type=%q; obol buy inference only supports type=inference", e.Name, e.Type)
 				}
 				return &entries[i], nil
@@ -244,6 +253,18 @@ func PickCatalogEntry(entries []CatalogEntry, sellerURL string) (*CatalogEntry,
 
 	switch len(inferenceOnly) {
 	case 0:
+		// Storefront-base probe of an agent-only seller: point at pay-agent
+		// instead of the bare type=inference refusal.
+		if len(agentOnly) > 0 {
+			names := make([]string, 0, len(agentOnly))
+			for _, e := range agentOnly {
+				names = append(names, e.Name)
+			}
+			return nil, fmt.Errorf(
+				"seller advertises no inference offers (type=inference), only type=agent offers (%s).\n%s",
+				strings.Join(names, ", "), payAgentHint(agentOnly[0].Endpoint),
+			)
+		}
 		return nil, errors.New("seller advertises no inference offers (type=inference)")
 	case 1:
 		e := inferenceOnly[0]
@@ -258,6 +279,20 @@ func PickCatalogEntry(entries []CatalogEntry, sellerURL string) (*CatalogEntry,
 	}
 }
 
+// payAgentHint renders the canonical pointer for type=agent offers: they are
+// bought with the buy-x402 skill's `pay-agent` command, which streams the
+// response directly to the calling agent (memory, tool-call traces, partial
+// results) instead of pushing it behind LiteLLM as a paid alias.
+func payAgentHint(endpoint string) string {
+	return fmt.Sprintf(
+		"For type=agent offers use the buy-x402 skill's `pay-agent` command instead — it streams the response\n"+
+			"directly to the calling agent (memory, tool-call traces, partial results) without pushing it behind\n"+
+			"LiteLLM as a paid alias:\n"+
+			" python3 ${OBOL_SKILLS_DIR:-/data/.openclaw/skills}/buy-x402/scripts/buy.py pay-agent %s --model <model> --message '<prompt>'",
+		endpoint,
+	)
+}
+
 // VerifyAgentID returns nil iff at least one of reg.Registrations matches the
 // expected ERC-8004 tokenId. The seller may publish multiple registrations
 // (one per chain); a match on any of them is sufficient.

internal/embed/embed_crd_test.go

Lines changed: 5 additions & 2 deletions
@@ -819,7 +819,10 @@ func TestX402VerifierImage_CarriesAgentAuthFix(t *testing.T) {
 		t.Fatalf("ReadInfrastructureFile: %v", err)
 	}
 
-	const ref = "ghcr.io/obolnetwork/x402-verifier:46e63fd@sha256:a8cd7946884c9a702b5cfcfad28d1f5eac1037899303eb4e0157e3ffab7a572c"
+	// Bumped to 04bebbc (current main HEAD as of rc13) to also carry ab71481
+	// (suppress verifyOnly=false warning on the in-process settle path). The
+	// agent upstream auth fix from abfd55a remains in scope.
+	const ref = "ghcr.io/obolnetwork/x402-verifier:04bebbc@sha256:a80f72c89341a422724ad1b5d5d5da0c8cdd246b9dcabc6560e369b48ed5d775"
 	if !strings.Contains(string(data), "image: "+ref) {
 		t.Fatalf("x402-verifier image must carry agent upstream auth fix: %s", ref)
 	}
@@ -841,7 +844,7 @@ func TestServiceOfferControllerImage_CarriesSecretCreateOnlyFix(t *testing.T) {
 		t.Fatalf("ReadInfrastructureFile: %v", err)
 	}
 
-	const ref = "ghcr.io/obolnetwork/serviceoffer-controller:b39bcaa@sha256:f5afbba041f83c52c1d48c61db443138da76a12afed0bd29ba719984fc73b189"
+	const ref = "ghcr.io/obolnetwork/serviceoffer-controller:04bebbc@sha256:286d07604c001006d54a5f89ef854210ab805859c072e7b8dd89fe0c6f130d7d"
 	if !strings.Contains(string(data), "image: "+ref) {
 		t.Fatalf("serviceoffer-controller image must carry the Secret-create-only reconciler fix "+
 			"(else per-agent provisioning 403s under the no-update/patch Secret RBAC): %s", ref)
