fix(stack): preload openclaw image in dev k3d#352

Merged
OisinKyne merged 1 commit into feat/x402-pre-merge-followup from feature/dev-preload-openclaw-image on Apr 19, 2026

@bussyjd bussyjd commented Apr 17, 2026

Summary

Mitigates first-pull OpenClaw image failures in dev-mode k3d clusters by pre-pulling and importing the pinned OpenClaw image before the default agent bootstrap relies on it.

Why

Production-readiness testing reported a case where the first in-cluster OpenClaw pull through the k3d mirror path failed and only recovered after a manual docker pull + k3d image import. This PR automates that recovery path in development mode.

What changed

  • export openclaw.ImageRef() from the pinned local OpenClaw version
  • in dev mode, buildAndImportLocalImages() now also:
    • docker pull the pinned OpenClaw image
    • k3d image import it into the cluster
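
The two dev-mode steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `preloadCommands` and `preloadImage` are hypothetical names, and the image reference and cluster name are placeholder values passed in by the caller rather than read from `openclaw.ImageRef()`.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// preloadCommands returns the two commands in the order they must run:
// pull the image on the host, then import it into the k3d cluster.
// Hypothetical helper for illustration only.
func preloadCommands(imageRef, clusterName string) [][]string {
	return [][]string{
		{"docker", "pull", imageRef},
		{"k3d", "image", "import", imageRef, "--cluster", clusterName},
	}
}

// preloadImage runs the commands and stops at the first failure, so the
// import is never attempted for an image that was not pulled.
func preloadImage(ctx context.Context, imageRef, clusterName string) error {
	for _, argv := range preloadCommands(imageRef, clusterName) {
		cmd := exec.CommandContext(ctx, argv[0], argv[1:]...)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("%s: %w", strings.Join(argv, " "), err)
		}
	}
	return nil
}

func main() {
	// Dry run with placeholder values: print the commands instead of running them.
	for _, argv := range preloadCommands("example.invalid/openclaw:placeholder", "dev-cluster") {
		fmt.Println(strings.Join(argv, " "))
	}
}
```

Keeping command construction separate from execution lets the ordering be checked in a unit test without Docker or k3d installed.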

Scope

This is a mitigation for local development / test clusters. It does not claim to root-cause or repair registry cache corruption in general.

Validation

  • go test ./internal/stack -run TestDoesNotExist

@bussyjd bussyjd marked this pull request as ready for review April 19, 2026 03:45
@OisinKyne OisinKyne merged commit bf57cdc into feat/x402-pre-merge-followup Apr 19, 2026
@OisinKyne OisinKyne deleted the feature/dev-preload-openclaw-image branch April 19, 2026 20:38
OisinKyne pushed a commit that referenced this pull request Apr 19, 2026
…y warning (#345)

* feat(x402/buyer): detect 2xx without X-PAYMENT-RESPONSE and expose via metric

Post-#343, settlement moved off the Traefik ForwardAuth hop and became the
seller's responsibility. The buyer sidecar calls ConfirmSpend on any upstream
2xx regardless of whether X-PAYMENT-RESPONSE is present, so a seller that
returns 200 without settling silently consumes the payer's voucher with no
observable signal. This matches the W2/W9 gap flagged in the PR #343 review.

- Add OnPaymentUnsettled callback to replayableX402Transport. Fires exactly
  when the upstream returns 2xx but no successful X-PAYMENT-RESPONSE is
  emitted, logs a WARN, and increments a new counter.
- Add PaymentEventUnsettled event type.
- Add obol_x402_buyer_payment_unsettled_confirmations_total metric with
  upstream/remote_model labels. Operators should alert on any non-zero value.
- Pin invariant with two new tests:
  - TestProxy_UpstreamSuccessNoSettlementHeader_IncrementsUnsettledMetric
  - TestProxy_UpstreamSuccessWithSettlementHeader_DoesNotIncrementUnsettledMetric
- Pin mux symmetry invariant that both /chat/completions and
  /v1/chat/completions route identically — catches the class of regression
  that produced the PR #343 /v1 add/revert/re-add churn.
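
As a rough illustration of the detection described above, a wrapping `http.RoundTripper` can fire a callback when the upstream answers 2xx without the settlement header. This is a sketch under assumptions: `unsettledDetector` and `countUnsettled` are hypothetical names, not the actual `replayableX402Transport`, and the check here only tests for header presence, whereas the commit speaks of a *successful* `X-PAYMENT-RESPONSE`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// unsettledDetector wraps another RoundTripper and reports 2xx responses
// that carry no X-PAYMENT-RESPONSE header. Hypothetical type for illustration.
type unsettledDetector struct {
	next        http.RoundTripper
	onUnsettled func(req *http.Request, resp *http.Response)
}

func (t *unsettledDetector) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.next.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	is2xx := resp.StatusCode >= 200 && resp.StatusCode < 300
	// Assumption: presence of the header is treated as "settled".
	if is2xx && resp.Header.Get("X-PAYMENT-RESPONSE") == "" && t.onUnsettled != nil {
		t.onUnsettled(req, resp)
	}
	return resp, nil
}

// countUnsettled sends one request to a local test server that replies 200,
// with or without the header, and returns how often the callback fired.
func countUnsettled(withHeader bool) int {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if withHeader {
			w.Header().Set("X-PAYMENT-RESPONSE", "placeholder")
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	count := 0
	client := &http.Client{Transport: &unsettledDetector{
		next:        http.DefaultTransport,
		onUnsettled: func(*http.Request, *http.Response) { count++ },
	}}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return -1
	}
	resp.Body.Close()
	return count
}

func main() {
	fmt.Println("without header:", countUnsettled(false))
	fmt.Println("with header:", countUnsettled(true))
}
```

In a real deployment the callback would be where the WARN log and the counter increment live; the sketch uses a plain integer so it has no metrics dependency.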

* fix(x402/forwardauth): warn on verifyOnly=false, shrink facilitator timeout to 5s

Addresses W7 and W8 from the PR #343 review.

W7 — verifyOnly=false footgun: VerifyOnly is the right name for the flag in
the in-process gateway context but is semantically load-bearing for Traefik
ForwardAuth, where the auth hop cannot observe the upstream response. If an
operator flips x402-pricing.yaml verifyOnly=false believing it enables "real"
settlement, the verifier will debit the payer before the upstream serves the
request. We cannot remove the flag without a broader refactor of
internal/inference/gateway.go, so instead:
  - NewForwardAuthMiddleware now logs a loud WARNING at construction when
    VerifyOnly=false, explaining the safe usage.
  - cmd/x402-verifier/main.go emits the same warning on startup, so log-scrub
    filters will surface it.
  - ForwardAuthConfig.VerifyOnly documents the invariant ("MUST be true
    behind Traefik ForwardAuth"), so a contributor flipping it gets the
    explanation inline.

W8 — facilitator timeout: reduce http.Client.Timeout from 30s to 5s.
/verify is a cheap signature check; anything beyond 5s is a network problem
the caller should see quickly rather than having every paid request hang
for half a minute on a slow facilitator.

Tests:
- TestForwardAuth_VerifyOnlyFalse_EmitsStartupWarning pins the warning text.
- TestForwardAuth_VerifyOnlyTrue_NoStartupWarning is the negative control so
  operators don't train themselves to filter the warning out.

* chore(embed): lint :latest image tags with pin-by-digest policy

Addresses W4 from the PR #343 review. The /v1 back-and-forth on PR #343
(add → revert → re-add) was consistent with a deployed x402-buyer:latest
image lagging behind main, and the fix hardcoded /v1 in the LiteLLM template
instead of pinning the image. Same risk applies to x402-verifier and
serviceoffer-controller which also ship as :latest.

- New internal/embed/embed_image_pin_test.go scans every embedded template
  and fails when a new :latest appears without an allowlist entry. The
  allowlist currently covers the three obolnetwork images pending digest
  pinning; each entry carries a short reason. Removing an entry without
  replacing :latest in the YAML fails the test (stale-allowlist check).
- Inline TODO(image-pin) comments in llm.yaml and x402.yaml explain the
  policy at the point of violation so contributors who touch the deployment
  spec see it.
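
The allowlist policy described above might be approximated like this. It is a simplified sketch, not the contents of `embed_image_pin_test.go`: `lintLatest` is a hypothetical function, it matches only plain `image:` lines rather than parsing YAML, and the image names are placeholders.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// lintLatest scans template text for "image: <ref>:latest" lines.
// It returns refs missing from the allowlist (violations) and allowlist
// entries that no longer appear in the text (stale). The allowlist maps
// each ref to a short reason.
func lintLatest(text string, allow map[string]string) (violations, stale []string) {
	seen := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		line = strings.TrimPrefix(line, "- ")
		if !strings.HasPrefix(line, "image:") {
			continue
		}
		fields := strings.Fields(strings.TrimPrefix(line, "image:"))
		if len(fields) == 0 {
			continue
		}
		ref := strings.Trim(fields[0], `"'`)
		if !strings.HasSuffix(ref, ":latest") {
			continue
		}
		seen[ref] = true
		if _, ok := allow[ref]; !ok {
			violations = append(violations, ref)
		}
	}
	for ref := range allow {
		if !seen[ref] {
			stale = append(stale, ref)
		}
	}
	sort.Strings(stale) // map iteration order is random; sort for stable output
	return violations, stale
}

func main() {
	text := "containers:\n  - image: example.invalid/a:latest\n  - image: example.invalid/b:latest\n"
	allow := map[string]string{"example.invalid/a:latest": "pending digest pinning"}
	v, s := lintLatest(text, allow)
	fmt.Println("violations:", v, "stale:", s)
}
```

Reporting stale entries alongside violations is what keeps the allowlist from outliving the tags it excuses.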

This does not pin the images (that requires GHCR access to produce the
digest) — it establishes the contract and makes drift visible.

* feat(x402): route sell http through seller gateway

* fix(obolup): harden installer writes and tty prompts

* docs: keep seller gateway report in pr body only

* feat(model): add master token accessor (#347)

* feat(sell): register by default with explicit opt-out (#349)

* feat(sell): warn that --register is off-chain only

* feat(sell): register by default with explicit opt-out

* feat(sell): show registration summary in sell status

* test(sell): cover registration defaults and sync skill docs

* feat(openclaw): surface generated agent wallet (#348)

* fix(stack): preload openclaw image in dev k3d (#352)

* feat(x402-buyer): expose confirm-spend persistence failures (#351)

---------

Co-authored-by: bussyjd <bussyjd@users.noreply.github.com>