revert(vtex): roll back to kubernetes-bun deploy #429

Merged

viktormarinho merged 1 commit into main from viktormarinho/revert-vtex-workers on May 6, 2026
Conversation

@viktormarinho (Contributor) commented on May 6, 2026

Summary

Reverts the Workers migration (#424, #427). The runtime caching fix landed in 1.6.2, but warm `tools/list` latency stayed at ~5s/call — most likely Workers recycling isolates for low-traffic MCPs, and not worth further iteration cycles. Going back to the working kubernetes-bun pipeline.

What's restored

  • `deploy.json` — vtex entry re-added
  • `vtex/server/main.ts` — bun `serve()` entrypoint (see the sketch after this list)
  • `vtex/package.json` — deco-cli / openapi-ts deps and scripts
  • `vtex/tsconfig.json`, `vtex/app.json` — pre-Workers state
  • `vtex/bun.lock` — vendored back
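
For orientation, the restored entrypoint has roughly the shape below. This is a hedged sketch, not the actual file: the PR only tells us the entry becomes `serve(runtime.fetch)` on bun, so the `runtime` import path and the option names are assumptions.

```ts
// vtex/server/main.ts — illustrative sketch only.
// `runtime` is assumed to be the @decocms/runtime app instance, which
// the commit messages describe as exposing a `fetch` handler.
import { serve } from "bun";

import { runtime } from "./runtime"; // hypothetical module path

serve({
  port: Number(process.env.PORT ?? 8000),
  // Every incoming request (tools/list, tools/call, ...) is handed
  // straight to the MCP runtime handler.
  fetch: (req) => runtime.fetch(req),
});
```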

What's deleted

  • `.github/workflows/deploy-vtex.yml`
  • `vtex/wrangler.toml`

Manual cleanup needed (Cloudflare side)

  • Tear down the `vtex-mcp` Worker
  • Remove the `vtex-mcp.decocms.com` custom domain

🤖 Generated with Claude Code


Summary by cubic

Rolls back VTEX MCP from Cloudflare Workers to the kubernetes-bun runtime to fix high warm-request latency. Restores the bun serve() entrypoint and the previous build/deploy pipeline.

  • Refactors

    • Restore the vtex deploy target in deploy.json with platformName: "kubernetes-bun" and point app.json to https://sites-vtex.decocache.com/mcp (config fragment sketched after this list).
    • Remove Workers artifacts: .github/workflows/deploy-vtex.yml and vtex/wrangler.toml.
    • Revert tooling: pin @decocms/runtime to 1.3.1, bring back deco-cli and @hey-api/openapi-ts, update scripts for bun build/deploy, add vtex/bun.lock.
    • Switch server entry to serve(runtime.fetch) and adjust tsconfig/tools for the non-Workers environment.
  • Migration

    • Decommission the Cloudflare Worker vtex-mcp and remove the vtex-mcp.decocms.com custom domain.
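
To make the restored wiring concrete, the deploy entry presumably looks like the fragment below. Only `platformName: "kubernetes-bun"` and the app.json URL are confirmed by this PR; the surrounding deploy.json schema is an assumption.

```json
{
  "name": "vtex",
  "platformName": "kubernetes-bun"
}
```

with `vtex/app.json` pointing the MCP endpoint at `https://sites-vtex.decocache.com/mcp`.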

Written for commit b1c16be.

Reverts the Workers migration (#424, #427). The runtime cache fix from
decocms/studio#3299/#3300 didn't restore acceptable tools/list latency
(~5s/call still observed after 1.6.2). Going back to the working
kubernetes-bun pipeline; vtex stays on @decocms/runtime 1.3.1 with the
serve()-style entrypoint.

Restores the deploy.json entry, returns server/main.ts to bun serve(),
returns package.json to the deco-cli/openapi-ts deps, and restores
tsconfig/app.json; deletes wrangler.toml and the deploy-vtex.yml
workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
viktormarinho merged commit 855f2c0 into main on May 6, 2026
2 checks passed
viktormarinho added a commit that referenced this pull request May 6, 2026
…on (#430)

The kubernetes-bun rollback in #429 dropped @decocms/runtime from ^1.6.2
back to 1.3.1. With 1.3.1, requests reach the pod with a populated
MESH_REQUEST_CONTEXT envelope (token/connectionId/meshUrl all set) but
state arrives as an empty object — so state.accountName is null and
every tool call fails with "VTEX accountName is missing".

Confirmed in the deployed pod logs:
  hasMeshContext: true, hasToken: true, hasConnectionId: true,
  hasMeshUrl: true, stateKeys: [], stateAccountNamePresent: false

The Workers latency that prompted the revert was startup-CPU-budget
specific to Cloudflare Workers, not a Bun problem, so this only bumps
the runtime/bindings/sdk versions and keeps the kubernetes-bun deploy
and serve()-style entrypoint intact.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
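
In package.json terms, that follow-up amounts to a one-line dependency change along these lines (surrounding fields omitted; the bindings/sdk package names aren't spelled out in the message, so only the confirmed runtime bump is shown):

```json
{
  "dependencies": {
    "@decocms/runtime": "^1.6.2"
  }
}
```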
viktormarinho added a commit that referenced this pull request May 6, 2026
…tory env (#431)

* fix(vtex): bump @decocms/runtime to ^1.6.2 to restore state propagation

The kubernetes-bun rollback in #429 dropped @decocms/runtime from ^1.6.2
back to 1.3.1. With 1.3.1, requests reach the pod with a populated
MESH_REQUEST_CONTEXT envelope (token/connectionId/meshUrl all set) but
state arrives as an empty object — so state.accountName is null and
every tool call fails with "VTEX accountName is missing".

Confirmed in the deployed pod logs:
  hasMeshContext: true, hasToken: true, hasConnectionId: true,
  hasMeshUrl: true, stateKeys: [], stateAccountNamePresent: false

The Workers latency that prompted the revert was startup-CPU-budget
specific to Cloudflare Workers, not a Bun problem, so this only bumps
the runtime/bindings/sdk versions and keeps the kubernetes-bun deploy
and serve()-style entrypoint intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(vtex): read MESH_REQUEST_CONTEXT from runtimeContext, not factory closure

Every tool was capturing `env` in its factory closure and reading
`env.MESH_REQUEST_CONTEXT` from inside execute. The @decocms/runtime
caches tool registrations after the first request (see tools.ts:
`let cached: Registrations | null`) and creates a fresh `bindings` env
per request — so the captured env is the FIRST request's snapshot,
frozen for the lifetime of the pod.

When a pod's first request happened to carry an `x-mesh-token` with
populated state, every subsequent call reused that captured state
(seemingly worked). When the first request was an unauthenticated
`tools/list` (e.g. just after a Knative scale-up), every later call
saw `state: {}` and failed with "VTEX accountName is missing" — even
though studio was correctly forwarding the JWT with the connection's
configuration_state. Verified end-to-end: studio's `buildRequestHeaders`
mints a JWT containing `state: { accountName, appKey, appToken }` for
this connection, the JWT reaches the pod, but the cached tool closure
ignores it.

The runtime expects `execute` to read per-request env from
`runtimeContext.env` (filled from AsyncLocalStorage on every call) — see
the comment in @decocms/runtime tools.ts:821 ("Tool *execution* reads
per-request context from State (AsyncLocalStorage), so reusing
definitions is safe"). Switch all four execute paths
(createToolFromOperation + the three custom tools) to read from
`runtimeContext.env` and discard the captured factory env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
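
The commit message above pins down the stale-closure pattern well enough to sketch. The TypeScript below is illustrative only: the tool-registration shape and `runtimeContext` typing are modeled on the description, not copied from the actual @decocms/runtime API, and the tool name is hypothetical.

```ts
// Shapes inferred from the commit message; not the real runtime types.
interface MeshState {
  accountName?: string;
  appKey?: string;
  appToken?: string;
}
interface MeshRequestContext {
  token?: string;
  connectionId?: string;
  meshUrl?: string;
  state?: MeshState;
}
interface Env {
  MESH_REQUEST_CONTEXT?: MeshRequestContext;
}

// BEFORE (buggy): the factory closure captures `env` once. Because the
// runtime caches tool registrations after the first request, this
// snapshot stays frozen for the pod's lifetime — so an unauthenticated
// first request (e.g. a bare tools/list after a Knative scale-up) pins
// state to {} for every later call.
const createToolBefore = (env: Env) => ({
  name: "VTEX_EXAMPLE_TOOL", // hypothetical tool name
  execute: async () => {
    const accountName = env.MESH_REQUEST_CONTEXT?.state?.accountName;
    if (!accountName) throw new Error("VTEX accountName is missing");
    return { accountName };
  },
});

// AFTER (fixed): discard the captured factory env and read the
// per-request env that the runtime fills from AsyncLocalStorage on
// every call via runtimeContext.
interface ExecuteContext {
  runtimeContext: { env: Env };
}

const createToolAfter = (_factoryEnv: Env) => ({
  name: "VTEX_EXAMPLE_TOOL",
  execute: async (_input: unknown, { runtimeContext }: ExecuteContext) => {
    const accountName =
      runtimeContext.env.MESH_REQUEST_CONTEXT?.state?.accountName;
    if (!accountName) throw new Error("VTEX accountName is missing");
    return { accountName };
  },
});
```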