refactor(cloud-agent-next): migrate to git-token-service RPC #2747

Open
eshurakov wants to merge 1 commit into main from eshurakov/lush-heart

@eshurakov
Contributor

Summary

Replaces the in-worker GitHubTokenService and InstallationLookupService in cloud-agent-next with calls to the shared git-token-service Worker via a new GIT_TOKEN_SERVICE service binding, and removes the now-redundant token pre-fetching in the web app tRPC routers.

Architectural changes

  • cloud-agent-next no longer talks to Postgres (Hyperdrive) or KV to resolve GitHub App installations or cache tokens. A single RPC surface on the GIT_TOKEN_SERVICE binding (getTokenForRepo / getToken / getGitLabToken) replaces both services.
  • Token resolution for GitHub repo installs and managed GitLab integrations is now consolidated in a new helper services/cloud-agent-next/src/services/git-token-service-client.ts, used from both the synchronous session-prepare path and the autoInitiate async preparation path.
  • A new gitlabTokenManaged flag is persisted in CloudAgentSession metadata. When set, startExecutionV2 refreshes the GitLab token on every execution via a new refreshManagedGitLabToken helper on the DO.
  • Web routers (cloud-agent-next-router.ts + organization-cloud-agent-next-router.ts) stop fetching GitHub/GitLab tokens for prepareSession and sendMessage. cloud-agent-next handles token resolution and refresh centrally. The BAD_REQUEST error "No GitLab integration found. Please connect your GitLab account first.", previously surfaced by the web app, is now raised inside cloud-agent-next at prepare time, preserving the UX.
  • wrangler.jsonc: wires the GIT_TOKEN_SERVICE binding (prod + dev) and drops GITHUB_APP_ID, GITHUB_LITE_APP_ID, the GITHUB_TOKEN_CACHE KV namespace and the HYPERDRIVE binding.
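
The consolidated lookup described in the bullets above might look roughly like the following. This is a minimal sketch, not the contents of git-token-service-client.ts: the helper name resolveGitToken, the RepoSource and BadRequestError types, the `reason` field, and the RPC parameter names (githubRepo, userId) are assumptions made for illustration. Only the GIT_TOKEN_SERVICE binding name, the RPC method names, the `success` / `token` result fields and the error message come from this PR.

```typescript
// Hypothetical result shape; only `success` and `token` appear in the PR's diff context.
type TokenResult =
  | { success: true; token: string }
  | { success: false; reason: string };

// Assumed subset of the RPC surface exposed through the GIT_TOKEN_SERVICE binding.
interface GitTokenService {
  getTokenForRepo(params: { githubRepo: string; userId: string; orgId?: string }): Promise<TokenResult>;
  getGitLabToken(params: { userId: string; orgId?: string }): Promise<TokenResult>;
}

interface Env {
  GIT_TOKEN_SERVICE: GitTokenService;
}

// Hypothetical description of where the session's repository lives.
type RepoSource =
  | { platform: "github"; githubRepo: string }
  | { platform: "gitlab" };

// Stand-in for whatever error type the router maps to BAD_REQUEST.
class BadRequestError extends Error {}

const NO_GITLAB_INTEGRATION =
  "No GitLab integration found. Please connect your GitLab account first.";

// One entry point for both the session-prepare and async-preparation paths.
async function resolveGitToken(
  env: Env,
  source: RepoSource,
  userId: string,
  orgId?: string,
): Promise<string> {
  if (source.platform === "github") {
    const result = await env.GIT_TOKEN_SERVICE.getTokenForRepo({
      githubRepo: source.githubRepo,
      userId,
      orgId,
    });
    if (!result.success) {
      throw new Error(`GitHub token lookup failed: ${result.reason}`);
    }
    return result.token;
  }

  const result = await env.GIT_TOKEN_SERVICE.getGitLabToken({ userId, orgId });
  if (!result.success) {
    // Fail at prepare time rather than later as an opaque git clone error.
    throw new BadRequestError(NO_GITLAB_INTEGRATION);
  }
  return result.token;
}
```

Keeping both platforms behind one function means the two preparation paths cannot drift apart in how they report a missing integration.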

Net change: −549/+286. The old KV namespace (GITHUB_TOKEN_CACHE) and Hyperdrive config (HYPERDRIVE) in Cloudflare are now unbound; they can be decommissioned after rollout.

Verification

  • Manual verification: not performed in this session. The change is deploy-gated on the GIT_TOKEN_SERVICE binding being available in the cloud-agent-next + cloud-agent-next-dev Worker environments; rollout should be coordinated with git-token-service deployment. Suggested manual checks before/after deploy to staging:
    • Start a GitHub-repo session via prepareSession (web) → DO initialises with a resolved installation token; /stream receives ready.
    • Start a GitLab-repo session with a managed integration → clone succeeds using the RPC-resolved token.
    • Start a GitLab session with no integration configured → prepareSession returns BAD_REQUEST: No GitLab integration found….
    • Follow-up sendMessage on a long-running GitLab-managed session after token rotation → the DO refreshes via RPC without the web app forwarding a token.
  • Additional manual verification (user to fill in):

Visual Changes

N/A

Reviewer Notes

  • Deploy ordering: git-token-service must expose GitTokenRPCEntrypoint (with getTokenForRepo, getToken, getGitLabToken) in both prod and dev before this worker is deployed.
  • Orphaned resources: the KV namespace id ab4d777d134a43248639044613ea29ef (prod) / 33b5f1f1be064e919934bee83df4067c (dev) and Hyperdrive id 624ec80650dd414199349f4e217ddb10 are no longer referenced — follow up to delete if nothing else uses them.
  • Risk areas:
    • CloudAgentSession.refreshManagedGitLabToken is best-effort: on RPC failure it logs a warning and returns the last-known token rather than failing the execution. Worth eyeballing whether that's the desired behaviour for all callers (startSession and resumeSession).
    • async-preparation.ts now re-resolves the GitLab token even though session-prepare.ts already resolved it on the autoInitiate fast path — a small duplicate RPC, kept for simplicity (resolved values aren't plumbed through PreparationInput).
    • The web app no longer validates GitLab integration presence before calling prepareSession; the error now comes from cloud-agent-next. The message text is identical so UX should match.
  • Not changed: the legacy services/cloud-agent (V1) still uses its own token path; this refactor is scoped to cloud-agent-next and the web routers that target it.

Replace the in-worker GitHubTokenService and InstallationLookupService
with calls to the shared git-token-service Worker via a GIT_TOKEN_SERVICE
service binding, and drop the now-redundant token fetching in the web
app routers.

- Wire GIT_TOKEN_SERVICE binding in wrangler.jsonc; drop GITHUB_APP_ID,
  GITHUB_LITE_APP_ID, GITHUB_TOKEN_CACHE KV and HYPERDRIVE bindings.
- Resolve GitHub tokens for repo + managed GitLab tokens through a new
  shared helper (src/services/git-token-service-client.ts) used from
  both session-prepare and async-preparation paths.
- Persist gitlabTokenManaged in session metadata so the DO can refresh
  GitLab tokens on startExecutionV2 via refreshManagedGitLabToken.
- Restore the 'No GitLab integration found' BAD_REQUEST at session
  prepare so the failure surfaces early instead of as an opaque git
  clone error.
- Web routers (personal + org) no longer fetch GitHub/GitLab tokens
  for prepareSession/sendMessage — cloud-agent-next handles token
  resolution and refresh centrally.
githubToken,
gitToken,
});
return await client.sendMessage(input);

WARNING: Existing GitLab sessions lose token refresh after rollout

gitlabTokenManaged is only written for sessions prepared under the new code path. Sessions that were already prepared before this deploy still rely on the web app to send a fresh gitToken on each sendMessage; after this change they fall back to the original prepare-time token in DO metadata, so long-lived sessions can start failing once that token rotates or expires.
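
To make the gap concrete, the selection logic this comment describes can be sketched as below. The SessionMetadata shape and the tokenSourceFor helper are hypothetical and exist only to illustrate the behaviour; the field names gitToken and gitlabTokenManaged are the ones mentioned in this PR.

```typescript
// Hypothetical slice of the DO session metadata relevant to token handling.
interface SessionMetadata {
  gitToken?: string;
  gitlabTokenManaged?: boolean;
}

type TokenSource = "rpc-refresh" | "stored-prepare-time-token" | "none";

// Which token a follow-up execution would use once the web app stops forwarding gitToken.
function tokenSourceFor(metadata: SessionMetadata): TokenSource {
  if (metadata.gitlabTokenManaged) {
    // Sessions prepared under the new code path refresh via the RPC.
    return "rpc-refresh";
  }
  // Sessions prepared before the deploy never had the flag written,
  // so they can only reuse whatever was stored at prepare time.
  return metadata.gitToken ? "stored-prepare-time-token" : "none";
}

// A session prepared before rollout: token present, flag absent.
const legacySession: SessionMetadata = { gitToken: "prepare-time-token" };
// A session prepared after rollout.
const newSession: SessionMetadata = { gitToken: "prepare-time-token", gitlabTokenManaged: true };

console.log(tokenSourceFor(legacySession)); // "stored-prepare-time-token"
console.log(tokenSourceFor(newSession)); // "rpc-refresh"
```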

GITHUB_APP_PRIVATE_KEY: env.GITHUB_APP_PRIVATE_KEY,
GITHUB_LITE_APP_ID: env.GITHUB_LITE_APP_ID,
GITHUB_LITE_APP_PRIVATE_KEY: env.GITHUB_LITE_APP_PRIVATE_KEY,
const result = await env.GIT_TOKEN_SERVICE.getGitLabToken({

WARNING: RPC exceptions still break the execution

This helper is documented as best-effort, but getGitLabToken() is awaited without a try/catch. If the service binding throws during a transient outage, startExecutionV2() falls into its outer error path and returns INTERNAL instead of continuing with the last stored token.
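
One way to make the helper match its best-effort description is to wrap the RPC call itself, as in the sketch below. The refreshBestEffort name and the RefreshDeps interface are hypothetical; the real method lives on the DO and calls env.GIT_TOKEN_SERVICE.getGitLabToken directly.

```typescript
// Result shape assumed from the diff context (`result.success`, `result.token`).
type GitLabTokenResult = { success: true; token: string } | { success: false };

// Hypothetical seams so the sketch runs without a Worker environment.
interface RefreshDeps {
  getGitLabToken(): Promise<GitLabTokenResult>;
  warn(message: string): void;
}

// Returns a fresh token when possible, otherwise the token the caller already has.
async function refreshBestEffort(
  currentToken: string | undefined,
  deps: RefreshDeps,
): Promise<string | undefined> {
  try {
    const result = await deps.getGitLabToken();
    if (result.success) {
      return result.token;
    }
    deps.warn("GitLab token refresh returned no token; using stored token");
  } catch (err) {
    // A thrown service-binding error is handled the same way as a failed result.
    deps.warn(`GitLab token refresh threw: ${String(err)}`);
  }
  return currentToken;
}
```

With the catch in place, a transient outage degrades to the stored token instead of propagating into the outer error path of startExecutionV2.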

orgId: metadata.orgId,
});
if (result.success) {
return result.token;

WARNING: Successful refreshes are not persisted

When this returns a newer token, the DO metadata still keeps the original metadata.gitToken. A later refresh failure will therefore fall back to the stale prepare-time token, not the last known working token described in the comment.
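
A possible shape for the fix is to write the refreshed token back before returning it, so that a later failure falls back to the most recent working token. The sketch below deliberately covers only the persistence step: MetadataStore, refreshAndPersist and the fetchToken callback are hypothetical stand-ins for the DO's storage and the RPC call, and fetchToken is assumed to return null on a failed refresh.

```typescript
// Hypothetical slice of the DO session metadata.
interface SessionMetadata {
  gitToken?: string;
  gitlabTokenManaged?: boolean;
}

// Hypothetical storage seam; the real DO would read and write its own metadata record.
interface MetadataStore {
  get(): Promise<SessionMetadata>;
  put(next: SessionMetadata): Promise<void>;
}

// Refreshes the token and persists it so the fallback is always the last good token.
async function refreshAndPersist(
  store: MetadataStore,
  fetchToken: () => Promise<string | null>,
): Promise<string | undefined> {
  const metadata = await store.get();
  const fresh = await fetchToken();
  if (fresh === null) {
    // Refresh failed: fall back to whatever was last persisted.
    return metadata.gitToken;
  }
  if (fresh !== metadata.gitToken) {
    // Skip the write when nothing changed to avoid needless storage traffic.
    await store.put({ ...metadata, gitToken: fresh });
  }
  return fresh;
}
```

After a successful refresh followed by a failed one, this returns the refreshed token rather than the prepare-time one.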

@kilo-code-bot
Contributor

kilo-code-bot Bot commented Apr 23, 2026

Code Review Summary

Status: 3 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
|---|---|
| CRITICAL | 0 |
| WARNING | 3 |
| SUGGESTION | 0 |
Issue Details

WARNING

| File | Line | Issue |
|---|---|---|
| apps/web/src/routers/cloud-agent-next-router.ts | 156 | Existing prepared GitLab sessions stop refreshing tokens after rollout because sendMessage no longer forwards gitToken. |
| services/cloud-agent-next/src/persistence/CloudAgentSession.ts | 2288 | refreshManagedGitLabToken does not catch RPC exceptions, so a transient service-binding failure aborts execution instead of falling back to the current token. |
| services/cloud-agent-next/src/persistence/CloudAgentSession.ts | 2293 | Refreshed GitLab tokens are not persisted, so the advertised last-known-token fallback can regress to the original stale token after a refresh succeeds and a later refresh fails. |

Other Observations (not in diff)

No additional observations.

Files Reviewed (11 files)
  • apps/web/src/routers/cloud-agent-next-router.ts - 1 issue
  • apps/web/src/routers/organizations/organization-cloud-agent-next-router.ts - 0 issues
  • services/cloud-agent-next/src/persistence/CloudAgentSession.ts - 2 issues
  • services/cloud-agent-next/src/persistence/async-preparation.ts - 0 issues
  • services/cloud-agent-next/src/persistence/schemas.ts - 0 issues
  • services/cloud-agent-next/src/persistence/types.ts - 0 issues
  • services/cloud-agent-next/src/router.test.ts - 0 issues
  • services/cloud-agent-next/src/router/handlers/session-prepare.ts - 0 issues
  • services/cloud-agent-next/src/services/git-token-service-client.ts - 0 issues
  • services/cloud-agent-next/src/types.ts - 0 issues
  • services/cloud-agent-next/wrangler.jsonc - 0 issues

Reviewed by gpt-5.4-20260305 · 1,036,451 tokens
