fix: OAuth token refresh retry and error handling for idle timeout #133

Merged
anandgupta42 merged 1 commit into main from fix/oauth-token-refresh-idle-timeout
Mar 14, 2026
@anandgupta42 anandgupta42 commented Mar 14, 2026

What does this PR do?

Fixes the issue where leaving altimate-code idle for 20+ minutes causes every subsequent prompt to show just "Error" with no context or recovery path. The root cause is OAuth tokens expiring after idle, and the token refresh throwing a plain Error with no retry logic.

Changes:

  • anthropic.ts / codex.ts: Add a 3-attempt retry with exponential backoff for token refresh; fail fast on 4xx responses (permanent auth failures); refresh proactively 30s before expiry; update currentAuth.expires after a successful refresh (previously it was left stale)
  • message-v2.ts: Classify OAuth token refresh failures as ProviderAuthError instead of UnknownError, improve generic Error display (no more bare "Error" in TUI)
  • retry.ts: Make auth errors retryable at session level with actionable user message including recovery instructions
  • retry.test.ts: Add 8 new tests covering auth error retry, token refresh failure classification, and generic error handling
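The retry behavior described for anthropic.ts / codex.ts can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `refreshWithRetry`, `TokenRefreshError`, `TokenResponse`, the injected `doRefresh` and `sleep` callbacks, and the 500 ms base delay are all hypothetical. Only the 3 attempts, the exponential backoff, and the fail-fast on 4xx come from the description above.

```typescript
// Hypothetical shape of a token endpoint response (assumption, not from the PR).
type TokenResponse = { access_token: string; refresh_token: string; expires_in: number }

// Hypothetical error type carrying the HTTP status, when there is one.
class TokenRefreshError extends Error {
  status?: number
  constructor(message: string, status?: number) {
    super(message)
    this.name = "TokenRefreshError"
    this.status = status
  }
}

// Retries network failures and 5xx responses with exponential backoff;
// rethrows immediately on 4xx, which is treated as a permanent auth failure.
async function refreshWithRetry(
  doRefresh: () => Promise<Response>,
  maxAttempts: number = 3,
  baseDelayMs: number = 500, // assumed value; the PR does not state the delay
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<TokenResponse> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await doRefresh()
      if (res.ok) return (await res.json()) as TokenResponse
      if (res.status >= 400 && res.status < 500) {
        // Permanent failure (e.g. revoked token): retrying cannot help.
        throw new TokenRefreshError(`Token refresh rejected (HTTP ${res.status})`, res.status)
      }
      // 5xx: remember the failure and fall through to the backoff below.
      lastError = new TokenRefreshError(`Token refresh failed (HTTP ${res.status})`, res.status)
    } catch (err) {
      // Only the 4xx error above is thrown as TokenRefreshError inside the try.
      if (err instanceof TokenRefreshError) throw err
      lastError = err // network-level failure: retryable
    }
    // Waits baseDelayMs, then 2x baseDelayMs; no wait after the final attempt.
    if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1))
  }
  throw new TokenRefreshError(
    `Token refresh failed after ${maxAttempts} attempts: ${String(lastError)}`,
  )
}
```

Passing `sleep` as a parameter is a design choice for this sketch only; it lets tests run without real delays.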

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other

Issue for this PR

Closes #118

How did you verify your code works?

  • All 24 retry tests pass (bun test test/session/retry.test.ts)
  • All 19 message-v2 tests pass (bun test test/session/message-v2.test.ts)
  • TypeScript typecheck passes (bunx tsc --noEmit)
  • Manual verification: confirmed error messages now show descriptive text with recovery instructions instead of bare "Error"

Checklist

  • Tests added/updated
  • I have tested this locally
  • No unrelated changes included
  • Documentation updated (if needed)
  • CHANGELOG updated (if user-facing)


After 20+ minutes idle, OAuth tokens expire and subsequent prompts show
unhelpful "Error" with no context or retry. This commit fixes the issue
across Anthropic and Codex OAuth plugins:

- Add 3-attempt retry with backoff for token refresh (network/5xx only)
- Fail fast on 4xx auth errors (permanent failures like revoked tokens)
- Add 30-second proactive refresh buffer to prevent mid-request expiry
- Update `currentAuth.expires` after successful refresh
- Classify token refresh failures as `ProviderAuthError` for actionable
  error messages with recovery instructions
- Make auth errors retryable at session level with user-facing guidance
- Improve generic `Error` display (no more bare "Error" in TUI)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
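The 30-second proactive refresh buffer and the `currentAuth.expires` update listed in the commit message might look something like the sketch below. The `OAuthState` shape, the function names, and the assumption that `expires` is an epoch timestamp in milliseconds are all hypothetical; only the 30-second buffer and the need to update `expires` after a refresh come from this PR.

```typescript
// 30-second buffer from the PR description.
const REFRESH_BUFFER_MS = 30_000

// Hypothetical auth state; `expires` is assumed to be epoch milliseconds.
type OAuthState = { access: string; refresh: string; expires: number }

// True when the token has expired or will expire within the buffer window,
// so a request started now should not outlive its token.
function needsRefresh(auth: OAuthState, now: number = Date.now()): boolean {
  return auth.expires - now <= REFRESH_BUFFER_MS
}

// Builds the next auth state from a refresh response. Recomputing `expires`
// is the important part: if it is left stale, needsRefresh() stays true.
function applyRefresh(
  auth: OAuthState,
  res: { access_token: string; refresh_token?: string; expires_in: number },
  now: number = Date.now(),
): OAuthState {
  return {
    access: res.access_token,
    refresh: res.refresh_token ?? auth.refresh, // keep the old refresh token if none is returned
    expires: now + res.expires_in * 1000, // expires_in is in seconds
  }
}
```

Under these assumptions, a token with 20 seconds of life left is refreshed before the request is sent rather than failing partway through it.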
@github-actions

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

@github-actions

This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window.

Feel free to open a new pull request that follows our guidelines.

@github-actions github-actions Bot closed this Mar 14, 2026
@anandgupta42 anandgupta42 reopened this Mar 14, 2026
@anandgupta42 anandgupta42 merged commit 2a09639 into main Mar 14, 2026
20 of 24 checks passed
anandgupta42 added a commit that referenced this pull request Mar 17, 2026
@anandgupta42 anandgupta42 deleted the fix/oauth-token-refresh-idle-timeout branch March 17, 2026 00:59
anandgupta42 added a commit that referenced this pull request May 1, 2026
Comprehensive audit found 16 user-impacting regressions across 28 files.
Fixes batched here. Each change is wrapped in altimate_change markers
(`upstream_fix:` tag) so the next bridge merge re-evaluates them.

CRITICAL — would crash or hang at runtime
- `control-plane/workspace-router-middleware.ts` — `(adaptor as any).fetch(...)`
  threw TypeError when `OPENCODE_EXPERIMENTAL_WORKSPACES=1`. The Adaptor API was
  renamed `fetch → target` in v1.4.0; the misleading marker comment said the
  opposite. Rewrote to use `adaptor.target()` + `ServerProxy.http()` (same
  pattern as `src/server/router.ts:98`); skip routing for local targets so they
  fall through to next().

HIGH — user-visible regressions
- `cli/cmd/serve.ts`, `web.ts`, `uninstall.ts` — describe/log strings said
  "opencode server" / "opencode" (yargs help text shown to every user).
- `cli/error.ts` — three error messages referenced `opencode` /
  `opencode models` / `opencode.json`.
- `cli/cmd/tui/component/dialog-status.tsx` — auth hint said
  "run: opencode mcp auth".
- `cli/cmd/mcp.ts` — `Remote MCP servers ... opencode.json` info text;
  `opencode x ...` placeholder; `opencode-debug` MCP client name (sent to MCP
  servers as the client identity); resolveConfigPath wrote new MCP entries to
  `opencode.json`. Restored main's `altimate-code.json` precedence with
  `opencode.json` as fallback for existing installs.
- `cli/cmd/tui/component/error-component.tsx` — bug-report titles prefixed
  "opentui: fatal:" instead of "altimate-code: fatal:".
- `cli/ui.ts` — non-TTY path used a hardcoded "opencode" wordmark for piped
  output / CI logs / the WebCommand banner. Replaced with the same
  ALTIMATE | CODE glyphs used by the TTY path (sans ANSI).
- `cli/cmd/github.ts` — sweeping brand revert: `AGENT_USERNAME`,
  `WORKFLOW_FILE` (committed to user repos), GitHub App slug, provider
  priority key, workflow YAML name + job + mention triggers + step name,
  console logs, OIDC audience, branch prefix, share-image URL, share link
  text, "opencode infrastructure" in the system prompt. Verified domain
  changes (`altimate-code.dev` → `altimate.ai`) match the rest of our
  codebase and are intentional — kept those.
- `server/routes/global.ts` — new `/upgrade` route OpenAPI summary +
  description said "opencode" (ships in SDK codegen).
- `plugin/codex.ts` — three User-Agent strings sent to OpenAI auth servers
  reverted to `opencode/${VERSION}`; OAuth refresh retry loop (3 attempts,
  4xx-vs-5xx aware) removed; 30s token-expiry skew buffer removed.
- `session/.../message-v2.ts` (via `util/error.ts`) — opaque-error
  augmentation from PR #118/#133 replaced with bare `errorMessage(e)`, which
  only handles empty messages, not bare-name messages like "Error" /
  matching `error.name`. Centralized the augmentation in `errorMessage()` so
  every caller benefits, plus 4 new tests.
- `provider/models.ts` — `setTimeout(..., 0)` deferral on initial refresh was
  removed, re-introducing the circular-dep risk that altimate commit
  `980efaab64` was added to fix.
- `project/bootstrap.ts` — `Truncate.init()` call dropped. Without it the
  hourly scheduler that prunes `Global.Path.data/tool-output/tool_*` never
  registers, so the directory grows unboundedly.
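The `errorMessage()` augmentation mentioned in the `message-v2.ts` item above could be sketched as follows. This is an illustration under stated assumptions, not the code in `util/error.ts`: the fallback wording and the handling of non-Error values are hypothetical. The three "opaque" cases (empty message, the bare string "Error", and a message equal to `error.name`) are taken from the commit message.

```typescript
// Returns a displayable message for any thrown value, augmenting messages
// that carry no information on their own.
function errorMessage(e: unknown): string {
  if (typeof e === "string") return e
  if (!(e instanceof Error)) return String(e)

  const msg = e.message.trim()
  // Opaque cases named in the commit message: empty, bare "Error",
  // or a message that merely repeats the error's name.
  const opaque = msg === "" || msg === "Error" || msg === e.name
  if (!opaque) return msg

  // Hypothetical fallback text; the real wording is not shown in this PR.
  return `${e.name || "Error"}: no further details were provided`
}
```

Keeping the check in one helper means a caller that only invokes `errorMessage(e)` still avoids showing a bare "Error".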

MEDIUM — observability / robustness
- `cli/cmd/tui/util/clipboard.ts` (6 sites) — `Log.Default.debug` replaced
  with `console.log`, which writes directly to the terminal mid-render and
  corrupts the TUI display. Restored structured logger.
- `cli/cmd/tui/component/dialog-workspace-list.tsx` — same pattern, plus a
  stray `console.log(JSON.stringify(result, null, 2))` debug-print of every
  workspace creation result. Removed. Restored "workspace created" info log.
- `cli/cmd/tui/component/dialog-mcp.tsx` — same pattern.
- `control-plane/workspace.ts` — fetch lost its defensive
  `.catch(() => undefined)`, so a transient network blip kills the SSE
  reconnect loop forever; local workspaces also `return`ed out of the loop
  permanently. Restored both.
- `plugin/install.ts` — new plugin installs wrote to `.opencode/opencode.json`
  instead of `.altimate-code/altimate-code.json`. Now writes to
  `.altimate-code/` for new installs, keeps using `.opencode/` if it exists
  to avoid orphaning existing plugin configs.
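The fallback rule in the `plugin/install.ts` item above can be illustrated with a small sketch. The function name, the injected `exists` predicate, and the plain string path joining are hypothetical choices made to keep the example self-contained. The sketch also assumes that an existing `.opencode/` directory wins even when `.altimate-code/` is present, which the commit message does not state explicitly.

```typescript
// Predicate for "does this path exist"; injected so the sketch needs no fs access.
type Exists = (path: string) => boolean

// Picks the config file that a new plugin install should write to.
function resolvePluginConfigPath(projectRoot: string, exists: Exists): string {
  const legacyDir = `${projectRoot}/.opencode`
  // Existing installs keep their legacy location so their plugin config
  // is not orphaned.
  if (exists(legacyDir)) return `${legacyDir}/opencode.json`
  // New installs get the altimate-code location.
  return `${projectRoot}/.altimate-code/altimate-code.json`
}
```

In a real CLI the predicate would likely be `existsSync` from `node:fs`, and paths would be built with `node:path` rather than string templates.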

LOW
- Catppuccin themes (3 variants) — `textMuted` darkened from `Subtext1` to
  `Overlay2` (upstream a11y regression). Restored.
- `cli/cmd/tui/context/route.tsx` — navigate debug log dropped.
- `plugin/github-copilot/copilot.ts` (4 sites) — User-Agent reverted to
  `opencode/${VERSION}`.
- `server/instance.ts:37` — `import("opencode-web-ui.gen.ts")` is dead in
  our build (no embedded UI; we proxy to app.altimate.ai). Marker added so
  the next merge knows it's intentional.

False positives flagged in audit but verified out of scope:
- `tool/plan.ts` PlanEnter `answer === "No"` — the entire PlanEnterTool is
  commented out (`/* */`) in both main and current branch. Dead code.
- `cli/cmd/pr.ts` `spawn("opencode", ...)` — pre-existing in main, not a
  v1.4.0 regression. Track separately.

Verified
- bun turbo typecheck — 5/5 packages clean
- bun test (focused: util, upstream, permission, server) — 553 pass / 0 fail
- Marker check — 107 warnings before/after (no new warnings introduced;
  every edit wrapped in `altimate_change` markers)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Leave altimate-code on for 20+ minutes, and later it shows error