Skip to content

[WEBXP-747] circleci signup: server-side handshake polling#1199

Open
Fab10-CircleCi wants to merge 7 commits intoCircleCI-Public:mainfrom
Fab10-CircleCi:fabio/webxp-469-magic-path-cli-v2
Open

[WEBXP-747] circleci signup: server-side handshake polling#1199
Fab10-CircleCi wants to merge 7 commits intoCircleCI-Public:mainfrom
Fab10-CircleCi:fabio/webxp-469-magic-path-cli-v2

Conversation

@Fab10-CircleCi
Copy link
Copy Markdown

@Fab10-CircleCi Fab10-CircleCi commented Apr 2, 2026

Summary

Rewrites circleci signup to use the server-side handshake pattern. Instead of running a localhost HTTP server, the CLI generates a UUID v4 handshake_id, opens the browser to /cli-auth?handshake_id=UUID, and polls the backend until the token is delivered.

Key changes

  • UUID generation: handshake_id is a UUID v4 generated client-side
  • Browser open: launches app.circleci.com/cli-auth?handshake_id=UUID (respects CIRCLECI_APP_URL env override for enterprise / testing)
  • Polling loop: GET /api/v1/cli-handshake/{handshake_id} every 3s, 10-minute overall client-side timeout (independent from the server-side Redis TTL — see "Server coupling" below)
  • Reuses cfg.HTTPClient: preserves the custom CA bundle (cfg.TLSCert) and transport config set up at startup — required for self-hosted installs. Per-request deadlines come from context.WithTimeout so the shared client's Timeout field is never mutated
  • Content-Type guard: on a 200 response with a non-JSON body (Cloudflare HTML error page, misrouted proxy), surfaces a readable error with a body snippet instead of the stdlib's cryptic invalid character '<'
  • Transient-error retry: 429 Too Many Requests and 5xx responses are retried under the same budget as transport errors (resets on any 202). Hard-fails only on genuinely unexpected statuses
  • Poll intervals are function parameters, not package globals — tests drive the loop deterministically, safe under ginkgo -p
  • No 404 handling: server returns 202 for both pending and expired keys (deliberate — no info leak on unauthenticated GET, no server-side negative cache). CLI timeout is the sole expiry path
  • Graceful cancellation: Ctrl+C during polling exits cleanly via context.CancelFunc
  • Network resilience: tolerates up to 3 consecutive transient errors (transport + 429 + 5xx) before aborting

Constants

  • handshakeTimeout = 10 minutes (user-facing wait budget; independent from server TTL)
  • handshakePollWait = 3 seconds
  • handshakeHTTPTO = 10 seconds (per-request deadline, applied via context.WithTimeout)
  • handshakeMaxNetErrs = 3

Server coupling

The CLI's 10-minute timeout and the auth-svc Redis TTL are independent contracts:

  • CLI timeout = how long the user waits before the CLI gives up (10 min)
  • Server TTL = how long the handshake entry survives after the browser POST (currently 60s — see authentication-service#1711)

The only structural requirement is that the server TTL comfortably exceed handshakePollWait (3s) so a transient blip + one retry still finds the entry. The shorter server TTL minimizes at-rest exposure in Redis; the longer client timeout accommodates slow user journeys (email verification, MFA).

Known edge case accepted by tech leads: if the CLI is fully suspended (laptop lid closed, Docker paused) at the exact moment the browser POSTs, the 60s server TTL may expire before the CLI resumes. The CLI will see 202s until handshakeTimeout fires; the user re-runs circleci signup. Low-frequency, no code change required.

Files changed

  • cmd/signup.go — full rewrite of signup command
  • cmd/signup_unit_test.go — 16 specs covering polling (202→200, 429/5xx retry, sustained 429, unexpected status, cancellation, timeout, sustained network errors, non-JSON 200 body), base URL resolution, already-authenticated guard, save-token, telemetry event

Related

Test plan

  • All 16 unit specs pass (go test ./cmd -ginkgo.focus "Signup")
  • go vet ./... clean
  • go tool golangci-lint run ./... clean
  • go build ./cmd/... succeeds
  • Manual: run circleci signup, verify browser opens with handshake_id in URL
  • Manual: complete auth in browser, verify CLI receives token and saves to ~/.circleci/cli.yml
  • Manual: Ctrl+C during polling → clean exit, no panic
  • Manual: 10-minute timeout → exits with timeout message
  • Manual (self-hosted): set CIRCLECI_APP_URL to an enterprise host with a custom CA — verify TLS succeeds (exercises the cfg.HTTPClient reuse)

🤖 Generated with Claude Code

@Fab10-CircleCi Fab10-CircleCi force-pushed the fabio/webxp-469-magic-path-cli-v2 branch from 6db9f76 to 352a958 Compare April 2, 2026 07:20
Fab10-CircleCi and others added 2 commits April 16, 2026 21:24
Opens browser directly to /cli-auth with CLI params as top-level
query params. The frontend handles session detection: authenticated
users get instant PAT creation; unauthenticated users are redirected
to signup first. Bypasses auth-svc return-to bug entirely.

Security hardening:
- Strict Origin validation (reject missing/wrong with 403)
- Constant-time state comparison via crypto/subtle
- CORS pinned to https://app.circleci.com (static, never reflected)
- Access-Control-Allow-Private-Network for Chrome PNA
- Method validation (GET only on /token)
- 127.0.0.1 binding only

Features:
- Unique PAT label (hostname + timestamp) prevents 422 duplicates
- Already-authenticated guard with --force bypass
- --no-browser fallback for headless/SSH
- JSON responses for structured error handling
- Human-readable URL display in terminal
- 24 unit tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the localhost HTTP callback server with a client-initiated polling
loop against the auth-svc handshake endpoint. The CLI generates a UUID v4
handshake_id, opens the browser to /cli-auth?handshake_id=..., and polls
GET /api/v1/cli-handshake/{id} every 3 seconds until the token arrives
(200), the handshake expires (404), or the 10-minute timeout elapses.

This eliminates the cross-origin / Private-Network / same-browser / MFA-
interrupt failure modes of the prior localhost approach — the browser and
CLI never communicate directly, so any device can complete auth.

- Remove localhost listener, CORS/PNA middleware, state-nonce machinery,
  and the cli_port/cli_state/cli_label URL params.
- Add UUID-based handshake_id, pollHandshake with retry on transient
  network errors, ctx-based Ctrl+C cancellation, 10-minute overall
  deadline, and a CIRCLECI_APP_URL env override for enterprise/testing.
- --no-browser now prints the URL for cross-device use instead of
  prompting for a manual token paste (the handshake makes manual paste
  unnecessary).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Fab10-CircleCi Fab10-CircleCi force-pushed the fabio/webxp-469-magic-path-cli-v2 branch from 352a958 to d249bc6 Compare April 17, 2026 03:25
Fab10-CircleCi and others added 2 commits April 16, 2026 21:31
Auth-svc returns 202 on cache miss, not 404 — the dedicated "authentication
expired" branch could never fire in normal operation, and its message was
misleading since the server has no 404-for-expiry path. A truly unexpected
404 (e.g. routing misconfiguration) still surfaces via the default
"unexpected response: 404" handler, which is honest about what happened
without inventing a reason. Timeout is the sole expiry path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Fab10-CircleCi Fab10-CircleCi changed the title [WEBXP-469] Add circleci signup command — Magic Path v2 [WEBXP-747] circleci signup: server-side handshake polling Apr 17, 2026
Fab10-CircleCi and others added 3 commits April 16, 2026 23:52
Three PR-review follow-ups:

1. Reuse cfg.HTTPClient instead of constructing a bare http.Client. The
   configured client carries the custom CA pool built from cfg.TLSCert
   plus cfg.TLSInsecure — without it, self-hosted / enterprise users
   who point CIRCLECI_APP_URL at their own app host would hit TLS
   verification failures. Per-request deadlines now come from
   context.WithTimeout(ctx, requestTimeout), so the shared client's
   Timeout field stays untouched.

2. De-globalize handshakePollWait. The poll interval and per-request
   timeout are now function parameters on pollHandshake; production
   passes the package constants, tests pass shortened durations.
   Removes the var + restore-defer pattern that would have raced under
   `ginkgo -p` parallel execution.

3. Guard against non-JSON 200 bodies (e.g. Cloudflare / gateway HTML
   error pages that return 200 on the happy-path URL). handshakePoll
   now checks Content-Type on a 200 and returns a readable error with
   a truncated body snippet instead of surfacing the stdlib's
   "invalid character '<'" JSON error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CLI's 10-minute poll deadline is coupled to auth-svc's handshake
cache TTL (CLI_HANDSHAKE_TTL, default 10m, authentication-service PR
#1711). Document the constraint so a future change to either value
can't silently regress the laptop-sleep / Docker-pause case where the
server evicts the entry before the CLI finishes polling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes following the auth-svc decision to revert the Redis
handshake TTL from 10 min back to 60s (authentication-service PR #1711
commit ddfcd807):

1. Rewrite the handshakeTimeout inline comment. The prior version
   asserted that the CLI timeout must stay ≤ the auth-svc TTL — that
   invariant no longer holds and never really did. The only structural
   requirement is that the server TTL exceed handshakePollWait; the
   client-side timeout is an independent UX knob.

2. Treat 429 Too Many Requests and 5xx responses as transient rather
   than hard-failing on the default "unexpected response" branch.
   Rate limiting is expected once WEBXP-751 lands Gubernator on the
   unauthenticated GET, and 5xx covers backend blips. They share the
   existing handshakeMaxNetErrs budget (3 consecutive) and the counter
   still resets on any 202.

Two new specs cover the retry-then-succeed and sustained-429 paths;
the existing "unexpected status" spec is switched from 500 → 418 now
that 5xx is transient.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant