
Commit 379e95e

Connection health checks: liveness, identity, and expired-credential visibility (#1251)
* Connection health checks (liveness) with OpenAPI backing

A connection can declare one authenticated probe operation that answers "is this credential still alive?". Core owns the vocabulary (HealthStatus, HealthCheckSpec, HealthCheckResult, HealthCheckCandidate) and dispatch (integrations.healthCheck.{get,candidates,set}, connections.{checkHealth, validate}); the OpenAPI plugin implements the probe against its stored operation bindings, ranking candidates non-destructive-first. The UI gets a status dot + Check now per connection, a health-check editor sheet, and a validate-the-key-first step in the Add Connection modal.

* Derive connection account info from the health-check probe

The same probe that answers "is this credential alive?" also answers "whose account is this?": HealthCheckSpec gains an identityField dot-path into the probe response, the editor gets a typed identity picker (response schema fields, breadth-first so shallow scalars surface first) plus a live preview against a pasted test key, and the Add Connection flow auto-fills the display name from the probed identity.

Two corrections over the first attempt at this:

- The name auto-fill reads the label through the functional updater rather than the closure snapshot, so a name typed while the probe is in flight is never clobbered.
- The response sample is only taken from healthy responses; error bodies stay out of the preview (non-healthy runs carry the classified detail).

A healthy probe whose spec-save fails now surfaces the failure instead of silently pretending the check was configured.

* MCP connection liveness health checks

MCP connections get the liveness half of health checks: the probe dials the server and lists tools (the same path tool discovery uses), so a live token reads healthy and a revoked one reads expired. MCP has no usable identity source, so no identity is derived and the operation/identity editor stays hidden - only the status dot and Check now render.

The auth-wall signal is carried structurally instead of being fished out of message strings: McpConnectionError/McpToolDiscoveryError gain an httpStatus field populated from the transport cause at the handshake boundary, and the liveness classifier reads it directly.

This also fixes the transport bug that made expiry undetectable on default-configured connections: remoteTransport "auto" used to fall back from streamable-http to SSE on ANY error, including a definitive 401/403; the SSE retry then failed with an opaque error and the auth wall read as merely degraded. The fallback now propagates 401/403 as-is (the credential is the problem, not the transport). The e2e scenario runs on the DEFAULT auto transport to pin exactly that, and failed MCP tool calls whose handshake hits a 401/403 now surface the actionable auth failure instead of a generic connection_rejected message.

* Health checks for Google and Microsoft Graph

Both providers wire the OpenAPI health-check backing (same store, config superset) and get a ZERO-CONFIG default probe at add time, so the answer to "has this connection expired?" - the original ask, born of Google's 7-day dev-token policy - works out of the box:

- Google: when the bundle includes the People API, the default check is people.get with resourceName=people/me pinned and the account email as the identity field.
- Microsoft Graph: when the selected workloads include GET /me, the default check is /me with userPrincipalName as the identity field. (The first attempt at this left Graph manual-only; there is no reason to - /me is as canonical for Graph as people/me is for Google.)

Both plugins declare healthCheck in their own config schemas so the spec survives each provider's read-modify-write updateBundle/updateGraph cycles (Schema.Struct decode drops undeclared keys).
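
The transport fallback rule described above (fall back from streamable-http to SSE, except on a definitive 401/403) can be sketched as follows. This is a minimal illustration, not the repository's code: `ConnectError`, `isAuthWall`, and `connectAuto` are hypothetical names, and only the `httpStatus` field comes from the commit message.

```typescript
// Hypothetical error shape. The commit says the MCP errors gain an httpStatus
// field; the class name and everything else here are assumptions.
class ConnectError extends Error {
  readonly httpStatus?: number;
  constructor(message: string, httpStatus?: number) {
    super(message);
    this.name = "ConnectError";
    this.httpStatus = httpStatus;
  }
}

type Connect<T> = () => Promise<T>;

// A 401/403 is a verdict on the credential, so retrying over a different
// transport cannot help and would only hide the real cause.
function isAuthWall(error: unknown): boolean {
  return (
    error instanceof ConnectError &&
    (error.httpStatus === 401 || error.httpStatus === 403)
  );
}

// "auto" transport: try streamable-http first, fall back to SSE on any
// failure except an auth wall, which is rethrown unchanged.
async function connectAuto<T>(
  streamableHttp: Connect<T>,
  sse: Connect<T>,
): Promise<T> {
  try {
    return await streamableHttp();
  } catch (error) {
    if (isAuthWall(error)) throw error;
    return await sse();
  }
}
```

Because the status travels on the error object, a liveness classifier can read `httpStatus` directly instead of matching message substrings.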

* Use tagged guards for the MCP connect-failure status extraction

* Assert provider health-check defaults in plugin unit tests

The Google auto-default (People API people.get, email identity) and the Microsoft Graph auto-default (GET /me, userPrincipalName) are pinned in each plugin's own unit suite against the existing stubbed-fetch harness, replacing the standalone provider e2e whose hand-served discovery doc never matched the plugin's URL allowlist. Also updates the connect-modal UI scenarios for the merged affix credential field (placeholder is now the bare "token").

* Own health-check storage in core; persist verdicts for at-a-glance expiry

Two structural fixes over the inherited design:

Core-owned spec storage. The declared health check moves out of each plugin's opaque config blob into its own column on the integration row. The old shape required every plugin to declare healthCheck in its config schema or any config write silently stripped it, and its read-modify-write persistence was exposed to lost updates. Now core writes the column directly (a plain UPDATE, no read-modify-write), plugins shrink to two hooks (listHealthCheckCandidates + checkHealth with the spec passed in), and the per-plugin describeHealthCheck/setHealthCheck hooks and schema declarations are gone. Plugins declare zero-config defaults through ctx.core.integrations.setHealthCheck.

Persisted verdicts. Every checkHealth run writes its result onto the connection row (last_health), and connections.list returns it - so the accounts list shows alive/expired at a glance with no per-row clicking, which was the actual customer ask. A live Check now still overrides in-session.

Also addressed from design review:

- Mutating operations are hard-blocked as probes (not just ranked last): a health check runs unattended and repeatedly, and this path has no approval gate, so POST/PUT/PATCH/DELETE probes refuse to run.
- Credential values are scrubbed from probe error details before they leave the server - upstream error bodies can echo the request back, including query-param-carried keys.
- Google and Microsoft Graph account panels now mount the health-check editor, so their auto-configured probes are visible and adjustable (previously the flagship integrations had a working probe and no UI for it).
- The editor uses the shared health-display labels instead of a local drifting copy.

Cloud migration 0008 adds integration.health_check and connection.last_health.

* Cover the review-driven guarantees in e2e

Three scenarios close the verification gaps:

- The connections list renders a persisted expired verdict on a FRESH page load with no per-row clicking (browser scenario) - the at-a-glance behavior the feature exists for.
- A mutating operation declared as the health check refuses to run (unknown-with-reason, nothing reaches the upstream).
- A probe's error detail never echoes the credential back, even when the upstream reflects the Authorization header into its error body.

* Probe-first key check in Add Connection

The old flow front-loaded a form: pick an operation and an identity field from a schema before anything had shown you what the API returns, in a block that dominated the modal. Inverted:

- The probe is AUTO-PICKED (top-ranked read-only zero-argument candidate) and shown as a one-line "Calls GET me.getMe · change" caption; the full operation form only opens behind "change", or when no zero-arg candidate exists.
- Identity is chosen AFTER the probe, from the response the key actually returned: the sample fields render as clickable path/value rows, and clicking one upgrades the saved check with that identityField and adopts the value as the display name. "Skip - status only" dismisses.

No schema guessing, and the pre-probe UI is one line instead of a form.
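
Two of the design-review guarantees listed earlier in this message (mutating operations refuse to run as probes; credentials are scrubbed from error details) lend themselves to a short sketch. The function names, the `[redacted]` marker, and the choice to also scrub the URL-encoded form of a secret are assumptions made for illustration; the commit does not show the actual implementation.

```typescript
// Assumed helper names; only the POST/PUT/PATCH/DELETE list is from the commit.
const MUTATING_METHODS: readonly string[] = ["POST", "PUT", "PATCH", "DELETE"];

// A probe runs unattended and repeatedly, so a mutating method is refused
// outright rather than merely ranked last.
function isMutatingMethod(method: string): boolean {
  return MUTATING_METHODS.includes(method.toUpperCase());
}

// Remove every occurrence of each secret from an upstream error detail.
// Upstreams can echo the request back, so the secret may appear verbatim
// (header value) or URL-encoded (query parameter).
function scrubCredentials(detail: string, secrets: readonly string[]): string {
  let scrubbed = detail;
  for (const secret of secrets) {
    if (secret.length === 0) continue; // splitting on "" would mangle the text
    scrubbed = scrubbed.split(secret).join("[redacted]");
    const encoded = encodeURIComponent(secret);
    if (encoded !== secret) {
      scrubbed = scrubbed.split(encoded).join("[redacted]");
    }
  }
  return scrubbed;
}
```

A caller would run `scrubCredentials` on the classified detail just before the result is returned or persisted, so the raw upstream body never leaves the server.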

* Use the shared Button for the identity-picker affordances

* One checking signal, no layout shift on the key-check verdict

The in-flight state was double-communicated (button label swap to "Checking..." plus a separate "Checking the key..." line) and the verdict appearing pushed the layout down. Now the button carries the only in-flight signal via its width-preserving loading spinner (label never changes), and the verdict line's height is reserved from the start - the reveal fills space that was always there.

* One Validate control: identity is a default, not a question

The key-check UI still asked for too much attention: a caption naming the probe operation, a verdict line, and a whole ask-first panel ("Which field names this account?"). Collapsed to a single control:

- One "Validate" button. The probe operation stays an invisible default (top-ranked read-only zero-arg candidate).
- On healthy, the identity field is AUTO-PICKED from the response via a shared heuristic (email > login/username > display name > id, shallower paths first) and saved with the check.
- Everything lands in one verdict line beside the button: "Healthy · alice@example.com · change". "change" opens the response field list as a correction, not a question; the ask-first panel is gone.

pickIdentitySample lives in the core health-check vocabulary with unit coverage; the connect e2e drives the collapsed flow including the correction path.

* Pick mode: the key check becomes its own focused view

Two structural changes to the Add Connection modal, from review:

- The credential is step 1 and takes first focus; the display name moves below it, framed as derived ("filled from the account when you check the key") - you don't name a thing before proving it exists.
- "Check the key works" on an integration with no configured check now swaps the modal body into pick mode, the same view-swap the OAuth app registration uses.
  The user picks the read-only call (a deliberate, taught choice - no auto-pick), runs it, sees the REAL response, and clicks the field that names the account (or skips for status-only). Picking returns to the main modal with the verdict and derived name.

The main modal's key-check footprint is now a button and a verdict line; all teaching density lives in the focused subview. The "Confirms the key authenticates" filler copy is gone, as is the auto-pick heuristic's UI role (pickIdentitySample stays in the SDK for other surfaces).

* Two-step Add Connection: prove the key, then name and place it

The pick-mode submodal was jarring - it hid the key exactly when a failed probe makes you want to edit it. Replaced with a two-step wizard in the same modal (credential methods only; OAuth is unchanged):

Step 1 - get the key into a valid state. Auth method + key (first focus, visible and editable throughout), and "Check the key works": a configured check probes directly; with none the pick-a-call block expands INLINE below the key - choose the call, run it, see the real response, click the identity field. Continue is the only exit forward.

Step 2 - name it and place it. The verdict travels along as a one-line recap, the display name arrives derived from the picked identity, the saved-to picker and Add connection live here. Back returns to step 1 with everything intact.

The pick block is gated on its expansion alone, not on hasHealthCheck: a healthy probe saves the spec, which flips that flag mid-flow, and the block must outlive its own success until the identity pick.

* Hoist the key check below the auth-method tabs

The check button and pick-a-call block lived inside the API-key tab's content, so they moved with (and were clipped by) the tabs card. They now render as a sibling below the whole tabs section: same position no matter which credential method tab is active. canCheckKey already hides them for OAuth/no-auth methods, where a pasted-key check doesn't apply.

* Redesign the key check as one request/response panel

The check was a button that grew a form that grew a bordered box of prose and lists - hostile density. It is now the system's code-window pattern, because that is literally what this is:

- ONE hairline-framed panel, always visible below the credential. Its titlebar is the request line: a mono GET chip, the operation (pre-seeded with the best read-only zero-arg candidate, editable in place; static when the integration already has a check), and one Check button.
- The response renders inside the same frame: a mono status line (dot · http status · verdict · identity) and, on a healthy first-time run, the response fields as rows. Clicking a row makes it the label - marked LABEL in place, no separate picker panel, no skip link (Continue is the skip).
- Destructive candidates are filtered out of the picker entirely instead of listed with a warning.

Nothing expands, nothing swaps views, no duplicate verdicts, and the copy is one hint line under a mono section label.

* Form-pattern pass over the Add Connection wizard

Applied the form-design fundamentals to the flow:

- Credential fields get a show/hide reveal toggle (constant label, aria-pressed state) - masked keys make typos invisible, and most keys are pasted then eyeballed. One toggle state per field group.
- Placeholders no longer carry instructions ("paste the value / token", "token"): the visible labels and the field frame do that job; placeholder text that disappears on focus taught nothing.
- Continue is never disabled. A disabled submit hides the reason and drops out of the tab order; clicking Continue with no key now says exactly what's missing (role=alert line above the footer) and clears as soon as a value lands.
- The wizard position is plain text in the dialog title ("STEP 1 OF 2", mono sec-label style) - a text step indicator, not a progress bar.

e2e selectors moved off placeholders (gone) onto roles and labels.

* Identity-aware ranking, seamless request line, step-2 name picker, modal scroll test

Four refinements to the key-check panel from review:

- Candidates rank by what their response can NAME: calls whose schema carries an email beat login beat display-name beat id, ahead of the generic fewest-args/GET-first order (compareHealthCheckCandidatesByIdentity, identityPathTier shared with the sample picker). The pre-seeded request line now lands on the identity call, not an arbitrary list endpoint.
- The request line is seamless: the operation combobox renders frameless inside the titlebar, so METHOD + operation + Check read as one request rather than a form row in a box.
- The identity pick moved to step 2 where naming belongs: the display name is a combobox seeded with the response's identity-looking fields (value shown, path as the description); picking one also stores the path as the check's identityField. Step 1's response is read-only, identity fields ranked first, capped at 8 rows with a +N more line.
- The response view no longer nests a scroller (nested scroll areas trap the wheel mid-modal); the modal is the one scroll context, and a new e2e drives a short viewport, asserts real overflow, wheel-scrolls the dialog, and reaches the footer.

* Memoize the response sample off the result object

* Rank every rendered response identity-first via one shared helper

rankResponseSample joins the core health-check vocabulary: rows whose leaf key names the account (email > login > name > id) lead, the rest keep response order, stable within tiers. The request panel, the step-2 name options, and the health-check editor's live preview all use it, so what shows up first is always the field you came to see. Replaces the panel's inline sort; unit-covered.
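
The identity-first ordering described for rankResponseSample can be illustrated with a small sketch. It is not the shipped helper: the row shape, the exact key lists per tier, and the case-insensitive exact match on the leaf key are all assumptions; only the tier order (email > login > name > id), the "rest keep response order" rule, and stability within tiers come from the commit message.

```typescript
// Assumed row shape: a dot-path into the response plus its rendered value.
interface SampleRow {
  path: string;
  value: string;
}

// Tier order from the commit message; the concrete key names are assumed.
const IDENTITY_TIERS: readonly (readonly string[])[] = [
  ["email"],
  ["login", "username"],
  ["name", "displayname"],
  ["id"],
];

// Lower is better; rows whose leaf key names nothing get the last tier.
function leafTier(path: string): number {
  const leaf = (path.split(".").pop() ?? "").toLowerCase();
  const tier = IDENTITY_TIERS.findIndex((keys) => keys.includes(leaf));
  return tier === -1 ? IDENTITY_TIERS.length : tier;
}

// Identity rows lead, everything else keeps response order. The original
// index is the tie-breaker, so the result is stable within each tier.
function rankSample(rows: readonly SampleRow[]): SampleRow[] {
  return rows
    .map((row, index) => ({ row, index, tier: leafTier(row.path) }))
    .sort((a, b) => a.tier - b.tier || a.index - b.index)
    .map((entry) => entry.row);
}
```

Sharing one helper across the request panel, the name options, and the editor preview means all three surfaces agree on which field appears first.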

* Rank whoami calls above lists; free the modal wheel for real

Two fixes from testing against the actual Vercel spec:

- candidateIdentityTier now ignores identity keys under array segments: aliases.listAliases exposes aliases.0.creator.email (people in a collection, not the caller) and was outranking user.getAuthUser, whose user.email names the account probing. Only singular paths count toward the tier; reproduced against the full 9MB Vercel spec where user.getAuthUser now ranks first, and unit-covered with that shape.
- The Add Connection dialog is now non-modal (with an explicit dim overlay, since Radix renders none in non-modal mode): a modal dialog's react-remove-scroll confined the wheel to the dialog subtree, so the PORTALED combobox popup, and the modal body while it was open, could not wheel-scroll at all. Same fix and rationale as the health-check editor sheet. Outside-click still dismisses; option clicks are still guarded by the portaled-popup check. The scroll e2e now also opens the operation popup in a 420px viewport and asserts the dialog still wheel-scrolls underneath it.

* Automatic health checks: stale-while-revalidate on page load

Manual Check now undermined the at-a-glance promise. The connections list now revalidates itself:

- Core checkHealth gains ifStaleMs: return the persisted verdict when younger than the window, probe otherwise. The SERVER owns freshness, so N open tabs revalidating on load collapse to one probe per window instead of stampeding the upstream. Exposed as an optional query param on the checkHealth endpoint (bounded to a day).
- AccountRow revalidates on mount: a healthy verdict younger than 5 minutes renders as-is (the cache); anything else probes in the background and corrects the dot in place. Non-healthy verdicts ALWAYS revalidate - an expired dot is exactly the state the user is waiting to see change, so recovery shows on the next load, not after the window. Check now stays as the force-refresh.
- e2e: an API scenario pins the SWR contract (fresh window returns the seeded verdict verbatim, zero window probes and sees the rotated key), and the at-a-glance browser scenario now also restores the key and asserts a reload flips expired back to healthy with no clicks.

* Simplify pass over the health-check diff

From a four-angle cleanup review (reuse / simplification / efficiency / altitude), applied:

- Removed pickIdentitySample and its tests: the auto-pick era ended when the identity moved to step 2's name options; rankResponseSample + identityPathTier cover every live consumer. IDENTITY_KEY_TIERS goes module-private (no external consumer).
- Candidate ranking sorts via decorate-sort-undecorate (sortHealthCheckCandidatesByIdentity): tiers computed once per candidate instead of inside the comparator, which re-walked response fields O(n log n) times on Graph-sized specs.
- Retired the "pick mode" vocabulary (handleCandidateProbe, hcCandidateReady) and rewrote comments narrating deleted UI iterations; removed the write-only hcPickedPath state and the dead KeyValidationStatus.validating prop.
- health-check-editor's local STATUS_CLASS map replaced by a shared HEALTH_TEXT_CLASS in health-display.ts, next to the dot/ring maps it duplicated.
- The six repeated (!wizardActive || wizardStep === X) conditions collapsed into showValidateStep/showPlaceStep.

Noted but deliberately skipped: the RequestCheckPanel vs HealthCheckConfigFields overlap (intentional UX divergence: request-line panel vs form fields - unifying them would couple two surfaces that are diverging on purpose), a batched checkHealth endpoint for N-row lists (real but a follow-up: needs a new API shape), candidates-list spec recompile caching (pre-existing known trade-off, same follow-up bucket), StepHeader's four-variant scaffold and accounts-section trackEvent repetition (pre-existing on main, not this diff's debt).
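
The decorate-sort-undecorate ranking mentioned in the simplify pass, combined with the earlier rule that identity keys under array segments do not count, might look roughly like this. The candidate shape, the tier table, and the treatment of purely numeric path segments as array indices are assumptions for illustration; the real sortHealthCheckCandidatesByIdentity is not shown in this commit view.

```typescript
// Assumed candidate shape: an operation plus the dot-paths of its response.
interface ProbeCandidate {
  operationId: string;
  responsePaths: readonly string[];
}

// Assumed tier table, following the email > login > name > id order.
const KEY_TIER = new Map<string, number>([
  ["email", 0],
  ["login", 1],
  ["username", 1],
  ["name", 2],
  ["id", 3],
]);
const NO_IDENTITY_TIER = 4;

// Best tier among singular paths only: a path such as
// "aliases.0.creator.email" names people in a collection, not the caller.
function candidateTier(candidate: ProbeCandidate): number {
  let best = NO_IDENTITY_TIER;
  for (const path of candidate.responsePaths) {
    const segments = path.split(".");
    if (segments.some((segment) => /^\d+$/.test(segment))) continue;
    const leaf = (segments.pop() ?? "").toLowerCase();
    const tier = KEY_TIER.get(leaf) ?? NO_IDENTITY_TIER;
    if (tier < best) best = tier;
  }
  return best;
}

// Decorate: compute each tier once. Sort: compare precomputed numbers.
// Undecorate: drop the sort keys. The comparator never re-walks the paths.
function sortCandidatesByIdentity(
  candidates: readonly ProbeCandidate[],
): ProbeCandidate[] {
  return candidates
    .map((candidate, index) => ({ candidate, index, tier: candidateTier(candidate) }))
    .sort((a, b) => a.tier - b.tier || a.index - b.index)
    .map((entry) => entry.candidate);
}
```

With n candidates the tier walk runs n times rather than once per comparison, which is where the saving on very large specs comes from.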

* Fold the remaining reuse findings

- One HealthStatusLine renders every verdict (step-2 recap and the request panel's response status) instead of two hand-copied dot+label rows; the panel passes variant=response for mono + http status.
- mcpLivenessFailureStatus delegates its status-code branch to the shared classifyHttpStatus, so "which HTTP statuses mean expired" has exactly one definition (the message-substring fallback stays local - it exists for causes with no status at all).

* Show connection health at a glance on the integrations list

Every row on the integrations list now carries a worst-of health summary: loading the page auto-checks each of the integration's connections (both owners) with the same stale-while-revalidate guard the detail page uses, and paints one status dot per row, with a mono EXPIRED/DEGRADED label when something needs attention. Rows with no connections, or nothing but never-probed ones, render nothing.

The revalidation logic moves out of AccountRow into a shared use-connection-health hook (single- and multi-connection variants) so the two surfaces cannot drift; AccountRow behavior is unchanged.

Adds a worst-of aggregation helper next to the health display maps, plus a browser scenario seeding a dead-token MCP server and asserting the list row reads Expired with no clicks.

* Fix health-check follow-up failures and stale connection cache

Do not block the OpenAPI add flow when saving the drafted health check fails: the integration already exists server-side, so return through onComplete and let the user fix the check from the detail page.

Forward the http client layer through the MCP checkHealth hook so it dials the connector the same way resolveTools and invokeTool do.

Invalidate the connections cache after a health check so the persisted last_health verdict is not served stale within the atom TTL.
The manual Check now path invalidates unconditionally; the automatic mount-time probe invalidates only when the verdict actually changed, so an unchanged reconfirm never churns the cache (the per-mount guard already blocks a re-probe on the resulting refetch).

Drop the incidental definition.name varchar(255) narrowing from the health-check migration: it is pre-existing schema drift, not part of this feature, so the migration now adds only connection.last_health and integration.health_check.

* Give credential inputs accessible names, fix e2e locators after placeholder removal

The add-account-modal rewrite dropped placeholder text from credential inputs (the affix already carries the instruction visually), which left the single-input cases with no accessible name. Add aria-label={input.label} to both the affixed and non-affixed single-input branches; the multi-input grid already had a proper Label htmlFor pairing.

Repoint the e2e tests that used to find these fields by placeholder to role-based textbox locators scoped to the dialog, matching the pattern used elsewhere. Also thread through the "Continue" step of the credential wizard's two-step flow (validate, then place) that these tests hadn't been updated for, and scope the "Add connection" submit click to the dialog since the page has its own same-named trigger button.

* Replace em-dashes in new comments and docs

* Harden flaky e2e waits under CI shard load

* Bound MCP discovery with a probe deadline

discoverTools (the shared connect+listTools path behind resolveTools, detect, probeEndpoint, and checkHealth) had no timeout of its own, and neither the MCP SDK's connect handshake nor its listTools call imposes one. An unresponsive endpoint (a closed loopback port after its e2e scenario's scope exits, a server wedged mid-handshake) hung the calling fiber, and with it the server-side request handling it ran under, indefinitely.

Under CI shard load this showed up as auth-methods-ui.test.ts sitting on its 90s auto-probe wait and then hitting the full 120s vitest timeout, followed by every later test failing "login did not redirect to AuthKit (500)": the dev server's request-handling capacity was pinned by fibers stuck in an unbounded MCP connect, starving unrelated routes.

Give discoverTools a default 15s deadline (Effect.timeoutOrElse), mapping a timeout to the same McpToolDiscoveryError("connect") shape a failed connect already produces, so callers' existing handling (auth classification, incomplete-tools fallback, health "degraded" status) all keeps working unchanged. Connections that DID get established before the deadline still close via the existing Effect.onExit handler, which fires on interruption too.

Note: discoverTools already closed its connection deterministically via Effect.onExit - there was no connector/session leak. The bug was purely a missing deadline, not missing cleanup.

* Cache compiled specs for request-path OpenAPI fallbacks

The health-check endpoints, candidate listing, and the tools/invoke fallbacks recompiled the full OpenAPI document on every request. The UI now auto-fires health checks on page mounts, so a large spec was parsed into a fresh multi-MB object graph over and over, growing the dev server heap until the process hit the V8 limit and every subsequent request failed (the CI shard wedge behind the login 500 cascade).

Compile through a small module-level LRU instead, keyed by the config's content-addressed specHash: same hash means byte-identical text, and a spec update writes a new hash so stale entries age out. Capacity is four compiled documents; legacy configs without a hash bypass the cache. Add and update paths keep compiling fresh input directly.
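
The spec cache described above (a small LRU keyed by specHash, capacity four, with hash-less legacy configs bypassing it) can be sketched with a plain Map, relying on its insertion order to track recency. The class and method names are hypothetical; only the keying, the capacity, and the bypass rule come from the commit message.

```typescript
// Hypothetical LRU for compiled specs. A Map iterates in insertion order,
// so re-inserting an entry on every hit keeps the oldest key first.
class CompiledSpecCache<T> {
  private readonly entries = new Map<string, T>();
  private readonly capacity: number;

  constructor(capacity = 4) {
    this.capacity = capacity;
  }

  getOrCompile(specHash: string | undefined, compile: () => T): T {
    // Legacy configs carry no hash, so there is nothing safe to key on.
    if (specHash === undefined) return compile();

    if (this.entries.has(specHash)) {
      const hit = this.entries.get(specHash) as T;
      this.entries.delete(specHash);
      this.entries.set(specHash, hit); // mark as most recently used
      return hit;
    }

    const compiled = compile();
    this.entries.set(specHash, compiled);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return compiled;
  }
}
```

Keying on a content hash means no explicit invalidation is needed: an updated spec arrives under a new key, and the old compiled document simply ages out.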

* Capture dev-stack logs in e2e runs and raise CI heap headroom

* Use a non-hidden dir for captured server logs

actions/upload-artifact skips hidden files by default, so the .server directory never made it into the CI artifact. Rename to server-logs so the boot log actually ships with failed runs.

* Pre-bundle late-discovered deps so the dev worker never program-reloads mid-suite

vite was discovering effect/Match, effect/Predicate, and js-yaml during test runs instead of at boot, forcing a re-optimize and full program reload on both the client and SSR (workerd) environments. Each reload strands the previous worker program's heap inside workerd, and a handful of them exhausts its heap limit and kills the dev server mid-shard.

Adding these to optimizeDeps.include (client) and environments.ssr.optimizeDeps.include (SSR) in apps/cloud and apps/host-selfhost's vite configs makes vite bundle them at boot instead, so no mid-run discovery happens.

js-yaml is a transitive dependency via @executor-js/plugin-openapi that bun's isolated install doesn't hoist into either app's node_modules, so a bare "js-yaml" specifier silently fails to resolve in optimizeDeps.include. Used vite's "<pkg> > <dep>" nested resolution syntax to resolve it from the owning package instead.
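
As a rough picture of the pre-bundling change, the relevant part of a vite config could look like the fragment below. The actual config files are not part of the visible diff, so the variable names and the surrounding shape are assumptions; the three dependency specifiers and the two include locations are taken from the message above. In a real vite.config.ts the object would be passed to defineConfig and exported as the default.

```typescript
// Dependencies that were being discovered mid-run. js-yaml is reached through
// its owning package with vite's nested "pkg > dep" specifier, because a bare
// "js-yaml" does not resolve from the app under an isolated install.
const lateDiscoveredDeps: string[] = [
  "effect/Match",
  "effect/Predicate",
  "@executor-js/plugin-openapi > js-yaml",
];

// Sketch of the config fragment: the same list is pre-bundled for the client
// and for the SSR environment so neither triggers a re-optimize at runtime.
const viteConfigFragment = {
  optimizeDeps: { include: lateDiscoveredDeps },
  environments: {
    ssr: { optimizeDeps: { include: lateDiscoveredDeps } },
  },
};
```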
1 parent a380e34 commit 379e95e

66 files changed

Lines changed: 7496 additions & 188 deletions

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+ALTER TABLE "connection" ADD COLUMN "last_health" json;--> statement-breakpoint
+ALTER TABLE "integration" ADD COLUMN "health_check" json;
