feat: Add an expect query parameter to the status endpoints for server-side health checks #710

Draft
keelerm84 wants to merge 3 commits into v9 from mk/SDK-2557/queryable-status-endpoints

Customers used /status as a health gate by fetching the JSON body,
extracting fields with jq, and comparing values in a shell. They asked
us to ship jq in the relay Docker image; a vendored binary is an attack
surface we do not control and cannot go in the distroless image.

Instead, all status endpoints (/status and the per-environment routes)
now accept a repeatable, AND-ed expect=<path><op><value> query parameter
and answer with an HTTP status code: 200 when every clause holds, 412
when a well-formed clause does not (including an absent field), and 400
when a clause is malformed. The verdict body summarizes each clause for
debugging, but the status code is the contract, so a probe or monitor
needs no body parsing and nothing installed.
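As a rough illustration of the contract above, a probe only has to build the query string and look at the status code. The sketch below is not part of this change; the host, port, and the `status` and `version` field names are assumptions chosen for the example, and `buildProbeURL` and `interpret` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// buildProbeURL appends one expect= parameter per clause. url.Values
// percent-encodes each clause, so operator characters such as '=' and '!'
// survive inside the query string.
func buildProbeURL(base string, clauses []string) string {
	q := url.Values{}
	for _, c := range clauses {
		q.Add("expect", c)
	}
	return base + "?" + q.Encode()
}

// interpret maps the status codes named in the description to a verdict.
// The body is ignored on purpose: the status code is the contract.
func interpret(code int) string {
	switch code {
	case http.StatusOK:
		return "pass" // every clause holds
	case http.StatusPreconditionFailed:
		return "fail" // a well-formed clause does not hold, or the field is absent
	case http.StatusBadRequest:
		return "malformed" // a clause could not be parsed
	default:
		return "unknown"
	}
}

func main() {
	// Hypothetical endpoint and field names, for illustration only.
	u := buildProbeURL("http://localhost:8030/status",
		[]string{"status=healthy", "version!=0"})
	fmt.Println(u)
	fmt.Println(interpret(http.StatusPreconditionFailed))
}
```

Because the clauses are AND-ed, a single request can stand in for several jq comparisons.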

The clause grammar is a small bounded path evaluator -- not an embedded
jq engine -- since the endpoints are unauthenticated. It supports map
keys, bracket-quoted keys, and array index/field-filter access, the last
of which is defined now for forward compatibility with the upcoming
concurrent-keys arrays.

Review of the expect-query feature surfaced an unauthenticated panic and
a parser robustness gap:

- parseBracket sliced inner[1:0] and panicked when a bracket contained a
  lone quote character (e.g. ?expect=a["]=1). On the public /status
  endpoint this aborted the connection and logged a stack trace per
  request -- a log-flood vector. Guard the quoted-key branch with a
  length check so a lone quote falls through to a normal 400.
- parseClause decremented bracket depth with no floor, so a stray ']'
  could drive depth negative and hide the real top-level operator. Clamp
  at zero.
- Document the first-']' limitation for bracketed keys/filter values.
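
The first two fixes can be sketched as follows. This is a simplified illustration, not the code in this PR: `parseQuotedKey` and `topLevelOperator` are hypothetical stand-ins for the relevant parts of parseBracket and parseClause, and the operator scan only looks for '='.

```go
package main

import (
	"errors"
	"fmt"
)

var errMalformed = errors.New("malformed clause")

// parseQuotedKey handles the quoted-key form of a bracket's inner text.
// The length check rejects a lone quote; without it, the slice below
// would become inner[1:0] and panic at runtime.
func parseQuotedKey(inner string) (string, error) {
	if len(inner) < 2 || inner[0] != '"' || inner[len(inner)-1] != '"' {
		return "", errMalformed
	}
	return inner[1 : len(inner)-1], nil
}

// topLevelOperator returns the index of the first '=' outside brackets,
// or -1 if there is none. Depth is clamped at zero so a stray ']' cannot
// drive it negative and hide the real operator.
func topLevelOperator(clause string) int {
	depth := 0
	for i := 0; i < len(clause); i++ {
		switch clause[i] {
		case '[':
			depth++
		case ']':
			if depth > 0 {
				depth--
			}
		case '=':
			if depth == 0 {
				return i
			}
		}
	}
	return -1
}

func main() {
	_, err := parseQuotedKey(`"`)
	fmt.Println(err)                      // lone quote is an error, not a panic
	fmt.Println(topLevelOperator("a]=1")) // stray ']' does not hide the '='
}
```

In a handler, the error from the lone-quote case would surface as an ordinary 400 rather than an aborted connection.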

Also corrects the docs example for the environments map key (it is the
display name, which contains spaces/parentheses and needs bracket-quoting
and URL-encoding) and notes that an empty expect= value returns 400.
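
For readers building such a clause by hand, a minimal sketch of the bracket-quoting and URL-encoding step might look like this. The display name, the `.status` field, the value `connected`, and the dot-after-bracket syntax are assumptions for illustration; `expectParam` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// expectParam bracket-quotes an environment display name, then
// percent-encodes the whole clause so spaces, parentheses, quotes, and
// brackets are safe inside a query string.
func expectParam(displayName, field, value string) string {
	clause := fmt.Sprintf(`environments["%s"].%s=%s`, displayName, field, value)
	return "expect=" + url.QueryEscape(clause)
}

func main() {
	// Hypothetical display name containing a space and parentheses.
	fmt.Println(expectParam("Production (US)", "status", "connected"))
}
```

`url.QueryEscape` encodes a space as `+`, which a query-string parser decodes back to a space.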

Adds regression tests: the lone-quote panic inputs, the stray-']' case,
explicit-null rendering, the != operator through the handler, and a
not-ready 503-before-evaluation ordering test.

Locks down the documented behavior that a present-but-empty expect=
value returns 400 end-to-end, not just at the evaluator level.
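
The ordering these tests pin down can be pictured with a small decision function. This is a sketch under assumptions, not the handler in this PR: `statusFor` and the `clauseOutcome` type are hypothetical, and giving a malformed clause precedence over a failing one is an assumption the description does not spell out.

```go
package main

import (
	"fmt"
	"net/http"
)

// clauseOutcome is a hypothetical per-clause result.
type clauseOutcome int

const (
	clauseHolds clauseOutcome = iota
	clauseFails
	clauseMalformed // includes a present-but-empty expect= value
)

// statusFor picks the response code. Readiness is checked before any
// clause is evaluated, so a not-ready relay answers 503 regardless of
// what the clauses say.
func statusFor(ready bool, outcomes []clauseOutcome) int {
	if !ready {
		return http.StatusServiceUnavailable
	}
	code := http.StatusOK
	for _, o := range outcomes {
		switch o {
		case clauseMalformed:
			// Assumption: any malformed clause wins over a failing one.
			return http.StatusBadRequest
		case clauseFails:
			code = http.StatusPreconditionFailed
		}
	}
	return code
}

func main() {
	fmt.Println(statusFor(false, []clauseOutcome{clauseMalformed}))
	fmt.Println(statusFor(true, []clauseOutcome{clauseHolds, clauseFails}))
}
```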