starknet_transaction_prover: /health returns 503 when service is saturated #14171

Open
avi-starkware wants to merge 1 commit into avi/prover-v3/panic-counter from avi/prover-v3/saturation-health

@avi-starkware
Collaborator

Adds SaturationMonitor (shared by ProvingRpcServerImpl and
HealthLayer) that tracks whether the concurrency semaphore has been
continuously rejecting proving requests. Once that has held for the
configured window (health_max_saturated_ms, default 10s), /health
returns 503 with an opaque body so load balancers can drain the pod
before in-flight requests start failing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
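A minimal, std-only sketch of the monitor semantics described above. `SaturationMonitor` and the 10 s `health_max_saturated_ms` window come from the PR; the fields and method names (`mark_rejected_at`, `mark_acquired`, `is_saturated_at`) are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic. The real `saturation.rs` may use atomics or call `Instant::now()` internally.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical sketch: remembers when the current unbroken run of rejections
/// started and reports "saturated" once that run lasts at least `max_saturated`.
/// Cloning shares the same state, so the RPC server and health layer agree.
#[derive(Clone)]
pub struct SaturationMonitor {
    // Start of the current continuous rejection run; `None` after a
    // successful acquire (or before any rejection).
    saturated_since: Arc<Mutex<Option<Instant>>>,
    max_saturated: Duration,
}

impl SaturationMonitor {
    pub fn new(max_saturated: Duration) -> Self {
        Self { saturated_since: Arc::new(Mutex::new(None)), max_saturated }
    }

    /// Record a rejection (queue full or wait timeout). Only the first
    /// rejection of a run sets the start time; later ones leave it unchanged.
    pub fn mark_rejected_at(&self, now: Instant) {
        let mut since = self.saturated_since.lock().unwrap();
        if since.is_none() {
            *since = Some(now);
        }
    }

    /// Record a successful worker-slot acquire: the rejection run is broken.
    pub fn mark_acquired(&self) {
        *self.saturated_since.lock().unwrap() = None;
    }

    /// True once rejections have continued for at least `max_saturated`.
    pub fn is_saturated_at(&self, now: Instant) -> bool {
        match *self.saturated_since.lock().unwrap() {
            Some(start) => now.saturating_duration_since(start) >= self.max_saturated,
            None => false,
        }
    }
}
```

Because only the first rejection of a run stamps the start time, the window measures continuous saturation, and a single successful acquire resets it. That reset is what keeps brief bursts from draining the pod.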

@cursor

cursor Bot commented May 24, 2026

PR Summary

Medium Risk
Changes load-balancer health semantics and pod draining behavior; logic is localized to admission/health with tests, but mis-tuned thresholds could drain pods prematurely or leave saturated pods marked healthy.
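The threshold at stake here is `health_max_saturated_ms` (default 10 s, settable via CLI or env). A minimal sketch of resolving it, assuming CLI takes precedence over env and env over the default; the function name, the constant, and that precedence order are hypothetical, and the real env variable name is not stated in the PR.

```rust
use std::time::Duration;

/// Default saturation window from the PR description: 10 seconds.
pub const DEFAULT_HEALTH_MAX_SATURATED_MS: u64 = 10_000;

/// Hypothetical resolver: CLI value wins, then the raw env string, then the default.
pub fn resolve_health_max_saturated(
    cli: Option<u64>,
    env: Option<&str>,
) -> Result<Duration, String> {
    let ms = match (cli, env) {
        (Some(ms), _) => ms,
        (None, Some(raw)) => raw
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("invalid health_max_saturated_ms {raw:?}: {e}"))?,
        (None, None) => DEFAULT_HEALTH_MAX_SATURATED_MS,
    };
    Ok(Duration::from_millis(ms))
}
```

A smaller window drains pods sooner under bursty load; a larger one keeps a saturated pod in rotation longer. Those are the two failure modes the risk note names.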

Overview
Adds SaturationMonitor, shared between ProvingRpcServerImpl and HealthLayer, so sustained concurrency shedding (queue full or worker wait timeout) can mark the pod unhealthy for load balancers.

HealthLayer still short-circuits GET /health, but it now returns 503 with an opaque {"status":"unhealthy","reason":"saturated"} body once rejections have continued for health_max_saturated_ms (default 10s, configurable via CLI/env). A successful worker-slot acquire clears the window, so brief bursts do not drain the pod.
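The short-circuit decision can be sketched as a pure routing function. The 503 body is copied from the summary; the `Routed` enum, `route`, and the healthy body `{"status":"ok"}` are hypothetical placeholders, since the PR does not describe the 200 response.

```rust
/// Result of passing a request through a hypothetical health short-circuit.
#[derive(Debug, PartialEq)]
pub enum Routed {
    /// Answered directly by the health layer: (HTTP status, body).
    Health(u16, &'static str),
    /// Not a health probe; forward to the RPC service.
    PassThrough,
}

/// Opaque body: reports only that the pod is saturated, not queue depth or limits.
pub const UNHEALTHY_BODY: &str = r#"{"status":"unhealthy","reason":"saturated"}"#;
/// Placeholder: the healthy-response body is not described in the PR.
pub const HEALTHY_BODY: &str = r#"{"status":"ok"}"#;

pub fn route(method: &str, path: &str, saturated: bool) -> Routed {
    if method == "GET" && path == "/health" {
        if saturated {
            Routed::Health(503, UNHEALTHY_BODY)
        } else {
            Routed::Health(200, HEALTHY_BODY)
        }
    } else {
        Routed::PassThrough
    }
}
```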

The admission path now records rejected_queue_full / rejected_wait_timeout on the prove outcome counter, calls mark_rejected on those paths, and adds queue-depth and wait-duration Prometheus metrics (pre-registered at startup).
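A sketch of how the admission outcomes might map onto the counter labels and monitor calls. The label strings `rejected_queue_full` and `rejected_wait_timeout` come from the summary; the `Admission` enum, `FakeMetrics`, and `record_admission` are hypothetical, an in-memory map stands in for the Prometheus counter, and recording the wait duration for timed-out requests is an assumption. Queue depth and pre-registration are not modeled.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical result of trying to acquire a worker slot.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Admission {
    Acquired { waited: Duration },
    QueueFull,
    WaitTimeout { waited: Duration },
}

impl Admission {
    /// Outcome-counter label for rejections; `None` when the request was admitted
    /// (its final prove outcome would be recorded later).
    pub fn rejection_label(&self) -> Option<&'static str> {
        match self {
            Admission::Acquired { .. } => None,
            Admission::QueueFull => Some("rejected_queue_full"),
            Admission::WaitTimeout { .. } => Some("rejected_wait_timeout"),
        }
    }
}

/// In-memory stand-in for the Prometheus metrics and the saturation monitor.
#[derive(Default)]
pub struct FakeMetrics {
    pub outcomes: HashMap<&'static str, u64>,
    pub wait_samples: Vec<Duration>,
    pub rejected_marks: u64, // stands in for mark_rejected calls
    pub acquired_marks: u64, // stands in for clearing the saturation window
}

pub fn record_admission(m: &mut FakeMetrics, a: Admission) {
    match a.rejection_label() {
        Some(label) => {
            *m.outcomes.entry(label).or_insert(0) += 1;
            m.rejected_marks += 1;
        }
        None => m.acquired_marks += 1,
    }
    // Assumption: a wait-duration sample is taken whenever the request actually waited.
    match a {
        Admission::Acquired { waited } | Admission::WaitTimeout { waited } => {
            m.wait_samples.push(waited)
        }
        Admission::QueueFull => {}
    }
}
```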

HealthLayer is no longer a zero-config default: main builds one monitor clone for RPC and one for health, and the HTTP/HTTPS start_server and middleware wiring pass the configured layer through.
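Why the layer needs explicit wiring: the RPC side writes the monitor and the health side reads it, so both must hold clones of the same shared state. A minimal sketch under that assumption; `MonitorHandle`, `RpcServerSketch`, `HealthLayerSketch`, and `build_services` are hypothetical, and a single `AtomicBool` stands in for the real time-windowed monitor.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Stand-in monitor: a shared flag behind an Arc, so every clone sees the same state.
#[derive(Clone, Default)]
pub struct MonitorHandle(Arc<AtomicBool>);

impl MonitorHandle {
    pub fn set_saturated(&self, v: bool) {
        self.0.store(v, Ordering::Relaxed);
    }
    pub fn is_saturated(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical RPC side: updates the monitor on admission.
pub struct RpcServerSketch {
    pub monitor: MonitorHandle,
}

/// Hypothetical health layer: only reads the monitor.
pub struct HealthLayerSketch {
    pub monitor: MonitorHandle,
}

impl HealthLayerSketch {
    pub fn status(&self) -> u16 {
        if self.monitor.is_saturated() { 503 } else { 200 }
    }
}

/// One monitor, one clone per consumer, mirroring the wiring in main.
pub fn build_services() -> (RpcServerSketch, HealthLayerSketch) {
    let monitor = MonitorHandle::default();
    let rpc = RpcServerSketch { monitor: monitor.clone() };
    let health = HealthLayerSketch { monitor };
    (rpc, health)
}
```

A health layer built with its own `MonitorHandle::default()` would never observe rejections and would always report 200, which is why a zero-config default no longer fits.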

Reviewed by Cursor Bugbot for commit f7a4037. Bugbot is set up for automated code reviews on this repo.

avi-starkware commented May 24, 2026
Collaborator Author

@reviewable-StarkWare

This change is Reviewable

@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from cbd1def to e503ebd Compare May 24, 2026 16:48
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from 318c9c2 to 53381dd Compare May 24, 2026 16:48
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from e503ebd to db503b7 Compare May 26, 2026 08:43
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch 2 times, most recently from d477f5e to ef3cf0b Compare May 26, 2026 12:16
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 1da27e9 to ac98d86 Compare May 26, 2026 12:17
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from ef3cf0b to eb8da8d Compare May 26, 2026 12:17
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from ac98d86 to e4bbbdc Compare May 26, 2026 12:58
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from eb8da8d to e084131 Compare May 26, 2026 12:58
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from e4bbbdc to 06bb59e Compare May 26, 2026 16:14
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch 2 times, most recently from 171e482 to 158a680 Compare May 26, 2026 16:47
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 06bb59e to 0b2c8cc Compare May 26, 2026 16:47
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from 158a680 to b385d86 Compare May 26, 2026 16:59
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 0b2c8cc to 4b1caba Compare May 26, 2026 16:59
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from b385d86 to a462e96 Compare May 27, 2026 10:01
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from a462e96 to a83176f Compare May 27, 2026 10:35
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 05ed9b4 to 72918b7 Compare May 27, 2026 10:35
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from a83176f to b4c05a6 Compare May 27, 2026 12:55
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch 2 times, most recently from 74f4f46 to 728f22c Compare May 27, 2026 13:11
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from b4c05a6 to 2739271 Compare May 27, 2026 13:11
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 728f22c to 966f499 Compare May 27, 2026 14:04
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from 2739271 to 89534f1 Compare May 27, 2026 14:04
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from 89534f1 to d3f1139 Compare May 27, 2026 14:20
Comment thread crates/starknet_transaction_prover/src/server/rpc_impl.rs Outdated
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 646fb1e to a77477b Compare May 31, 2026 10:23
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch 2 times, most recently from d15dc19 to def7ea4 Compare June 1, 2026 08:17
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from a77477b to 8017e9e Compare June 1, 2026 08:17
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from def7ea4 to b321f22 Compare June 1, 2026 11:18
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 8017e9e to 23ed570 Compare June 1, 2026 11:18
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 23ed570 to b7a8e8e Compare June 7, 2026 10:11
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from b321f22 to 5cd174d Compare June 7, 2026 10:11
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from b7a8e8e to b1ad51d Compare July 1, 2026 17:32
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from 5cd174d to 0989241 Compare July 1, 2026 17:32

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0989241.

Comment thread crates/starknet_transaction_prover/src/server/saturation.rs Outdated
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from b1ad51d to f5f25a2 Compare July 2, 2026 09:53
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from 0989241 to 47374eb Compare July 2, 2026 09:53
…rated

Adds `SaturationMonitor` (shared by `ProvingRpcServerImpl` and
`HealthLayer`) that tracks whether the concurrency semaphore has been
continuously rejecting proving requests. Once that has held for the
configured window (`health_max_saturated_ms`, default 10s), `/health`
returns 503 with an opaque body so load balancers can drain the pod
before in-flight requests start failing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from 47374eb to f7a4037 Compare July 2, 2026 11:53
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from f5f25a2 to 9e0ab62 Compare July 2, 2026 11:53
Labels

None yet

Projects

None yet

2 participants