feat: add --max-requests for worker process recycling#1354
Conversation
Add support for recycling worker processes after a configurable number of requests, similar to Gunicorn's max_requests. This helps contain memory leaks from long-running Python processes. - Add global atomic request counter in Rust, incremented on every HTTP request and exposed to Python via get_request_count(). - Add --max-requests CLI argument to Config. - In child processes, schedule a periodic check (every 5s) that stops the event loop when the request count exceeds the threshold. - In the parent process (multi-process mode), run a supervisor loop that detects exited workers and respawns them with fresh state. - Also includes graceful shutdown (terminate + join with timeout) since the supervisor loop requires it. Made-with: Cursor
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a CLI Changes
Sequence DiagramsequenceDiagram
participant Client
participant HTTP as Rust HTTP Server
participant Worker as Python Worker Process
participant PoolMgr as Process Pool Manager
Client->>HTTP: send HTTP request
HTTP->>HTTP: REQUEST_COUNT += 1
HTTP-->>Client: respond
loop periodic poll (if max_requests set)
Worker->>HTTP: call get_request_count()
HTTP-->>Worker: return N
alt N >= max_requests
Worker->>Worker: initiate graceful shutdown
Worker-->>PoolMgr: exit observed
PoolMgr->>PoolMgr: clone socket & spawn replacement (pass max_requests)
else
Worker->>Worker: continue serving
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
robyn/processpool.py (1)
135-151:⚠️ Potential issue | 🔴 CriticalDon't enable recycling in the inline worker path.
When
processes == 1or on Windows,init_processpool()executesspawn_process()in the current process and returns no supervisor pool. Passingmax_requeststhrough this branch makes the only server process stop itself once the threshold is hit, with nothing to replace it. That contradicts the single-process behavior described in the PR objective.🛠️ Minimal fix
if sys.platform.startswith("win32") or processes == 1: + if max_requests and max_requests > 0: + logger.warning("--max-requests is ignored when worker recycling is unavailable.") spawn_process( directories, request_headers, routes, global_middlewares, @@ response_headers, excluded_response_headers_paths, client_timeout, keep_alive_timeout, - max_requests, + None, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@robyn/processpool.py` around lines 135 - 151, In the branch guarded by "if sys.platform.startswith('win32') or processes == 1" (the inline worker path), stop passing the recycling/max-requests behavior into spawn_process; call spawn_process without forwarding max_requests (or explicitly pass None/0) so the single inline server cannot self-terminate once the threshold is hit. Update the spawn_process invocation in that branch (referenced as spawn_process) to omit/neutralize the max_requests argument while leaving the other parameters unchanged so recycling only applies to the supervisor-managed pool.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@robyn/argument_parser.py`:
- Around line 87-94: The --max-requests argument currently accepts non-positive
integers which silently become "disabled" later; update the parser for the
argument (the parser.add_argument call that sets dest="max_requests") to
validate at parse time by rejecting values <= 0: implement a custom type or
argparse check that raises an argparse.ArgumentTypeError for values <= 0 so the
CLI fails fast with a clear error message (e.g., "max-requests must be a
positive integer"); ensure the validation is applied where max_requests is
defined so downstream code (e.g., logic in process pool that expects > 0) no
longer sees ambiguous inputs.
In `@robyn/processpool.py`:
- Around line 84-109: The worker-recycling loop is replacing dead Process
objects in process_pool without reaping them, causing zombie processes; modify
the loop in the block that iterates over process_pool (where Process is
constructed with target=spawn_process and socket.try_clone()) to call
process.join() on the exited process before creating/assigning new_process and
replacing process_pool[i]; ensure you still log the replacement (logger.info)
and then start new_process and assign it into process_pool[i] after the join
completes.
In `@src/server.rs`:
- Around line 490-491: The request counter (REQUEST_COUNT.fetch_add(...)) is
only called inside index(), so requests served by Files handlers, WebSocket
routes, or the default_service const-route path bypass it and under-report via
get_request_count(); move this increment into an app-wide Actix middleware (or
wrapper) that executes for every incoming HTTP request before route dispatch.
Implement a middleware struct (or use actix_web::middleware::Logger-style
middleware) that calls REQUEST_COUNT.fetch_add(1, Relaxed) for each request,
register it on the App (so it wraps all services including default_service,
Files handlers, and WS routes), and remove the manual increment from index() to
avoid double-counting.
---
Outside diff comments:
In `@robyn/processpool.py`:
- Around line 135-151: In the branch guarded by "if
sys.platform.startswith('win32') or processes == 1" (the inline worker path),
stop passing the recycling/max-requests behavior into spawn_process; call
spawn_process without forwarding max_requests (or explicitly pass None/0) so the
single inline server cannot self-terminate once the threshold is hit. Update the
spawn_process invocation in that branch (referenced as spawn_process) to
omit/neutralize the max_requests argument while leaving the other parameters
unchanged so recycling only applies to the supervisor-managed pool.
🪄 Autofix (Beta)
❌ Autofix failed (check again to retry)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b4ae53f6-8ba4-4604-83fe-879e3fa30025
📒 Files selected for processing (6)
robyn/__init__.pyrobyn/argument_parser.pyrobyn/processpool.pyrobyn/robyn.pyisrc/lib.rssrc/server.rs
Replace `type=int` with a custom type callback that raises argparse.ArgumentTypeError for zero and negative values. Made-with: Cursor
… inline workers - Call process.join() on exited workers before spawning replacements to prevent zombie processes in the recycling loop. - Move REQUEST_COUNT increment from index() into an app-wide actix middleware so requests served by Files handlers, WebSocket routes, and the const-route fast path are all counted toward max_requests. - Omit max_requests when running in single-process/Windows mode where there is no supervisor to respawn the worker. Made-with: Cursor
The Python-side _check_max_requests callback (loop.call_later) was scheduled after server.start(), but start() blocks internally on event_loop.run_forever() and never returns — making lines 270-283 dead code that never executes. Move the check into a background task on the actix runtime where it has direct access to REQUEST_COUNT and can call server_handle.stop(true) for a graceful shutdown. This also eliminates the cross-runtime coordination problem between Python asyncio and Rust actix. Made-with: Cursor
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
robyn/processpool.py (1)
33-55:⚠️ Potential issue | 🟠 MajorNormalize non-positive
max_requestsbefore handing it to children.Line 82 treats recycling as enabled only for values
> 0, but Lines 54, 105, and 173 still forward the raw value into Rust. With0, each worker satisfiesREQUEST_COUNT >= limiton its first poll and shuts down immediately. Normalizemax_requeststoNoneunless it is strictly positive, and reuse that normalized value for both initial spawns and replacements.Minimal normalization
def run_processes( url: str, port: int, directories: List[Directory], @@ max_requests: Optional[int] = None, ) -> List[Process]: import time + effective_max_requests = ( + max_requests if max_requests is not None and max_requests > 0 else None + ) socket = SocketHeld(url, port) process_pool = init_processpool( @@ - max_requests, + effective_max_requests, ) @@ - if max_requests and max_requests > 0 and len(process_pool) > 0: + if effective_max_requests is not None and len(process_pool) > 0: while not shutting_down: for i, process in enumerate(process_pool): if not process.is_alive() and not shutting_down: @@ - max_requests, + effective_max_requests, ), ) def init_processpool( @@ max_requests: Optional[int] = None, ) -> List[Process]: process_pool: List = [] + effective_max_requests = ( + max_requests if max_requests is not None and max_requests > 0 else None + ) @@ - max_requests, + effective_max_requests, ), )Also applies to: 82-105, 155-174
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@robyn/processpool.py` around lines 33 - 55, Normalize the max_requests parameter to None unless it's strictly positive and use that normalized value everywhere you pass it to child processes: create a local normalized_max_requests = max_requests if max_requests and max_requests > 0 else None, then replace raw uses of max_requests when calling init_processpool and any subsequent spawn/replacement logic so children receive normalized_max_requests; update all call sites that currently forward max_requests (including the initial init_processpool invocation and worker respawn paths) to use normalized_max_requests and ensure the existing > 0 check (REQUEST_COUNT logic) stays consistent with this normalization.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/server.rs`:
- Around line 340-355: The spawned task that checks REQUEST_COUNT currently
calls handle.stop(true) but never stops the Python event loop, leaving
run_forever() blocked; after handle.stop(true).await in that task, acquire the
Python GIL and call event_loop.call_method0("stop") (or call a shutdown helper
that calls event_loop.call_method0("stop") / event_loop.call_method0("close"))
so the blocking event_loop.call_method0("run_forever") can return, or
alternatively call std::process::exit(0) to terminate the worker; update the
spawned task (where server_handle/handle and REQUEST_COUNT are used) to perform
this additional stop/exit so the worker fully recycles.
---
Outside diff comments:
In `@robyn/processpool.py`:
- Around line 33-55: Normalize the max_requests parameter to None unless it's
strictly positive and use that normalized value everywhere you pass it to child
processes: create a local normalized_max_requests = max_requests if max_requests
and max_requests > 0 else None, then replace raw uses of max_requests when
calling init_processpool and any subsequent spawn/replacement logic so children
receive normalized_max_requests; update all call sites that currently forward
max_requests (including the initial init_processpool invocation and worker
respawn paths) to use normalized_max_requests and ensure the existing > 0 check
(REQUEST_COUNT logic) stays consistent with this normalization.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 469d3a95-d214-437d-ac1f-5985479fab01
📒 Files selected for processing (3)
robyn/processpool.pyrobyn/robyn.pyisrc/server.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- robyn/robyn.pyi
Three bugs fixed: 1. The processes==1 path never passed max_requests to spawn_process, so the Rust server never knew about the request limit. 2. The processes==1 path ran spawn_process inline (not forked), so process_pool was empty and the parent recycling loop never ran. Now when max_requests is set, even processes==1 goes through the fork path so the parent can watch and respawn the child. 3. After the Actix server stops (handle.stop(true)), the Python event loop (loop.run_forever()) never returns because it has no knowledge of the Rust thread stopping. Added std::process::exit(0) in the Rust server thread after server.await completes when max_requests is configured, so the child process exits cleanly and the parent can detect it and spawn a replacement. Made-with: Cursor
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@robyn/processpool.py`:
- Around line 82-110: The truthy checks on max_requests are wrong: normalize by
testing for None explicitly so a value of 0 does not disable supervision;
replace conditions like "if max_requests and max_requests > 0 and
len(process_pool) > 0:" with "if max_requests is not None and len(process_pool)
> 0:" (and similarly update the other branch around the Windows/inline path at
the block referencing process_pool and spawn_process) so recycling supervision
is enabled for any non-None max_requests value.
- Around line 82-110: The parent loop currently sleeps for 5 seconds which
delays detection of exited workers and causes outages; change the recycling loop
so exited processes are reaped and replaced immediately by calling
process.join() as soon as process.is_alive() is False (or use
process.join(timeout=0) to non-blockingly reap), spawn the replacement Process
(using Process(target=spawn_process, ...), socket.try_clone(), etc.) right away,
and reduce or remove the coarse time.sleep(5) (replace with a short poll
interval like 0.1s if needed) so worker replacement happens immediately instead
of waiting on the 5s boundary.
In `@src/server.rs`:
- Around line 344-358: The periodic 5s polling allows a hot worker to far
overshoot max_requests; instead, detect the threshold immediately when
incrementing REQUEST_COUNT: in the code path that does
REQUEST_COUNT.fetch_add(1, Relaxed) (the one-shot transition referenced), check
the returned previous value and if previous + 1 >= max_requests (or previous <
max_requests && previous + 1 == max_requests) call
server_handle.clone().stop(true).await (or otherwise schedule an immediate
shutdown) so the worker begins recycling right when the limit is crossed; keep
or remove the existing spawn loop but ensure the atomic fetch_add branch is the
primary immediate trigger.
- Around line 361-366: The spawned Actix thread must not call
std::process::exit(0) when max_requests.is_some(); instead signal the Python
event loop to perform a graceful shutdown and return from the async block so the
process can continue cleanup. Replace the hard exit in the block that awaits
server (the code referencing server.await and max_requests) with logic that
notifies the Python shutdown mechanism used by your project (e.g., trigger the
existing shutdown_handler or send the same inter-thread signal the Python side
expects) and allow the spawned task to finish normally; ensure the Python-side
shutdown_handler and its event loop cleanup in robyn/processpool.py still run
and that any Events.SHUTDOWN handlers are invoked.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 58f09024-f20d-497f-a17d-52282788a423
📒 Files selected for processing (2)
robyn/processpool.pysrc/server.rs
| if max_requests and max_requests > 0 and len(process_pool) > 0: | ||
| while not shutting_down: | ||
| for i, process in enumerate(process_pool): | ||
| if not process.is_alive() and not shutting_down: | ||
| process.join() | ||
| logger.info("Worker process exited (recycling), spawning replacement.") | ||
| copied_socket = socket.try_clone() | ||
| new_process = Process( | ||
| target=spawn_process, | ||
| args=( | ||
| directories, | ||
| request_headers, | ||
| routes, | ||
| global_middlewares, | ||
| route_middlewares, | ||
| web_sockets, | ||
| event_handlers, | ||
| copied_socket, | ||
| workers, | ||
| response_headers, | ||
| excluded_response_headers_paths, | ||
| client_timeout, | ||
| keep_alive_timeout, | ||
| max_requests, | ||
| ), | ||
| ) | ||
| new_process.start() | ||
| process_pool[i] = new_process | ||
| time.sleep(5) |
There was a problem hiding this comment.
Normalize max_requests before these truthiness branches.
Line 82 and Line 136 use truthiness, but src/server.rs enables recycling for any non-None value. 0 therefore disables supervision here while still enabling immediate shutdown in Rust. Line 136 also forces all Windows runs through the inline path, so there is no supervisor to replace that exit. Both cases can turn recycling into a permanent shutdown.
💡 Minimal guardrail
def run_processes(
@@
keep_alive_timeout: int = 20,
max_requests: Optional[int] = None,
) -> List[Process]:
+ if max_requests is not None and max_requests <= 0:
+ max_requests = None
@@
- if max_requests and max_requests > 0 and len(process_pool) > 0:
+ if max_requests is not None and len(process_pool) > 0:
while not shutting_down:
... def init_processpool(
@@
keep_alive_timeout: int = 20,
max_requests: Optional[int] = None,
) -> List[Process]:
+ if max_requests is not None and max_requests <= 0:
+ max_requests = None
+ if sys.platform.startswith("win32") and max_requests is not None:
+ raise RuntimeError("--max-requests is not supported on Windows yet")
@@
- if sys.platform.startswith("win32") or (processes == 1 and not max_requests):
+ if sys.platform.startswith("win32") or (processes == 1 and max_requests is None):
spawn_process(...)Also applies to: 133-152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@robyn/processpool.py` around lines 82 - 110, The truthy checks on
max_requests are wrong: normalize by testing for None explicitly so a value of 0
does not disable supervision; replace conditions like "if max_requests and
max_requests > 0 and len(process_pool) > 0:" with "if max_requests is not None
and len(process_pool) > 0:" (and similarly update the other branch around the
Windows/inline path at the block referencing process_pool and spawn_process) so
recycling supervision is enabled for any non-None max_requests value.
Worker replacement starts late enough to cause outages.
The parent only notices exited workers on the next 5-second sleep boundary at Line 110. In processes=1 mode that is a full outage on every recycle; if several workers hit the limit together, it can briefly drop the whole server. Please wait on process sentinels, or at least poll much more frequently, so replacements start immediately.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@robyn/processpool.py` around lines 82 - 110, The parent loop currently sleeps
for 5 seconds which delays detection of exited workers and causes outages;
change the recycling loop so exited processes are reaped and replaced
immediately by calling process.join() as soon as process.is_alive() is False (or
use process.join(timeout=0) to non-blockingly reap), spawn the replacement
Process (using Process(target=spawn_process, ...), socket.try_clone(), etc.)
right away, and reduce or remove the coarse time.sleep(5) (replace with a short
poll interval like 0.1s if needed) so worker replacement happens immediately
instead of waiting on the 5s boundary.
| if let Some(limit) = max_requests { | ||
| let handle = server_handle.clone(); | ||
| actix_web::rt::spawn(async move { | ||
| loop { | ||
| actix_web::rt::time::sleep(std::time::Duration::from_secs(5)).await; | ||
| if REQUEST_COUNT.load(Relaxed) >= limit { | ||
| log::info!( | ||
| "Max requests ({}) reached, worker shutting down for recycling.", | ||
| limit | ||
| ); | ||
| handle.stop(true).await; | ||
| break; | ||
| } | ||
| } | ||
| }); |
There was a problem hiding this comment.
The recycle trigger is enforced far too late under load.
Line 348 only wakes every 5 seconds, so a hot worker can serve far more than max_requests before Line 354 starts shutdown. That weakens the leak-containment goal of this feature. You already have the exact threshold crossing at Line 90 via fetch_add; using that one-shot transition would keep the limit meaningful.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/server.rs` around lines 344 - 358, The periodic 5s polling allows a hot
worker to far overshoot max_requests; instead, detect the threshold immediately
when incrementing REQUEST_COUNT: in the code path that does
REQUEST_COUNT.fetch_add(1, Relaxed) (the one-shot transition referenced), check
the returned previous value and if previous + 1 >= max_requests (or previous <
max_requests && previous + 1 == max_requests) call
server_handle.clone().stop(true).await (or otherwise schedule an immediate
shutdown) so the worker begins recycling right when the limit is crossed; keep
or remove the existing spawn loop but ensure the atomic fetch_add branch is the
primary immediate trigger.
| server.await.unwrap(); | ||
|
|
||
| if max_requests.is_some() { | ||
| log::info!("Actix server stopped, exiting worker process for recycling."); | ||
| std::process::exit(0); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Show the recycle exit path and the only Python-side shutdown/cleanup locations.
rg -n -C2 'std::process::exit\(0\)|shutdown_handler|run_forever|loop.close' src/server.rs robyn/processpool.pyRepository: sparckles/Robyn
Length of output: 2553
🏁 Script executed:
#!/bin/bash
# Get broader context around the exit call and the event loop handling
sed -n '355,380p' src/server.rs | cat -nRepository: sparckles/Robyn
Length of output: 1215
🏁 Script executed:
#!/bin/bash
# Also check the full structure of the run method to understand task spawning/awaiting
sed -n '150,175p' src/server.rs | cat -nRepository: sparckles/Robyn
Length of output: 1298
🏁 Script executed:
#!/bin/bash
# Get full context of the run method structure, including what spawns/awaits the async block
sed -n '140,200p' src/server.rs | cat -nRepository: sparckles/Robyn
Length of output: 2774
🏁 Script executed:
#!/bin/bash
# Check what happens in the Python processpool.py around the run_forever and finally block
sed -n '265,285p' robyn/processpool.py | cat -nRepository: sparckles/Robyn
Length of output: 426
Hard exit(0) from the spawned Actix thread terminates the process before Python shutdown handlers and event loop cleanup can execute.
When std::process::exit(0) is called on line 365 within the spawned thread's async block, it kills the entire process immediately. This prevents the Python event loop at line 370 from ever running, which means:
- The
shutdown_handlercheck at line 372 is never reached - The Python event loop cleanup in
robyn/processpool.pyline 276 (thefinallyblock withloop.close()) is skipped - Any registered
Events.SHUTDOWNhandlers are never invoked
A graceful recycle must signal shutdown to the Python event loop instead of terminating the process directly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/server.rs` around lines 361 - 366, The spawned Actix thread must not call
std::process::exit(0) when max_requests.is_some(); instead signal the Python
event loop to perform a graceful shutdown and return from the async block so the
process can continue cleanup. Replace the hard exit in the block that awaits
server (the code referencing server.await and max_requests) with logic that
notifies the Python shutdown mechanism used by your project (e.g., trigger the
existing shutdown_handler or send the same inter-thread signal the Python side
expects) and allow the spawned task to finish normally; ensure the Python-side
shutdown_handler and its event loop cleanup in robyn/processpool.py still run
and that any Events.SHUTDOWN handlers are invoked.
Summary
--max-requestsCLI option (andConfig.max_requests) to recycle worker processes after N requests, similar to Gunicorn'smax_requests. Helps contain memory leaks from long-running Python processes.REQUEST_COUNT), incremented on every HTTP request in theindexhandler. Exposed to Python viaget_request_count().loop.call_later) stops the event loop when the threshold is exceeded, allowing the shutdown handler to fire and the process to exit cleanly.Test plan
max_requestsis not set)--max-requests 50 --processes 2, send 100+ requests, verify server stays alive and worker PIDs change--max-requestswith single process (no recycling, since single-process blocks inspawn_process)get_request_count()returns accurate count from PythonMade with Cursor
Summary by CodeRabbit
New Features
Improvements