Commit 1008af0
fix(ai-proxy): yield to scheduler in streaming SSE loop to avoid worker CPU starvation

When an upstream LLM emits SSE chunks in a tight burst (e.g. a model
hallucinating and producing tokens at 100+ per second), the streaming
loop in parse_streaming_response can run for an extended period without
yielding to the nginx scheduler.

body_reader() (cosocket recv) only yields when the recv buffer is
empty; if the kernel has already buffered several chunks, successive
calls return immediately. ngx.flush(true) only yields when the
downstream send buffer is full; a fast client drains it immediately. So
neither end of the loop guarantees a yield, and the SSE coroutine ends
up monopolizing the worker, starving health checks, concurrent
requests, and timer callbacks on the same worker.

Add an explicit ngx.sleep(0) at the end of each loop iteration. This
is a zero-delay sleep that just yields the current coroutine, allowing
other ready coroutines to run. The cost is negligible: in normal AI
traffic chunks already arrive with inter-chunk gaps, so an extra yield
per chunk is invisible; in burst scenarios it caps per-coroutine
runtime to one chunk's worth of work.

1 parent 4223a07 commit 1008af0
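The scheduling effect described in the commit message can be illustrated outside nginx with plain Lua coroutines. This is a toy model under stated assumptions, not the code from this commit: the round-robin `run_scheduler`, `make_stream`, `make_other`, and `simulate` are hypothetical names, the real nginx scheduler is event-driven rather than round-robin, and `coroutine.yield()` merely stands in for `ngx.sleep(0)`.

```lua
-- Toy model only: plain Lua coroutines, no OpenResty APIs are used.
-- All function names below are hypothetical and exist only for this sketch.

-- Round-robin scheduler: resume each live coroutine in turn until all finish.
local function run_scheduler(tasks)
  local queue = {}
  for i, fn in ipairs(tasks) do
    queue[i] = coroutine.create(fn)
  end
  while #queue > 0 do
    local co = table.remove(queue, 1)
    local ok, err = coroutine.resume(co)
    if not ok then error(err) end
    if coroutine.status(co) ~= "dead" then
      queue[#queue + 1] = co  -- still has work: back of the line
    end
  end
end

-- Streaming loop whose chunks are all buffered already, so reading a
-- chunk never blocks and therefore never yields on its own.
local function make_stream(log, chunks, yield_each_chunk)
  return function()
    for i = 1, chunks do
      log[#log + 1] = "chunk" .. i
      if yield_each_chunk then
        coroutine.yield()  -- stands in for ngx.sleep(0) at the end of the loop
      end
    end
  end
end

-- Other work on the same worker (e.g. a health check) that runs in short steps.
local function make_other(log, steps)
  return function()
    for i = 1, steps do
      log[#log + 1] = "other" .. i
      coroutine.yield()
    end
  end
end

-- Run both tasks on one scheduler and return the order in which work happened.
local function simulate(yield_each_chunk)
  local log = {}
  run_scheduler({
    make_stream(log, 3, yield_each_chunk),
    make_other(log, 2),
  })
  return table.concat(log, " ")
end

print(simulate(false))  -- chunk1 chunk2 chunk3 other1 other2
print(simulate(true))   -- chunk1 other1 chunk2 other2 chunk3
```

Without the per-chunk yield, the other task does not run until the whole burst has been processed; with it, the two tasks interleave one chunk at a time, which is the behavior the commit aims for.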
1 file changed
Lines changed: 7 additions & 0 deletions
Added lines 364-370 in the new file, inserted after original line 363 (diff body not preserved).