fix: prod robustness#68
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements a graceful shutdown mechanism for the orchestrator and virtual bots using promise tracking and timeouts. It also introduces a dormancy threshold for bots with consecutive task errors, adds validation for the demo time scale, and optimizes pool exhaustion logging. Feedback suggests explicitly cloning the bot list during shutdown to ensure iteration safety and removing a redundant error check in the bot's top-up logic.
| const loops = this.#bots.map((b) => { | ||
| b.stop(); | ||
| return b.loopPromise; | ||
| }); |
There was a problem hiding this comment.
The comment mentions taking a "snapshot" of the bot list, but the code iterates over this.#bots directly. While safe in this specific implementation because b.stop() is synchronous and doesn't trigger immediate removal, it is better practice to actually snapshot the array (e.g., using the spread operator) to ensure the set of bots being signaled is isolated from any concurrent mutations to the collection during iteration.
| const loops = this.#bots.map((b) => { | |
| b.stop(); | |
| return b.loopPromise; | |
| }); | |
| const loops = [...this.#bots].map((b) => { | |
| b.stop(); | |
| return b.loopPromise; | |
| }); |
| if (this.#consecutiveTaskErrors >= MAX_CONSECUTIVE_TASK_ERRORS) { | ||
| this.#goDormant("task_error_threshold"); | ||
| break; | ||
| } | ||
| this.#scheduleNextTopUp(Date.now()); |
There was a problem hiding this comment.
This error threshold check is redundant in the top-up block. The MAX_CONSECUTIVE_TASK_ERRORS counter is only incremented within #runTask(), and it is already checked immediately after that call (lines 208-211). Since #runTopUp() does not modify the error counter, the condition cannot become true during this block if it wasn't already true (and caught) previously.
this.#scheduleNextTopUp(Date.now());
No description provided.