Summary
When a RunningHub task is cancelled or removed on the server side, comfyui/runninghub_executor.py::_wait_for_task_completion enters an infinite retry loop that the user cannot escape with Ctrl+C in PowerShell / Windows Terminal. The process must be killed via Task Manager.
The bug is two layers of indiscriminate retry stacked on top of each other.
Version: comfykit 0.1.12
Platform: Windows 11 / PowerShell, Python 3.11
Reproduction
- Submit a workflow via RunningHubExecutor and let it start polling status.
- Open the RunningHub web UI and cancel the task.
- The local process now logs RunningHub API error: APIKEY_TASK_NOT_FOUND repeatedly forever.
- Ctrl+C in PowerShell does not stop the process.
Log evidence
16:02:36 WARNING runninghub_client.py:115 _make_request - Request failed (attempt 1/4): RunningHub API error: APIKEY_TASK_NOT_FOUND. Retrying in 1s...
16:02:37 WARNING runninghub_client.py:115 _make_request - Request failed (attempt 2/4): ... Retrying in 2s...
16:02:40 WARNING runninghub_client.py:115 _make_request - Request failed (attempt 3/4): ... Retrying in 4s...
16:02:44 ERROR runninghub_client.py:118 _make_request - Request failed after 4 attempts: APIKEY_TASK_NOT_FOUND
16:02:44 ERROR runninghub_client.py:293 query_task_status - Failed to query task status...
16:02:44 ERROR runninghub_executor.py:371 _wait_for_task_completion - Error checking task status ...
16:02:46 WARNING runninghub_client.py:115 _make_request - Request failed (attempt 1/4): ... ← outer while True loops back
16:02:48 WARNING runninghub_client.py:115 _make_request - Request failed (attempt 2/4): ...
... (forever)
Each outer cycle is ~9 seconds (1+2+4 inner backoff + 2s outer sleep), and never terminates.
Root cause #1 — outer infinite loop (primary)
comfyui/runninghub_executor.py L321-373:
    # If both are None, no timeout limit (wait indefinitely)
    if max_wait_time is None:
        max_wait_time = self.timeout
    ...
    while True:
        elapsed_time = time.time() - start_time
        if max_wait_time is not None and elapsed_time >= max_wait_time:
            break
        try:
            status_info = await self.client.query_task_status(task_id)
            ...
        except Exception as e:
            logger.error(f"Error checking task status {task_id}: {e}", exc_info=True)
            await asyncio.sleep(check_interval)
            continue  # ← swallows ALL errors, including terminal ones, and loops forever
TASK_NOT_FOUND / APIKEY_TASK_NOT_FOUND is a terminal state: the task is gone from the server and will never come back. It must not be retried; the executor should return an ExecuteResult(status="error") immediately.
When max_wait_time is None (the documented "wait indefinitely" default), this becomes a true infinite loop with no exit condition.
Root cause #2 — inner retry over business errors (amplifier)
comfyui/runninghub_client.py L82-120 in _make_request:
    except Exception as e:
        last_exception = e
        ...
        if attempt < self.retry_count:
            wait_time = 2 ** attempt
            logger.warning(f"Request failed (attempt {attempt + 1}/{self.retry_count + 1}): {e}. Retrying in {wait_time}s...")
            await asyncio.sleep(wait_time)
The retry handler treats every Exception the same, including the business error raised at L98:
    raise Exception(f"RunningHub API error: {result.get('msg', 'Unknown error')}")
Permanent business errors like APIKEY_TASK_NOT_FOUND, APIKEY_INVALID, WORKFLOW_NOT_FOUND are attempted 4 times (1 initial call plus 3 retries) for no benefit, wasting ~7 seconds of backoff and API quota per outer cycle.
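The ~7 s and ~9 s figures follow directly from the wait_time = 2 ** attempt formula. A small sketch of the arithmetic, assuming retry_count is 3 (inferred from the "attempt 1/4" log lines) and the outer sleep is 2 s; backoff_schedule is a hypothetical helper, not part of comfykit:

```python
def backoff_schedule(retry_count: int) -> list[int]:
    """Sleep durations between attempts when wait_time = 2 ** attempt."""
    # The last attempt is not followed by a sleep, so there are
    # retry_count sleeps for retry_count + 1 attempts.
    return [2 ** attempt for attempt in range(retry_count)]


retry_count = 3      # assumption: inferred from "attempt 1/4" in the log
check_interval = 2   # assumption: outer sleep, from the ~9 s cycle in the log

inner = backoff_schedule(retry_count)
print(inner, sum(inner), sum(inner) + check_interval)  # [1, 2, 4] 7 9
```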
Root cause #3 — Ctrl+C cannot break the loop on Windows
The combination of:
- a broad except Exception (which does not itself catch KeyboardInterrupt, but keeps the loops alive through every other failure)
- asyncio.sleep inside the inner loop
- aiohttp session.request and Windows asyncio's known signal-delivery issues
means SIGINT delivery is unreliable while the executor is sleeping/awaiting inside the nested loops. Practically, Ctrl+C in PowerShell does nothing and the user must kill the process from Task Manager. Even if Python eventually delivers KeyboardInterrupt, the long backoff windows make the program feel unresponsive.
Suggested fix
A. Classify terminal errors in the executor
    TERMINAL_ERROR_TOKENS = (
        "TASK_NOT_FOUND",
        "APIKEY_TASK_NOT_FOUND",
        "APIKEY_INVALID",
        "TASK_CANCELLED",
        "WORKFLOW_NOT_FOUND",
    )

    # Inside the polling loop of _wait_for_task_completion:
    except Exception as e:
        err_str = str(e)
        if any(tok in err_str for tok in TERMINAL_ERROR_TOKENS):
            logger.error(f"Task {task_id} terminated remotely: {e}")
            return ExecuteResult(
                status="error",
                prompt_id=task_id,
                msg=f"Task cancelled or not found: {e}",
            )
        # Transient: keep polling
        logger.warning(f"Transient error checking task status {task_id}: {e}")
        await asyncio.sleep(check_interval)
        continue
A cleaner long-term solution: define typed exceptions in runninghub_client.py (e.g. RunningHubTerminalError, RunningHubTransientError) instead of raising bare Exception, and have the executor catch them separately.
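A minimal sketch of what that typed-exception approach could look like. The two class names come from the suggestion above; everything else (RunningHubError, PERMANENT_CODES, error_from_message, poll_until_done, the "SUCCESS" status string, and the plain dict standing in for ExecuteResult) is hypothetical and only for illustration, not comfykit's actual API:

```python
import asyncio


class RunningHubError(Exception):
    """Base class for errors reported by the RunningHub API."""


class RunningHubTerminalError(RunningHubError):
    """The task can never succeed (cancelled, deleted, invalid key)."""


class RunningHubTransientError(RunningHubError):
    """The request may succeed if tried again later."""


# Assumed set of permanent codes, taken from the messages in this report.
PERMANENT_CODES = frozenset({
    "TASK_NOT_FOUND", "APIKEY_TASK_NOT_FOUND", "APIKEY_INVALID",
    "TASK_CANCELLED", "WORKFLOW_NOT_FOUND",
})


def error_from_message(msg: str) -> RunningHubError:
    """Map an API 'msg' value to the matching exception type."""
    if msg in PERMANENT_CODES:
        return RunningHubTerminalError(msg)
    return RunningHubTransientError(msg)


async def poll_until_done(query_status, task_id, check_interval=2.0):
    """Poll query_status(task_id); return at once on a terminal error."""
    while True:
        try:
            status = await query_status(task_id)
        except RunningHubTerminalError as e:
            # Terminal: report the failure instead of looping.
            return {"status": "error", "prompt_id": task_id, "msg": str(e)}
        except RunningHubTransientError:
            # Transient: wait and poll again.
            await asyncio.sleep(check_interval)
            continue
        if status == "SUCCESS":  # assumed status value
            return {"status": "completed", "prompt_id": task_id}
        await asyncio.sleep(check_interval)
```

Matching on exception type rather than on substrings of str(e) means a reworded error message cannot silently turn a terminal error back into a retried one.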
B. Don't retry permanent business errors in _make_request
Inspect result.get('code') / result.get('msg') before raising; if it's a known permanent error, raise a RunningHubTerminalError and have _make_request re-raise it without retrying.
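A sketch of a retry loop that backs off only on failures worth retrying. It is not the real _make_request: send is a hypothetical coroutine function returning the decoded JSON body, the assumption that code 0 means success is mine, and PERMANENT_MSGS lists only the messages named in this report.

```python
import asyncio


class RunningHubTerminalError(Exception):
    """Permanent business error; retrying cannot help."""


# Assumed permanent error messages, taken from this report.
PERMANENT_MSGS = frozenset({
    "APIKEY_TASK_NOT_FOUND", "APIKEY_INVALID", "WORKFLOW_NOT_FOUND",
})


async def request_with_retry(send, retry_count=3, base_delay=1.0):
    """Call send() until it succeeds; retry only non-terminal failures.

    send is a hypothetical coroutine function returning a dict with
    'code' and 'msg' keys (code 0 is assumed to mean success).
    """
    last_exception = None
    for attempt in range(retry_count + 1):
        try:
            result = await send()
            if result.get("code") == 0:
                return result
            msg = result.get("msg", "Unknown error")
            if msg in PERMANENT_MSGS:
                raise RunningHubTerminalError(f"RunningHub API error: {msg}")
            raise RuntimeError(f"RunningHub API error: {msg}")
        except RunningHubTerminalError:
            raise  # permanent: propagate immediately, no backoff
        except Exception as e:
            last_exception = e
            if attempt < retry_count:
                # Exponential backoff: 1x, 2x, 4x ... base_delay.
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise last_exception
```

The except RunningHubTerminalError clause has to come before except Exception; otherwise the broad handler would catch the terminal error first and retry it.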
C. Add a hard ceiling on consecutive identical errors
Even with the above, defensively break out of _wait_for_task_completion if the same error message has been seen N times in a row (e.g. 5). This protects against unknown future error codes.
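One way to implement such a ceiling is a small counter object that the polling loop consults on every failure. RepeatedErrorGuard is a hypothetical name, and the limit of 5 is just the example value from the paragraph above:

```python
class RepeatedErrorGuard:
    """Trips once the same error message is seen `limit` times in a row."""

    def __init__(self, limit: int = 5):
        self.limit = limit
        self._last = None
        self._count = 0

    def record(self, message: str) -> bool:
        """Record one failure; return True when the caller should give up."""
        if message == self._last:
            self._count += 1
        else:
            # A different error starts a new streak.
            self._last, self._count = message, 1
        return self._count >= self.limit

    def reset(self) -> None:
        """Call after any successful poll so the streak starts over."""
        self._last, self._count = None, 0
```

In the except branch of _wait_for_task_completion, the executor would call guard.record(str(e)) and return an ExecuteResult(status="error") when it returns True; a successful status query would call guard.reset().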
D. Cooperative cancellation / signal handling
Document that callers should use asyncio.run with a SIGINT handler that cancels the running task, or wrap the executor call with a cancellable task. Current behavior on Windows + PowerShell is effectively unkillable by Ctrl+C, which is a major UX issue.
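A sketch of the "cancellable task" variant, under stated assumptions: run_cancellable and run_until_interrupted are hypothetical helpers, and whether this fully resolves the PowerShell behaviour reported here is untested. It uses signal.signal rather than loop.add_signal_handler, because the latter is not implemented on Windows event loops.

```python
import asyncio
import signal


async def run_cancellable(coro):
    """Run coro as a task that SIGINT (Ctrl+C) cancels."""
    loop = asyncio.get_running_loop()
    task = loop.create_task(coro)

    def on_sigint(signum, frame):
        # Hand the cancellation to the event loop in a thread-safe way;
        # this also wakes the loop if it is idle.
        loop.call_soon_threadsafe(task.cancel)

    # signal.signal must be called from the main thread.
    previous = signal.signal(signal.SIGINT, on_sigint)
    try:
        return await task
    finally:
        signal.signal(signal.SIGINT, previous)  # restore the old handler


def run_until_interrupted(coro):
    """Return the coroutine's result, or None if Ctrl+C cancelled it."""
    try:
        return asyncio.run(run_cancellable(coro))
    except asyncio.CancelledError:
        return None
```

Cancellation only takes effect at an await point, so the wrapped coroutine stops at its next asyncio.sleep or network await. Shorter sleeps therefore make Ctrl+C feel more responsive.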
Why this matters
Any user who cancels a task from the RunningHub web UI (a totally normal action) ends up with a hung local process that fills the log and cannot be stopped without Task Manager. Discovered while running the Pixelle-Video project, which depends on comfykit.
Happy to send a PR if the maintainer agrees with the direction above.