Conversation
Added APIStatusError handling to manage transient errors during API calls.
Pull request overview
This PR adds special handling for HTTP 499 status errors from the OpenAI API, treating them as transient errors that should be retried, similarly to timeout errors.
Changes:
- Imports APIStatusError from the openai library to catch API status errors
- Adds retry logic for 499 status code errors (client closed request / upstream cancelled)
- Updates error logging message from "Max retries for APITimeoutError reached" to the more generic "Max API retries reached"
```python
except APIStatusError as e:
    # Retry transient "client closed request / upstream cancelled" style errors
    if getattr(e, "status_code", None) != 499:
        raise  # propagate non-499 errors
    # 499: retry
    if not max_retry:
        logging.error(f"Max API retries reached")
        raise
    max_retry -= 1
```
The new APIStatusError exception handler should be added to the outer exception handling block (lines 361-381) to provide consistent error handling and user feedback. Currently, if a non-499 APIStatusError is raised and propagated from the inner try-except block, it won't be caught by any outer handler, potentially resulting in an unhandled exception. Consider adding an exception handler for APIStatusError similar to the existing handlers for BadRequestError and APITimeoutError.
Yeah, probably worth adding a handler in the outer scope, like the APITimeoutError one.
```python
except APIStatusError as e:
    # Retry transient "client closed request / upstream cancelled" style errors
    if getattr(e, "status_code", None) != 499:
        raise  # propagate non-499 errors
    # 499: retry
    if not max_retry:
        logging.error(f"Max API retries reached")
        raise
    max_retry -= 1
```
The new APIStatusError retry logic for 499 status codes lacks test coverage. Since the repository includes tests for other functionality (test_api_endpoint_config.py, test_cli_parser.py, test_yaml_parser.py), consider adding tests to verify that 499 errors are properly retried and that non-499 APIStatusErrors are propagated correctly.
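A minimal sketch of what such tests could look like, assuming the inline loop were factored into a helper (retry_on_499, _status_error, and the test names below are hypothetical; the error is constructed from an httpx.Response, the way openai-python's v1 exception constructor expects):

```python
import httpx
import pytest
from openai import APIStatusError


def _status_error(code: int) -> APIStatusError:
    # openai-python builds APIStatusError around an httpx.Response
    req = httpx.Request("POST", "https://example.invalid/v1/chat/completions")
    return APIStatusError("boom", response=httpx.Response(code, request=req), body=None)


def retry_on_499(call, max_retry=3):
    # Hypothetical helper mirroring the PR's inline loop
    while True:
        try:
            return call()
        except APIStatusError as e:
            if getattr(e, "status_code", None) != 499:
                raise  # propagate non-499 errors
            if not max_retry:
                raise
            max_retry -= 1


def test_499_is_retried_then_succeeds():
    outcomes = iter([_status_error(499), "ok"])

    def flaky():
        outcome = next(outcomes)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    assert retry_on_499(flaky) == "ok"


def test_non_499_propagates_immediately():
    def always_500():
        raise _status_error(500)

    with pytest.raises(APIStatusError):
        retry_on_499(always_500)
```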
smoke test
Deployment Triggered 🚀
p- started a branch deployment to smoketest (branch: p--handle-499). You can watch the progress here 🔗

Details:

```json
{
"type": "branch",
"environment": {
"name": "smoketest",
"url": null
},
"deployment": {
"timestamp": "2026-01-20T16:27:06.876Z",
"logs": "https://github.com/GitHubSecurityLab/seclab-taskflow-agent/actions/runs/21179132080"
},
"git": {
"branch": "p--handle-499",
"commit": "c38b939e7ecb6d588fae9f5f5e223a346feca4cc",
"verified": true,
"committer": "web-flow",
"html_url": "https://github.com/GitHubSecurityLab/seclab-taskflow-agent/commit/c38b939e7ecb6d588fae9f5f5e223a346feca4cc"
},
"context": {
"actor": "p-",
"noop": false,
"fork": false,
"comment": {
"created_at": "2026-01-20T16:26:48Z",
"updated_at": "2026-01-20T16:26:48Z",
"body": "smoke test",
"html_url": "https://github.com/GitHubSecurityLab/seclab-taskflow-agent/pull/132#issuecomment-3773810578"
}
},
"parameters": {
"raw": null,
"parsed": null
}
}
```
Deployment Results ✅
p- successfully deployed branch p--handle-499.

Details:

```json
{
"status": "success",
"environment": {
"name": "smoketest",
"url": null
},
"deployment": {
"id": 3666688678,
"timestamp": "2026-01-20T16:27:08.716Z",
"logs": "https://github.com/GitHubSecurityLab/seclab-taskflow-agent/actions/runs/21179132080",
"duration": 2
},
"git": {
"branch": "p--handle-499",
"commit": "c38b939e7ecb6d588fae9f5f5e223a346feca4cc",
"verified": true
},
"context": {
"actor": "p-",
"noop": false,
"fork": false
},
"reviews": {
"count": 1,
"decision": "APPROVED"
},
"parameters": {
"raw": null,
"parsed": null
}
}
```
smoke test
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
```python
except APIStatusError as e:
    await render_model_output(f"** 🤖❗ API Status Error: {e}\n",
                              async_task=async_task,
                              task_id=task_id)
    logging.error(f"API Status Error: Status={e.status_code}, Response={e.response}")
```
The APIStatusError exception handler is placed before more specific exception types in the handler chain. BadRequestError and RateLimitError inherit from APIStatusError (APITimeoutError does not; it derives from APIConnectionError), so this catch block will intercept them before they reach their intended handlers above (lines 372-381). The APIStatusError handler should be moved to the end of the exception chain, after all the more specific handlers.
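For illustration, a hedged sketch of the required ordering (the handler bodies are placeholders, not the file's actual code): Python tests except clauses top to bottom, so subclasses must precede their base class.

```python
from openai import APIStatusError, APITimeoutError, BadRequestError, RateLimitError

def guarded(call):
    try:
        return call()
    except BadRequestError as e:    # 400; subclass of APIStatusError
        print(f"bad request: {e}")
    except RateLimitError as e:     # 429; subclass of APIStatusError
        print(f"rate limited: {e}")
    except APITimeoutError as e:    # connection-level error, separate branch
        print(f"timeout: {e}")
    except APIStatusError as e:     # any other HTTP status; keep this last
        print(f"status {e.status_code}: {e}")
```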
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/seclab_taskflow_agent/main.py:357
- In the APITimeoutError retry loop, the max-retries case now logs with logging.error and immediately re-raises, which drops the original stack trace/context compared to logging.exception, and also loses the actual exception message (since it's not captured with as e). Capture the exception and log with traceback (or include the exception details) before raising so failures remain diagnosable.
```python
except APITimeoutError:
    if not max_retry:
        logging.error(f"Max API retries reached")
        raise
    max_retry -= 1
```
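A minimal sketch of the suggested change, assuming the loop shape above (call and max_retry stand in for the surrounding code): logging.exception logs at ERROR level and appends the active traceback, so no as e capture is needed for diagnosability.

```python
import logging
from openai import APITimeoutError

def call_with_retries(call, max_retry=3):
    while True:
        try:
            return call()
        except APITimeoutError:
            if not max_retry:
                # logging.exception records the active traceback automatically
                logging.exception("Max API retries reached")
                raise
            max_retry -= 1
```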
```python
await render_model_output(f"** 🤖❗ API Status Error: {e}\n",
                          async_task=async_task,
                          task_id=task_id)
logging.error(f"API Status Error: Status={e.status_code}, Response={e.response}")
```
logging.error(f"API Status Error: Status={e.status_code}, Response={e.response}") logs the full response object, which can be very large and may include sensitive payloads. Prefer logging a minimal, stable subset (e.g., status_code + request id / error type/message) to avoid leaking content and to keep logs bounded.
| logging.error(f"API Status Error: Status={e.status_code}, Response={e.response}") | |
| logging.error( | |
| "API Status Error: status=%s, message=%s", | |
| getattr(e, "status_code", "unknown"), | |
| getattr(e, "message", str(e)), | |
| ) |
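As a side note on the suggestion above: the %-style arguments defer string formatting until the record is actually emitted, and the getattr calls keep the handler safe if a particular APIStatusError instance lacks those attributes.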
```python
except APIStatusError as e:
    # Retry transient "client closed request / upstream cancelled" style errors
    if getattr(e, "status_code", None) != 499:
        raise  # propagate non-499 errors
    # 499: retry
    if not max_retry:
        logging.error(f"Max API retries reached")
        raise
    max_retry -= 1
```
In the APIStatusError(499) retry path, the max-retries case logs with logging.error and re-raises without preserving traceback/context. Consider logging with traceback (or include exception details) before raising so repeated 499 failures are diagnosable.
This issue also appears on line 353 of the same file.
```python
from agents import Tool, RunContextWrapper, TContext, Agent
from openai import BadRequestError, APITimeoutError, RateLimitError, APIStatusError
from agents.mcp import MCPServerSse, MCPServerStdio, MCPServerStreamableHttp, create_static_tool_filter
from dotenv import find_dotenv, load_dotenv
from openai import APITimeoutError, BadRequestError, RateLimitError
```
Duplicate imports were introduced: Agent/RunContextWrapper/TContext/Tool are imported twice (lines 15 and 21) and OpenAI exception classes are imported twice (lines 22 and 25). Consolidate into a single import per module to avoid lint failures and keep imports maintainable.
Suggested change:

```diff
-from agents import Tool, RunContextWrapper, TContext, Agent
-from openai import BadRequestError, APITimeoutError, RateLimitError, APIStatusError
-from agents.mcp import MCPServerSse, MCPServerStdio, MCPServerStreamableHttp, create_static_tool_filter
-from dotenv import find_dotenv, load_dotenv
-from openai import APITimeoutError, BadRequestError, RateLimitError
+from openai import BadRequestError, APITimeoutError, RateLimitError, APIStatusError
+from agents.mcp import MCPServerSse, MCPServerStdio, MCPServerStreamableHttp, create_static_tool_filter
+from dotenv import find_dotenv, load_dotenv
```
```python
await render_model_output(f"** 🤖❗ Timeout Error: {e}\n",
                          async_task=async_task,
                          task_id=task_id)
logging.error(f"Bad Request: {e}")
```
The APITimeoutError outer handler logs Bad Request: {e} even though this is a timeout path. This is misleading when debugging and in logs/telemetry; update the log message to reflect a timeout (and consider logging.exception if you want traceback).
| logging.error(f"Bad Request: {e}") | |
| logging.exception("Timeout Error") |
Shelving this fix. Never had that issue again, and the fix is too specific.