frontend: stream /backend/update/ response to prevent Apache 504#4280
frontend: stream /backend/update/ response to prevent Apache 504#4280xsuchy wants to merge 1 commit into
Conversation
|
Disclaimer, I actually did not try it as I do not have the big repo that I can delete. |
There was a problem hiding this comment.
Code Review
This pull request introduces response streaming to the /update/ endpoint to prevent Apache 504 timeouts during potentially slow database operations, such as cascading deletes. The feedback suggests improving the effectiveness of this mechanism by yielding an initial newline at the start of the stream and yielding again immediately before the database commit. Additionally, the reviewer recommended adding validation to raise a ValueError if build IDs are missing and refactoring list constructions to use standard for-loops for consistency with the existing codebase.
| def _stream(): | ||
| result = {} | ||
| for typ, logic_cls in [("actions", actions_logic.ActionsLogic), | ||
| ("builds", BuildsLogic)]: | ||
| if typ not in request_data: | ||
| continue | ||
| to_update = {} | ||
| for obj in request_data[typ]: | ||
| to_update[obj["id"]] = obj | ||
| existing = {} | ||
| for obj in logic_cls.get_by_ids(to_update.keys()).all(): | ||
| existing[obj.id] = obj | ||
| non_existing_ids = list(set(to_update.keys()) - set(existing.keys())) | ||
| for i, obj in existing.items(): | ||
| logic_cls.update_state_from_dict(obj, to_update[i]) | ||
| db.session.commit() | ||
| result[f"updated_{typ}_ids"] = list(existing.keys()) | ||
| result[f"non_existing_{typ}_ids"] = non_existing_ids | ||
| # Keep the connection alive while the DB commits. | ||
| yield "\n" | ||
| yield flask.json.dumps(result) |
There was a problem hiding this comment.
The current implementation yields a newline after the db.session.commit() call. If the cascading delete triggered by update_state_from_dict is slow, Apache might still time out before the first newline is ever sent.
To effectively prevent the 504 timeout, it is better to:
- Yield an initial newline at the very start of the generator to ensure headers are sent and the connection is marked as active.
- Yield the newline before the db.session.commit() call to reset the Apache timeout timer just before the potentially slow operation starts.
Additionally, the code has been updated to comply with repository rules: raising a ValueError for empty build_ids and using standard for-loops for list construction instead of shorthand constructors or comprehensions.
def _stream():
# Send an initial newline to open the connection and prevent timeout
# during the first potentially slow database operation.
yield "\n"
result = {}
for typ, logic_cls in [("actions", actions_logic.ActionsLogic),
("builds", BuildsLogic)]:
if typ not in request_data:
continue
to_update = {}
for obj in request_data[typ]:
to_update[obj["id"]] = obj
if typ == "builds" and not to_update:
raise ValueError("The list of build_ids cannot be empty.")
existing = {}
for obj in logic_cls.get_by_ids(to_update.keys()).all():
existing[obj.id] = obj
non_existing_ids = []
for id_key in to_update.keys():
if id_key not in existing:
non_existing_ids.append(id_key)
for i, obj in existing.items():
logic_cls.update_state_from_dict(obj, to_update[i])
# Keep the connection alive while the DB commits.
yield "\n"
db.session.commit()
updated_ids = []
for key in existing.keys():
updated_ids.append(key)
result[f"updated_{typ}_ids"] = updated_ids
result[f"non_existing_{typ}_ids"] = non_existing_ids
yield flask.json.dumps(result)References
- When querying for content, if the list of build_ids is empty, raise a ValueError as it is considered an invalid state.
- For consistency with similar functions, prefer using a standard for-loop to build a list over a list comprehension.
|
/packit test |
|
Trying to fix CI in #4285 |
|
Can you please rebase on top of |
When update_state_from_dict triggers slow cascading deletes (e.g. removing large projects), Apache times out waiting for the response. Stream the response so newlines are sent after each db.session.commit(), keeping the connection alive. Fixes: fedora-copr#2928 Assisted-By: Claude Opus 4.6 <noreply@anthropic.com>
|
rebased, let's see /packit test |
Are we really talking about connection timeouts (as configured in wsgi)? If so, I do not believe this streaming response would help here (in a sense that it would make the response fit the limit). |
|
WSGIDaemonProcess backend user=copr-fe group=copr-fe {{ develizer(2, 15) }} display-name="httpd backend" maximum-requests=8000 graceful-timeout=20 Hmm, does this mean we have the default 60s timeout? |
| result[f"updated_{typ}_ids"] = list(existing.keys()) | ||
| result[f"non_existing_{typ}_ids"] = non_existing_ids | ||
| # Keep the connection alive while the DB commits. | ||
| yield "\n" |
There was a problem hiding this comment.
what about streamed_json_array_response ?
There was a problem hiding this comment.
(sending just one byte is going to likely just fill some buffer somewhere, we should be flushing)
|
The real problem probably lies here: copr/frontend/coprs_frontend/coprs/logic/builds_logic.py Lines 1409 to 1410 in 59a9034 |
|
Because: copr/frontend/coprs_frontend/coprs/logic/complex_logic.py Lines 112 to 115 in 59a9034 |
|
Check the last comment in #4280 (the update was just a transitive symptom of DoSed frontend). Unless we still see those /update/ tracebacks in Sentry? |

When update_state_from_dict triggers slow cascading deletes (e.g. removing large projects), Apache times out waiting for the response. Stream the response so newlines are sent after each db.session.commit(), keeping the connection alive.
Fixes: #2928
Assisted-By: Claude Opus 4.6 noreply@anthropic.com