backend: handle & log pulp publishing errors by praiskup · Pull Request #4292 · fedora-copr/copr

praiskup · 2026-04-29T05:12:02Z

Relates: #4290

gemini-code-assist

Code Review

This pull request modifies the create_publication method to return a response object and adds error logging in both pulp.py and storage.py. However, the review feedback points out that the added if not response.ok checks are unreachable because the underlying SafeRequest raises a RequestError for non-200 status codes. The reviewer suggests using try...except blocks to handle these errors and recommends logging response.text instead of response.json() to avoid potential decoding issues when the error response is not valid JSON.

Hmm, reading the Gemini comments now, and may have a point

Some callers of create_publication() only checked whether the POST request succeeded, but not whether the asynchronous Pulp task actually completed. Add a publish() method that creates the publication and waits for the task, so failures are detected and logged consistently. Relates: fedora-copr#4290 Closes: fedora-copr#4292

praiskup · 2026-05-14T05:24:11Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors the Pulp backend to improve asynchronous task management, notably by updating wait_for_finished_task to use monotonic time and introducing a publish helper method. The changes also include updating storage.py to better handle distribution and publication tasks. Reviewers identified several high-severity issues, including potential KeyError crashes when accessing task IDs directly, a regression in init_project where a necessary task wait was removed, and unreachable error-handling logic in pulp.py caused by the exception-raising behavior of SafeRequest.

gemini-code-assist · 2026-05-14T05:26:35Z

                distribution_name,
                repository_href,
            )
+            task = response.json()["task"]


Accessing ["task"] directly is risky here. If create_distribution returns a 201 Created status (which can happen depending on Pulp's configuration or version), the "task" key will be missing, causing a KeyError. Since this is inside a try...except RequestError block, a KeyError will not be caught and will crash the backend process.

Suggested change

task = response.json()["task"]

task = response.json().get("task")

if task and not self.client.wait_for_finished_task(

task, f"create distribution {distribution_name}",

):

return False

gemini-code-assist · 2026-05-14T05:26:35Z

-            # Wait until we can get the publication
-            if task := response.json().get("task"):
-                self.client.wait_for_finished_task(task)
            response = self.client.get_publication(repository_href)


This change introduces a regression. The previous implementation waited for the task returned by create_repository (at line 312) before proceeding to fetch the publication. By removing that wait, get_publication might be called before the repository is fully initialized in Pulp, potentially leading to empty results or errors.

Suggested change

response = self.client.get_publication(repository_href)

if self.devel:

if task := response.json().get("task"):

self.client.wait_for_finished_task(task, f"create repository {repository_name}")

response = self.client.get_publication(repository_href)

we are still waiting, a few lines ago (actually a bug-fix, since we were not always waiting), or what am I missing gemini?

gemini-code-assist · 2026-05-14T05:26:35Z

+        if not response.ok:
+            self.log.error("Failed to update distribution %s: %s",
+                           distribution, response.text)
+            return False


According to the repository rules, SafeRequest (used in self.send) raises a RequestError for non-200 status codes. Therefore, this if not response.ok check is unreachable for error responses. If the intention is to return False instead of allowing the exception to propagate, you should wrap the self.send call in a try...except RequestError block. Note that RequestError would need to be imported from copr_common.request.

References

When using SafeRequest, an explicit call to response.raise_for_status() is not needed as SafeRequest raises a RequestError for non-200 status codes.

This is probably right; I think I just copy-pasted dead code.

gemini-code-assist · 2026-05-14T05:26:35Z

+        Create a publication and wait for the task to finish.
+        """
+        response = self.create_publication(repository)
+        task = response.json()["task"]


This access is unsafe. If create_publication returns a 201 Created (or any 2xx status other than 202), the response might not contain a "task" key, leading to a KeyError. It is safer to check the status code or use .get("task").

Suggested change

task = response.json()["task"]

task = response.json().get("task")

if task and not self.wait_for_finished_task(

task, f"publish {repository}",

):

return False

This returns 202.

Do we need both create_publication and publish? I'd prefer to have just one. And if we need to wait on some places and not on some other places, I'd rather control it using wait=True or wait=False parameter.

I in general hate large methods, please double check (no big deal, I can merge those into one).

FrostyX

I don't want to be negative, but to be honest, I am not happy about the proposed changes.

My first problem is that I don't know what we are trying to solve in this PR. The referenced issue #4290 was IIRC caused by us accidentally removing the Pulp repository+distribution for that project, which was solved in PR #4293 and PR #4289 added logs which would help us understand that problem immediately. I consider that problem solved.

I don't see any other referenced issues, so I am a bit lost what we are doing, which leads to the feeling of "if it works, don't touch it".

We have two valid issues #4313 and #4317, and it seems we are trying to solve them here?

FrostyX · 2026-05-14T11:31:16Z

+        Create a publication and wait for the task to finish.
+        """
+        response = self.create_publication(repository)
+        task = response.json()["task"]


Do we need both create_publication and publish? I'd prefer to have just one. And if we need to wait on some places and not on some other places, I'd rather control it using wait=True or wait=False parameter.

FrostyX · 2026-05-14T11:46:38Z

+        response = self.send("PATCH", url, data)
+        if not response.ok:
+            self.log.error("Failed to update distribution %s: %s",
+                           distribution, response.text)


I originally had two reasons why I didn't do the error handling in pulp.py methods:

Sometimes it depends on the caller if non-200 is really an error or not. For example when creating a distribution and it already exists:

copr/backend/copr_backend/storage.py

Lines 335 to 347 in 55e4faa

try:

response = self.client.create_distribution(

distribution_name,

repository_href,

)

except RequestError as ex:

if "This field must be unique" not in ex.response.text:

self.log.error(

"Failed to create a Pulp distribution %s because of %s",

public_distribution_name,

ex.response.text,

)

return False

I wanted the pulp.py to be the thinnest possible layer over the Pulp API, so that we can easily reason about it with Pulp folks.

But maybe it makes sense to move the error handling here, I need to think about it.

This is just a copy-paste from elsewhere, I need to take one more look, too.

Ok, I just moved that no-op if not response.ok: check. While we really should be handling RequestError (it’s a pain to duplicate that pattern, nah), my main goal was to address the 202 cases. It felt inconsistent to ignore the initial request error in one place while handling the background task error elsewhere.

FrostyX · 2026-05-14T11:51:08Z

        self.log.info("Pulp: updating distribution %s", distribution)
-        return self.send("PATCH", url, data)
+        response = self.send("PATCH", url, data)
+        if not response.ok:


Will this ever happen? We are using SafeRequest so if response is not ok then it shoulud raise exception:

copr/common/copr_common/request.py

Lines 102 to 120 in 55e4faa

if response.status_code >= 500:

# Server error. Hopefully this is only temporary problem, we wan't

# to re-try, and wait till the server works again.

raise RequestRetryError(

"Request server error on {}: {} {}".format(

url, response.status_code, response.reason))

if response.status_code == 408:

# 408 Request Timeout - We couldn't send the request within

# a reasonable time-frame, so let's retry.

raise RequestRetryError(

f"Request Timeout {response.status_code}: {response.reason}: {url}")

if response.status_code >= 400:

# Client error. The mistake is on our side, it doesn't make sense

# to continue with retries.

msg = "Request client error on {}: {} {}".format(

url, response.status_code, response.reason)

raise RequestError(msg, response)

There is not an explicit:

if not response.ok: raise ...

but those status code ranges should cover all not ok cases, no?

Yeah, similar feedback by Gemini above.

Thinking once more, and I'm tempted to create another commit removing a dozen of checks for response.ok :D I'd prefer having a brief 1:1 discussing what is going on here.

FrostyX · 2026-05-14T12:01:41Z

-        response = self.wait_for_finished_task(task)
-        data = response.json()
-        if response.ok and data["state"] == "completed":
-            self.log.info("Successfully created Pulp publication of repository %s", repository)


I will start the thread right here, but it concerns multiple other places.
This PR seems to remove a lot of log lines (both info and error). I don't like it at all because I've already debugged many Pulp-related problems and the logging was either helpful or I added new logging that would help to debug the same issue in the future. This feels like a big regression.

Edit: Okay they were just moved to wait_for_finished_task, but I still don't know if it is worth the gamble.

Yeah, I just wanted to de-duplicate the waiting pattern, including logging success/failure cases.

Not sure if gambling ...? Perhaps we can freeze here until we have the PULP tests in TF?

praiskup · 2026-05-14T17:13:03Z

I don't want to be negative, but to be honest, I am not happy about the proposed changes.

Thank you for the feedback! ❤️
This was a chain of thoughts, going to try to explain the bugs/inconsistencies/duplicities I was trying to fix. Let's find some common ground, and perhaps split the PR.

praiskup · 2026-05-14T17:28:42Z

Promised motivation for commits:

backend: wait for Pulp publication tasks to finish -- one? of the callers didn't wait, and it was a bug imvho (namely forking)
backend: handle async Pulp distribution updates (202 responses) - is a bugfix, I noticed 202 and 201 are valid outputs of this API call
backend: consolidate Pulp task error handling in wait_for_finished_task - deduplication mostly, but not yet correct (some info message is redundant) .. and that commit needs a fix (json response should be printed out in failure case) .. also, there's a bugfix - we should always wait for the create_distribution task to avoid races...
should be squashed with 3. :-( while 3. should be split (bugfix vs. deduplication)

Some callers of create_publication() only checked whether the POST request succeeded, but not whether the asynchronous Pulp task actually completed. Add a publish() method that creates the publication and waits for the task, so failures are detected and logged consistently. Relates: fedora-copr#4290 Closes: fedora-copr#4292

praiskup · 2026-05-21T06:26:21Z

Fixed conflict with 460e725. PTAL

praiskup · 2026-05-21T06:26:27Z

/packit test

Some callers of create_publication() only checked whether the POST request succeeded, but not whether the asynchronous Pulp task actually completed. Add a publish() method that creates the publication and waits for the task, so failures are detected and logged consistently. Relates: fedora-copr#4290 Closes: fedora-copr#4292

The update_distribution() PATCH request can return either 200 (synchronous) or 202 (asynchronous task). Handle the 202 case by waiting for the task to finish, and move error handling into the method so callers just check the bool return value.

Several callers of wait_for_finished_task() duplicated the same pattern: check response.ok, parse JSON, compare state to "completed", log error. This made it easy to miss failures or log them inconsistently (some callers didn't check the task result at all, just the initial POST). Move the success/failure check into wait_for_finished_task() itself so it returns the task data on success or None on failure, with consistent error logging including a caller-provided description. Also switch from time.time() to time.monotonic(), and retry on transient status-polling failures instead of giving up immediately. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>

Each caller logged its own "Successfully ..." message after wait_for_finished_task returned. Since wait_for_finished_task already has the description parameter and knows whether the task succeeded, log it there once instead of repeating it in every caller. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread backend/copr_backend/pulp.py Outdated

Comment thread backend/copr_backend/storage.py Outdated

Comment thread backend/copr_backend/pulp.py Outdated

FrostyX previously approved these changes Apr 29, 2026

View reviewed changes

praiskup force-pushed the praiskup-handle-publishing-errors branch from d71e32c to 87b39e0 Compare May 13, 2026 18:44

praiskup requested a review from FrostyX May 13, 2026 18:45

gemini-code-assist Bot reviewed May 14, 2026

View reviewed changes

FrostyX reviewed May 14, 2026

View reviewed changes

praiskup force-pushed the praiskup-handle-publishing-errors branch from b129e2e to 9f249db Compare May 21, 2026 06:25

praiskup added 5 commits June 1, 2026 10:14

pylintrc: raise the max number of return statements

6651aca

praiskup force-pushed the praiskup-handle-publishing-errors branch from 9f249db to 6651aca Compare June 1, 2026 08:14

-            task = response.json()["task"]
+            task = response.json().get("task")
+            if task and not self.client.wait_for_finished_task(
+                task, f"create distribution {distribution_name}",
+            ):
+                return False

	try:
	response = self.client.create_distribution(
	distribution_name,
	repository_href,
	)
	except RequestError as ex:
	if "This field must be unique" not in ex.response.text:
	self.log.error(
	"Failed to create a Pulp distribution %s because of %s",
	public_distribution_name,
	ex.response.text,
	)
	return False

	if response.status_code >= 500:
	# Server error. Hopefully this is only temporary problem, we wan't
	# to re-try, and wait till the server works again.
	raise RequestRetryError(
	"Request server error on {}: {} {}".format(
	url, response.status_code, response.reason))

	if response.status_code == 408:
	# 408 Request Timeout - We couldn't send the request within
	# a reasonable time-frame, so let's retry.
	raise RequestRetryError(
	f"Request Timeout {response.status_code}: {response.reason}: {url}")

	if response.status_code >= 400:
	# Client error. The mistake is on our side, it doesn't make sense
	# to continue with retries.
	msg = "Request client error on {}: {} {}".format(
	url, response.status_code, response.reason)
	raise RequestError(msg, response)

Conversation

praiskup commented Apr 29, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

praiskup commented May 14, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

FrostyX left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

FrostyX May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

FrostyX May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

praiskup commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

praiskup commented May 14, 2026

Uh oh!

praiskup commented May 21, 2026

Uh oh!

praiskup commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

FrostyX May 14, 2026 •

edited

Loading

FrostyX May 14, 2026 •

edited

Loading

praiskup commented May 14, 2026 •

edited

Loading