fix(BA-6635): retry image rescan result commits by seedspirit · Pull Request #12449 · lablup/backend.ai

seedspirit · 2026-06-30T01:32:30Z

Summary

Retry the container registry image rescan commit phase with the repository resilience retry decorator.
Preserve pending image updates across retry attempts and emit progress only after a successful DB commit.
Add a unit test covering serialization failure retry during rescan result commit.

Test plan

pants fmt ::
pants fix ::
pants lint --changed-since=origin/main
pants fmt --changed-since=HEAD~1
pants fix --changed-since=HEAD~1
pants lint --changed-since=HEAD~1
pants check --changed-since=HEAD~1
pants test --changed-since=HEAD~1

Copilot

Pull request overview

This PR makes the container registry image rescan commit phase resilient to transient PostgreSQL serialization conflicts (SQLSTATE 40001). It extracts the DB-mutating portion of BaseContainerRegistry.commit_rescan_result() into a new _commit_rescan_result_once() helper wrapped with a Resilience retry policy, so a failed commit is retried rather than aborting the whole rescan. Crucially, the helper now copies the pending updates per attempt (pending_updates = dict(original_updates)) instead of mutating the shared all_updates ContextVar dict, so retries see the full update set again. Progress messages are collected during the transaction and only emitted (logs + ProgressReporter) after a successful commit.

Changes:

Added a module-level commit_rescan_result_resilience (RetryPolicy, max_retries=10, fixed 0.1s delay, BackendAIError non-retryable), matching the repository-layer retry shape.
Refactored commit_rescan_result() into an outer method that emits progress post-commit plus a retried _commit_rescan_result_once() that performs all DB work on a per-attempt copy of updates.
Added a unit test that simulates a DBAPIError serialization failure on first flush and verifies the retry succeeds without losing pending updates.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
`src/ai/backend/manager/container_registry/base.py`	Adds retry resilience policy and splits commit logic into a retried helper that copies updates per attempt and defers progress emission until after a successful commit.
`tests/unit/manager/test_container_registry_base.py`	New unit test exercising serialization-failure retry of `commit_rescan_result()` via a fake DB/session, asserting two attempts and preserved update state.
`changes/12449.fix.md`	Towncrier changelog entry describing the retry-on-serialization-conflict fix.

I verified the key correctness aspects: the public commit_rescan_result() signature is unchanged so both call sites (rescan_single_registry, scan_single_ref) remain valid; the shared all_updates dict is no longer mutated (only the per-attempt shallow copy is, and inner update dicts are never mutated, so the copy is safe); the retried scope includes both session.flush() and the implicit commit on begin_session exit, so commit-time serialization failures are also retried; progress events are rebuilt each attempt and emitted exactly once on success, preserving the original ordering and reporter-increment behavior; and _TestRegistry implements the only @abstractmethod (fetch_repositories). I found no concrete defects to flag.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jopemachine · 2026-07-01T01:41:43Z

+        image_identifiers: list[tuple[str, str]],
+    ) -> tuple[list[ImageData], list[tuple[str, str]]]:
+        scanned_images: list[ImageData] = []
+        progress_events: list[tuple[str, str]] = []


Aggregating the logs doesn't seem like a good approach.

Wouldn't it be better to handle logging and reporter.update separately?

Even if we decide to aggregate logs, collecting raw strings such as "warning" and "info" and then dispatching them through a switch statement seems quite fragile.

Since retry logic has been implemented, it seems more appropriate to aggregate the logs rather than log them each time, and display them all at once when the operation succeeds. I’ve changed "warning" and "info" to enums for now.

Replace the (level, message) tuple buffer with a dedicated `_RescanProgressEvent` dataclass and reuse the shared `LogLevel` enum instead of raw "info"/"warning" strings, per review feedback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

seedspirit · 2026-07-02T06:47:04Z

Resolved by #12464

fix(BA-6635): retry image rescan result commits

d472c72

github-actions Bot assigned seedspirit Jun 30, 2026

github-actions Bot added size:L 100~500 LoC comp:manager Related to Manager component labels Jun 30, 2026

changelog: add news fragment for PR #12449

378aa65

seedspirit marked this pull request as ready for review June 30, 2026 07:39

seedspirit requested a review from a team as a code owner June 30, 2026 07:39

Copilot AI review requested due to automatic review settings June 30, 2026 07:39

Copilot started reviewing on behalf of seedspirit June 30, 2026 07:40 View session

Copilot AI reviewed Jun 30, 2026

View reviewed changes

seedspirit requested a review from HyeockJinKim June 30, 2026 08:46

seedspirit added this to the 26.4 milestone Jun 30, 2026

jopemachine reviewed Jul 1, 2026

View reviewed changes

Comment thread src/ai/backend/manager/container_registry/base.py Outdated

jopemachine reviewed Jul 1, 2026

View reviewed changes

seedspirit and others added 2 commits July 1, 2026 12:07

fix(BA-6635): serialize image rescan commits per registry

de899be

seedspirit closed this Jul 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(BA-6635): retry image rescan result commits#12449

fix(BA-6635): retry image rescan result commits#12449
seedspirit wants to merge 4 commits into
mainfrom
fix/BA-6635

seedspirit commented Jun 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

jopemachine Jul 1, 2026

Uh oh!

seedspirit Jul 2, 2026

Uh oh!

seedspirit commented Jul 2, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

seedspirit commented Jun 30, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

jopemachine Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

seedspirit Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

seedspirit commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

seedspirit commented Jul 2, 2026 •

edited

Loading