Skip to content

fix(test): stabilize bgprocess polling in backup/restore/IE/maintenance tests#9958

Merged
asheshv merged 2 commits into
masterfrom
fix/backup-test-polling
May 20, 2026
Merged

fix(test): stabilize bgprocess polling in backup/restore/IE/maintenance tests#9958
asheshv merged 2 commits into
masterfrom
fix/backup-test-polling

Conversation

@asheshv
Copy link
Copy Markdown
Contributor

@asheshv asheshv commented May 20, 2026

Summary

The shared bgprocess polling helpers in four test modules share the same race that just surfaced on the macos-latest / pg16 leg of #9955's CI run (run 26148720940, job 76930277702):

  • Wait budget was 2.5 s (5 iterations × 0.5 s; maintenance used 5 s).
  • Wrong break condition. Polling broke on 'execution_time' in the_process — but execution_time is the elapsed time of a running bgprocess. It's set before the wrapped pg_dump / pg_restore / psql / COPY actually finishes. The completion signal is exit_code becoming non-None.
  • Result: the helper could return control while the wrapped command was still running, and the next assertion — e.g. assert_equal(the_process['exit_code'] in [0, 1], True) — fired on None in [0, 1], i.e. False != True.

Some scenarios masked the bug by listing None in their expected_exit_code set (a tell that someone noticed the polling was unreliable and widened the accepted set instead of fixing the polling). Scenarios that didn't include None were the ones that flaked.

Files changed

All four helpers fixed identically:

  • web/pgadmin/tools/backup/tests/test_backup_utils.py
  • web/pgadmin/tools/import_export/tests/test_import_export_utils.py
  • web/pgadmin/tools/maintenance/tests/test_create_maintenance_job.py
  • web/pgadmin/tools/restore/tests/test_create_restore_job.py

Fix

  • Poll for up to 60 × 0.5 s = 30 s (generous enough for the slowest CI runner).
  • Break only when the_process.get('exit_code') is not None — the actual completion signal.
  • Narrow except Exception: to except StopIteration: — the only thing next(...) here can raise.

No call-site changes needed. The helper contract (returns once the job is done; raises if the bgprocess never finished) is unchanged in spirit and strictly more reliable in practice.

Test plan

Related

Summary by CodeRabbit

  • Tests
    • Improved polling logic for background job completion in backup, import/export, maintenance, and restore test utilities for more reliable test execution.

Review Change Stack

…ce tests

The shared polling helpers in:

  - web/pgadmin/tools/backup/tests/test_backup_utils.py
  - web/pgadmin/tools/import_export/tests/test_import_export_utils.py
  - web/pgadmin/tools/maintenance/tests/test_create_maintenance_job.py
  - web/pgadmin/tools/restore/tests/test_create_restore_job.py

all share the same race that surfaced on macos-latest / pg16 in
PR #9955's CI run:

  - Wait budget was 2.5s (5 iterations x 0.5s; maintenance used 5s).
  - The break condition was `execution_time' in the_process`, but
    `execution_time` is the elapsed time of a *running* bgprocess --
    it is set before the wrapped pg_dump / pg_restore / psql / COPY
    actually finishes. The completion signal is `exit_code` becoming
    non-None.
  - So the helper could return control while the wrapped command was
    still running, and the next assertion -- e.g.
    `assert_equal(the_process['exit_code'] in [0, 1], True)` -- would
    fire on `None in [0, 1]`, i.e. `False != True`.

Some scenarios masked the bug by listing `None` in their
`expected_exit_code` set (a tell that someone noticed the polling was
unreliable and worked around it by widening accepted exit codes).
Scenarios that didn't include `None` were the ones that flaked.

Fix all four helpers identically:

  - Poll for up to 60 iterations x 0.5s = 30s, generous enough for
    the slowest CI runner.
  - Break only when `the_process.get('exit_code') is not None`, the
    actual completion signal.
  - Narrow `except Exception` to `except StopIteration`, which is the
    only thing `next(...)` here can raise.

No call-site changes needed; the helper contract (returns once the
job is done; raises if the bgprocess never finished) is unchanged in
spirit and strictly more reliable in practice.

Verified:

  - pycodestyle on the four files: 0 violations.

This fixes the failure observed in the macos-latest / pg16 leg of
PR #9955's CI run (run 26154521710, job 76930277702), which was
unrelated to that PR's lockfile-only changes.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Walkthrough

This PR standardizes polling logic across four background job test helpers (backup, import/export, maintenance, and restore). Each test's job completion detection is refactored from relying on execution_time presence or unbounded counter loops to using a bounded 60-iteration wait (∼30 seconds) that detects completion when the background process's exit_code becomes non-None. Missing-job scenarios are now explicitly handled by catching StopIteration.

Changes

Background Job Polling Completion Detection

Layer / File(s) Summary
Backup job polling update
web/pgadmin/tools/backup/tests/test_backup_utils.py
Loop initialization changed from unbounded while True to bounded for _ in range(60). Polling condition now breaks when process exit_code is non-None. StopIteration exception caught when job not found; process handle reset to None before continuing.
Import/Export job polling update
web/pgadmin/tools/import_export/tests/test_import_export_utils.py
Loop changed from open-ended while True with counter to bounded for _ in range(60) iteration. Completion detection switched from execution_time presence to exit_code non-None. Explicit StopIteration handler sets the_process to None when job lookup fails.
Maintenance job polling update
web/pgadmin/tools/maintenance/tests/test_create_maintenance_job.py
Polling loop replaced with bounded 60-iteration loop waiting for exit_code to become non-None. StopIteration exception explicitly caught to clear the_process reference before continuing poll iterations.
Restore job polling update
web/pgadmin/tools/restore/tests/test_create_restore_job.py
Wait loop refactored from unbounded while True with cnt and execution_time break logic to bounded 60-iteration loop. Completion detected when exit_code becomes non-None. Missing job id handled by catching StopIteration and resetting the_process to None.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: stabilizing background process polling logic across four test modules for backup, restore, import/export, and maintenance tools.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/backup-test-polling

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
web/pgadmin/tools/import_export/tests/test_import_export_utils.py (1)

46-60: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Polling timeout can still return before import/export completion.

The loop break condition is correct, but there is no mandatory post-loop check that completion actually happened. Add explicit asserts that the_process exists and exit_code is non-None before continuing.

Proposed fix
     for _ in range(60):
         ...
         if the_process and the_process.get('exit_code') is not None:
             break
         time.sleep(0.5)

+    assert_equal(the_process is not None, True)
+    assert_equal(the_process.get('exit_code') is not None, True)
+
     assert_equal('execution_time' in the_process, True)
     assert_equal('stime' in the_process, True)
     assert_equal('exit_code' in the_process, True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/pgadmin/tools/import_export/tests/test_import_export_utils.py` around
lines 46 - 60, The test's polling loop may exit due to timeout without actually
finding the job, so after the for-loop that queries via tester.get and sets
the_process (variable job_id, the_process, and its 'exit_code'), add explicit
assertions to guarantee completion: assert that the_process is not None and that
the_process.get('exit_code') is not None (or assertTrue/ assert_is_not_none
depending on the test helpers) so the test fails if the import/export never
completed before continuing.
web/pgadmin/tools/restore/tests/test_create_restore_job.py (1)

173-187: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore polling still allows timeout without proving completion.

Please assert completion right after the loop. Otherwise, timed-out runs can continue with exit_code is None, which undermines the new completion contract.

Proposed fix
         for _ in range(60):
             ...
             if the_process and the_process.get('exit_code') is not None:
                 break
             time.sleep(0.5)

+        self.assertIsNotNone(the_process)
+        self.assertIsNotNone(the_process.get('exit_code'))
+
         self.assertTrue('execution_time' in the_process)
         self.assertTrue('stime' in the_process)
         self.assertTrue('exit_code' in the_process)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/pgadmin/tools/restore/tests/test_create_restore_job.py` around lines 173
- 187, After the polling loop that checks the background process list (the loop
that queries '/misc/bgprocess/' and searches for the_process by job_id), add
assertions immediately after the loop to guarantee the job completed: assert
that the_process is not None and that the_process.get('exit_code') is not None
(e.g., using self.assertIsNotNone on both). This ensures timed-out runs fail the
test instead of continuing with exit_code == None.
web/pgadmin/tools/maintenance/tests/test_create_maintenance_job.py (1)

81-95: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Maintenance helper still needs an explicit “completed” assertion after polling.

for _ in range(60) can finish without hitting the break, and the test then proceeds without guaranteeing exit_code is set. Add explicit post-loop checks.

Proposed fix
         for _ in range(60):
             ...
             if the_process and the_process.get('exit_code') is not None:
                 break
             time.sleep(0.5)

+        self.assertIsNotNone(the_process)
+        self.assertIsNotNone(the_process.get('exit_code'))
+
         self.assertTrue('execution_time' in the_process)
         self.assertTrue('stime' in the_process)
         self.assertTrue('exit_code' in the_process)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/pgadmin/tools/maintenance/tests/test_create_maintenance_job.py` around
lines 81 - 95, The polling loop may exit without finding a completed job, so
after the for loop add explicit assertions to ensure the maintenance job was
found and finished: assert that the_process is not None, that
the_process.get('exit_code') is not None, and optionally that the exit_code
indicates success (e.g., equals 0). Update the test in
test_create_maintenance_job (the block that uses job_id and the_process) to
perform these post-loop checks so the test fails if the job never completed.
web/pgadmin/tools/backup/tests/test_backup_utils.py (1)

33-47: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Timeout path still permits an unfinished backup job.

If the 60-iteration loop exhausts while the_process exists but exit_code is still None, the helper continues and may still pass because expected_exit_code can include None. Add an explicit completion assertion immediately after the loop.

Proposed fix
     for _ in range(60):
         ...
         if the_process and the_process.get('exit_code') is not None:
             break
         time.sleep(0.5)

+    assert_equal(the_process is not None, True)
+    assert_equal(the_process.get('exit_code') is not None, True)
+
     assert_equal('execution_time' in the_process, True)
     assert_equal('stime' in the_process, True)
     assert_equal('exit_code' in the_process, True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/pgadmin/tools/backup/tests/test_backup_utils.py` around lines 33 - 47,
The loop that polls bgprocess may exit by timeout while the job is still
running; after the for-loop that uses tester.get('/misc/bgprocess/?_={0}'),
ensure you explicitly assert that the job completed by checking that the_process
(found by job_id) is not None and that the_process.get('exit_code') is not None,
then assert that the_process['exit_code'] is among expected_exit_code; reference
the variables the_process, job_id, exit_code, expected_exit_code and the polling
call to enforce completion before continuing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@web/pgadmin/tools/backup/tests/test_backup_utils.py`:
- Around line 33-47: The loop that polls bgprocess may exit by timeout while the
job is still running; after the for-loop that uses
tester.get('/misc/bgprocess/?_={0}'), ensure you explicitly assert that the job
completed by checking that the_process (found by job_id) is not None and that
the_process.get('exit_code') is not None, then assert that
the_process['exit_code'] is among expected_exit_code; reference the variables
the_process, job_id, exit_code, expected_exit_code and the polling call to
enforce completion before continuing.

In `@web/pgadmin/tools/import_export/tests/test_import_export_utils.py`:
- Around line 46-60: The test's polling loop may exit due to timeout without
actually finding the job, so after the for-loop that queries via tester.get and
sets the_process (variable job_id, the_process, and its 'exit_code'), add
explicit assertions to guarantee completion: assert that the_process is not None
and that the_process.get('exit_code') is not None (or assertTrue/
assert_is_not_none depending on the test helpers) so the test fails if the
import/export never completed before continuing.

In `@web/pgadmin/tools/maintenance/tests/test_create_maintenance_job.py`:
- Around line 81-95: The polling loop may exit without finding a completed job,
so after the for loop add explicit assertions to ensure the maintenance job was
found and finished: assert that the_process is not None, that
the_process.get('exit_code') is not None, and optionally that the exit_code
indicates success (e.g., equals 0). Update the test in
test_create_maintenance_job (the block that uses job_id and the_process) to
perform these post-loop checks so the test fails if the job never completed.

In `@web/pgadmin/tools/restore/tests/test_create_restore_job.py`:
- Around line 173-187: After the polling loop that checks the background process
list (the loop that queries '/misc/bgprocess/' and searches for the_process by
job_id), add assertions immediately after the loop to guarantee the job
completed: assert that the_process is not None and that
the_process.get('exit_code') is not None (e.g., using self.assertIsNotNone on
both). This ensures timed-out runs fail the test instead of continuing with
exit_code == None.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0cf58215-6041-4990-b5ec-18fc36912ec9

📥 Commits

Reviewing files that changed from the base of the PR and between aad2dfd and 249f6f5.

📒 Files selected for processing (4)
  • web/pgadmin/tools/backup/tests/test_backup_utils.py
  • web/pgadmin/tools/import_export/tests/test_import_export_utils.py
  • web/pgadmin/tools/maintenance/tests/test_create_maintenance_job.py
  • web/pgadmin/tools/restore/tests/test_create_restore_job.py

@asheshv asheshv merged commit ae6dc4b into master May 20, 2026
47 of 48 checks passed
@asheshv asheshv deleted the fix/backup-test-polling branch May 20, 2026 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant