Skip to content

perf: replace OFFSET pagination with ID-range batching in async jobs#148

Open
kneckinator wants to merge 3 commits into19.0from
worktree-perf+phase6-keyset-pagination
Open

perf: replace OFFSET pagination with ID-range batching in async jobs#148
kneckinator wants to merge 3 commits into19.0from
worktree-perf+phase6-keyset-pagination

Conversation

@kneckinator
Copy link
Copy Markdown
Contributor

@kneckinator kneckinator commented Apr 2, 2026

Summary

  • Replace OFFSET-based pagination in all async job dispatchers with NTILE-based ID-range batching
  • OFFSET N causes PostgreSQL to scan and discard N rows, making later batches O(N) slower — with 1M records and batch size 2000, the last batch scans ~1M rows
  • ID-range batching uses WHERE id BETWEEN min_id AND max_id, which is O(1) via the primary key index regardless of batch position
  • Add min_id/max_id support to get_beneficiaries() on both spp.program and spp.cycle

Changes

  • pagination_utils.py (new): compute_id_ranges() helper using PostgreSQL NTILE window function to pre-compute (min_id, max_id) boundaries in a single SQL query
  • program_manager.py: _enroll_eligible_registrants_async() dispatches jobs with ID ranges instead of offset/limit
  • cycle_manager_base.py: _check_eligibility_async() and _prepare_entitlements_async() use ID ranges
  • cycle_manager.py: Updated CustomDefaultCycleManager._prepare_entitlements() override to forward min_id/max_id
  • programs.py / cycle.py: get_beneficiaries() supports min_id/max_id for range queries
  • test_keyset_pagination.py (new): 12 tests covering the helper, get_beneficiaries range queries, and async dispatch integration

Context

Phase 6 of 9 in the spp_programs performance optimization effort. Independent of other phases (no hard dependencies, but logically after #116 )

Test plan

  • ./scripts/test_single_module.sh spp_programs — 591 tests, 0 failures
  • pre-commit run --files <changed_files> — all checks pass
  • For async operations: test via UI with ODOO_INIT_MODULES=spp_programs docker compose --profile ui up -d

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces ID-range based keyset pagination to replace inefficient OFFSET-based pagination for asynchronous job dispatching. It adds a new utility, compute_id_ranges, which utilizes PostgreSQL's NTILE function to calculate batch boundaries, and updates the get_beneficiaries methods and various managers in the spp_programs module to support min_id and max_id parameters. Feedback was provided regarding a potential race condition in the pagination utility that could lead to a TypeError if records are deleted between database queries.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.18%. Comparing base (f738582) to head (b4b9ddc).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             19.0     #148      +/-   ##
==========================================
+ Coverage   71.06%   71.18%   +0.11%     
==========================================
  Files         925      926       +1     
  Lines       54704    54753      +49     
==========================================
+ Hits        38876    38976     +100     
+ Misses      15828    15777      -51     
Flag Coverage Δ
spp_api_v2 80.10% <ø> (ø)
spp_api_v2_change_request 66.85% <ø> (ø)
spp_api_v2_cycles 71.12% <ø> (ø)
spp_api_v2_data 64.41% <ø> (ø)
spp_api_v2_entitlements 70.19% <ø> (ø)
spp_api_v2_gis 71.52% <ø> (ø)
spp_api_v2_products 66.27% <ø> (ø)
spp_api_v2_service_points 70.94% <ø> (ø)
spp_api_v2_simulation 71.12% <ø> (ø)
spp_api_v2_vocabulary 57.26% <ø> (ø)
spp_audit 64.44% <ø> (+0.60%) ⬆️
spp_base_common 90.26% <ø> (ø)
spp_case_entitlements 97.61% <ø> (ø)
spp_case_programs 97.14% <ø> (ø)
spp_cel_event 85.11% <ø> (ø)
spp_claim_169 58.11% <ø> (ø)
spp_dci_client_dr 55.87% <ø> (ø)
spp_dci_client_ibr 60.17% <ø> (ø)
spp_programs 63.17% <100.00%> (+0.94%) ⬆️
spp_security 66.66% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
spp_programs/__manifest__.py 0.00% <ø> (ø)
spp_programs/models/cycle.py 65.03% <100.00%> (+0.36%) ⬆️
spp_programs/models/managers/cycle_manager.py 78.26% <100.00%> (ø)
spp_programs/models/managers/cycle_manager_base.py 66.36% <100.00%> (+11.24%) ⬆️
spp_programs/models/managers/pagination_utils.py 100.00% <100.00%> (ø)
spp_programs/models/managers/program_manager.py 92.74% <100.00%> (+14.40%) ⬆️
spp_programs/models/programs.py 87.95% <100.00%> (+0.10%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@kneckinator kneckinator force-pushed the worktree-perf+phase6-keyset-pagination branch 2 times, most recently from 737238c to ffb1394 Compare April 2, 2026 08:15
OFFSET N causes PostgreSQL to scan and discard N rows, making later
batches progressively slower. This replaces all async job dispatchers
with NTILE-based ID-range batching that uses WHERE id BETWEEN min_id
AND max_id, which is O(1) via the primary key index.
Handle the unlikely case where records are deleted between the COUNT
and MIN/MAX queries, which would cause a TypeError on None values.
Cover the _check_eligibility_async, _prepare_entitlements_async, and
the isinstance(states, str) branch in _enroll_eligible_registrants_async.
@kneckinator kneckinator force-pushed the worktree-perf+phase6-keyset-pagination branch from ffb1394 to b4b9ddc Compare April 2, 2026 08:17
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kneckinator is this really not intended to be added to the init.py?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants