perf: replace OFFSET pagination with ID-range batching in async jobs#148
perf: replace OFFSET pagination with ID-range batching in async jobs#148kneckinator wants to merge 3 commits into19.0from
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces ID-range based keyset pagination to replace inefficient OFFSET-based pagination for asynchronous job dispatching. It adds a new utility, compute_id_ranges, which utilizes PostgreSQL's NTILE function to calculate batch boundaries, and updates the get_beneficiaries methods and various managers in the spp_programs module to support min_id and max_id parameters. Feedback was provided regarding a potential race condition in the pagination utility that could lead to a TypeError if records are deleted between database queries.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## 19.0 #148 +/- ##
==========================================
+ Coverage 71.06% 71.18% +0.11%
==========================================
Files 925 926 +1
Lines 54704 54753 +49
==========================================
+ Hits 38876 38976 +100
+ Misses 15828 15777 -51
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
737238c to
ffb1394
Compare
OFFSET N causes PostgreSQL to scan and discard N rows, making later batches progressively slower. This replaces all async job dispatchers with NTILE-based ID-range batching that uses WHERE id BETWEEN min_id AND max_id, which is O(1) via the primary key index.
Handle the unlikely case where records are deleted between the COUNT and MIN/MAX queries, which would cause a TypeError on None values.
Cover the _check_eligibility_async, _prepare_entitlements_async, and the isinstance(states, str) branch in _enroll_eligible_registrants_async.
ffb1394 to
b4b9ddc
Compare
There was a problem hiding this comment.
@kneckinator is this really not intended to be added to the init.py?
Summary
WHERE id BETWEEN min_id AND max_id, which is O(1) via the primary key index regardless of batch positionmin_id/max_idsupport toget_beneficiaries()on bothspp.programandspp.cycleChanges
pagination_utils.py(new):compute_id_ranges()helper using PostgreSQL NTILE window function to pre-compute (min_id, max_id) boundaries in a single SQL queryprogram_manager.py:_enroll_eligible_registrants_async()dispatches jobs with ID ranges instead of offset/limitcycle_manager_base.py:_check_eligibility_async()and_prepare_entitlements_async()use ID rangescycle_manager.py: UpdatedCustomDefaultCycleManager._prepare_entitlements()override to forwardmin_id/max_idprograms.py/cycle.py:get_beneficiaries()supportsmin_id/max_idfor range queriestest_keyset_pagination.py(new): 12 tests covering the helper, get_beneficiaries range queries, and async dispatch integrationContext
Phase 6 of 9 in the
spp_programsperformance optimization effort. Independent of other phases (no hard dependencies, but logically after #116 )Test plan
./scripts/test_single_module.sh spp_programs— 591 tests, 0 failurespre-commit run --files <changed_files>— all checks passODOO_INIT_MODULES=spp_programs docker compose --profile ui up -d