- Moved the `AuthorityFrontier` discovery-state vocabulary into `opencontractserver/enrichment/constants.py` as a single source of truth (`DISCOVERY_STATE_*` constants, `DISCOVERY_STATE_CHOICES`, and `DISCOVERY_SUCCESS_STATES`). The model field (`opencontractserver/annotations/models.py`), the frontier service transition verbs, the discovery orchestrator, the crawl driver, and the verify/license gate's `GATE_*` verdicts now reference these constants instead of bare string literals, so a rename is a one-line edit and the parallel definitions cannot drift (CLAUDE.md item 4; issue #2020 finding 7). The model's `DISCOVERY_STATE_CHOICES` is re-exported from the constants with identical values, so no migration is generated.
- Renamed `enrichment.constants._USC_PREFIX_RE` / `_CFR_PREFIX_RE` to `USC_PREFIX_RE` / `CFR_PREFIX_RE` (no leading underscore). They are imported cross-module by the USC/CFR authority source providers' `can_handle` overrides, so the underscore (signalling module-private) was misleading; updated both providers' imports (issue #2020 finding 3).
- Optimised `AuthorityDiscoveryService._provider_for` (`opencontractserver/enrichment/services/authority_discovery_service.py`) to instantiate each enabled provider once per call and reuse the instances across every candidate key, instead of re-instantiating every provider for each candidate. `can_handle` is pure given the key, so the prior per-candidate instantiation was wasted work that grew with the candidate-key fan-out (issue #2020 finding 6).
- Expanded the `dequeue_for_provider` docstring to document its crash-recovery purpose (freshly-seeded rows carry `provider=None` and are intentionally skipped; the primary dispatch path is the provider-agnostic `dequeue_queued`), and clarified in the `AuthorityKeyEquivalence` model docstring that its direction convention is documentation-only because callers query both columns (issue #2020 findings 9, 10).
- Fixed a TOCTOU race in `AuthorityFrontierService.seed_from_wanted_authorities` (`opencontractserver/enrichment/services/authority_frontier_service.py`): the per-key create-or-refresh now runs inside a single `transaction.atomic()` + `select_for_update()` critical section instead of `get_or_create` followed by an unconditional `save`. Two concurrent seed passes could previously both clear `get_or_create` and have one silently clobber the other's count update, and a freshly created row was briefly visible at `mention_count=0` (the field default) between the insert and its follow-up save. The lock serialises the second writer and seeds the real counts atomically. The "prefer existing non-null" semantics for `jurisdiction`/`authority_type` are preserved (issue #2020 finding 1).
- Fixed stale `last_error` on a successful retry in `AuthorityFrontierService.mark()` (same file): a row transitioning `failed (error=...) -> ingested` previously retained the old error string, so downstream health checks reading `last_error` misread a healthy row as broken. `mark()` now clears `last_error` implicitly when transitioning into a terminal success state (`enrichment.constants.DISCOVERY_SUCCESS_STATES`, i.e. `ingested`); the failure history is still preserved in the append-only `candidate_sources` audit trail (issue #2020 finding 2).
- Fixed a double XML traversal in `CFRAuthoritySourceProvider._fetch_impl` (`opencontractserver/pipeline/authority_source_providers/cfr_provider.py`): the `<P>` flatten list comprehension called `_flatten_element_text(p_el)` twice per element (once for the value, once for the truthiness filter). A walrus binds the result once so each `<P>` is traversed a single time (issue #2020 finding 5).
- Added regression tests in `opencontractserver/tests/test_authority_frontier.py`: the seed upsert takes a `SELECT ... FOR UPDATE` row lock, a `failed -> ingested` transition clears the stale error (while a non-success transition preserves it), and `dequeue_for_provider` excludes freshly-seeded `provider=None` rows (issue #2020 findings 11, 12).
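
For readers unfamiliar with the walrus pattern mentioned in the `CFRAuthoritySourceProvider._fetch_impl` entry, the following is a minimal sketch of the idea, not the project's actual code. The helper `flatten_element_text`, the function `flatten_paragraphs`, the `CALLS` counter, and the sample XML are all hypothetical stand-ins chosen for illustration; only the `<P>` tag and the "flatten once, filter on truthiness" behaviour come from the entry above.

```python
import xml.etree.ElementTree as ET

# Records each flatten call so the single traversal per <P> can be observed.
CALLS = []


def flatten_element_text(el):
    """Hypothetical stand-in for `_flatten_element_text`: collapse all text under `el`."""
    CALLS.append(el.tag)
    return " ".join("".join(el.itertext()).split())


def flatten_paragraphs(root):
    # The walrus binds the flattened text once per <P>; the same value serves
    # as both the truthiness filter and the list element.
    return [text for p_el in root.iter("P") if (text := flatten_element_text(p_el))]


SAMPLE = "<SECTION><P>First <E>para</E>.</P><P>  </P><P>Second.</P></SECTION>"
paragraphs = flatten_paragraphs(ET.fromstring(SAMPLE))
```

In this sketch the whitespace-only `<P>` flattens to an empty string and is filtered out, and the helper runs exactly once for each of the three `<P>` elements. The pre-fix form described above would be equivalent to `[f(p) for p in ... if f(p)]`, which calls the helper twice for every non-empty element.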