Skip to content

Commit cfc8370

Browse files
docs(detect): tidy review-history residue in chunked-validation prose
Four inline prose touches in the chunked-validation module. No code or test behaviour changes; only doc/comment rewording so the source reads as a standalone spec rather than a narrated diff against prior versions. - `chunk_candidates` docstring: state the ``max_entities_per_call > 0`` precondition positively, point at the two upstream enforcement layers, drop the "so we do not re-check it here" coda. - `_dispatch_chunk` defensive-guard comment: keep the substantive reasons (empty-facades precondition, loud named error vs. ``raise None``, guard survives ``python -O``) and drop the assert-vs-if comparison that only makes sense in the context of the review exchange that introduced it. - `_dispatch_chunk` docstring: replace "matching the pre-failover contract" with a direct statement of the single-alias-pool behaviour ("no alternate alias to try"). - `test_system_prompt_default_none_is_forwarded_untouched` comment: tighten the "accidentally inject a placeholder" framing to just state the current recipe contract and what the test pins. Made-with: Cursor
1 parent 2163fc8 commit cfc8370

2 files changed

Lines changed: 21 additions & 22 deletions

File tree

src/anonymizer/engine/detection/chunked_validation.py

Lines changed: 17 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -160,10 +160,10 @@ def chunk_candidates(
160160
) -> list[list[tuple[Any, EntitySpan]]]:
161161
"""Partition the ordered (candidate, seed) pairs into chunks of at most ``max_entities_per_call``.
162162
163-
``max_entities_per_call`` is assumed positive. It is validated upstream by
164-
``ChunkedValidationParams`` (``Field(gt=0)``) and by
165-
``AnonymizerDetectConfig.validation_max_entities_per_call`` (``gt=0``), so
166-
we do not re-check it here.
163+
Assumes ``max_entities_per_call > 0``; positivity is enforced upstream at
164+
``ChunkedValidationParams.max_entities_per_call`` and
165+
``AnonymizerDetectConfig.validation_max_entities_per_call`` (both
166+
``Field(gt=0)``).
167167
"""
168168
return [list(ordered[i : i + max_entities_per_call]) for i in range(0, len(ordered), max_entities_per_call)]
169169

@@ -321,11 +321,11 @@ async def _dispatch_chunk(
321321
We use ``PydanticResponseRecipe`` so the facade appends JSON task
322322
instructions and parses the response into ``RawValidationDecisionsSchema``.
323323
324-
Single-alias pools get no additional attempts — the loop runs exactly
325-
once and re-raises the original exception, matching the pre-failover
326-
contract. Multi-alias pools get ``len(pool)`` total attempts. If every
327-
pool member raises, the *last* exception propagates so DataDesigner
328-
records the row as a ``FailedRecord`` via ``NddAdapter._detect_missing_records``.
324+
Single-alias pools run the loop exactly once and re-raise the original
325+
exception (no alternate alias to try). Multi-alias pools get
326+
``len(pool)`` total attempts. If every pool member raises, the *last*
327+
exception propagates so DataDesigner records the row as a
328+
``FailedRecord`` via ``NddAdapter._detect_missing_records``.
329329
330330
Each failover attempt is logged at WARNING so operators can correlate
331331
degraded pool members with run-level failure-rate spikes.
@@ -374,15 +374,14 @@ async def _dispatch_chunk(
374374
exc,
375375
)
376376

377-
# Defensive: ``facades`` is non-empty by caller contract
378-
# (``chunked_validate_row`` builds it from the configured pool, which the
379-
# config validator requires to be non-empty). After a non-empty pool fully
380-
# exhausts, ``last_exc`` is guaranteed set and we re-raise it. If we ever
381-
# reach this point with ``last_exc is None``, the precondition was
382-
# violated — raise a loud ``RuntimeError`` rather than ``raise None``
383-
# (which would produce ``TypeError: exceptions must derive from
384-
# BaseException``). Using a real check instead of ``assert`` so the guard
385-
# survives ``python -O``.
377+
# ``facades`` is non-empty by caller contract: ``chunked_validate_row``
378+
# builds it from the configured pool, and the config validator requires
379+
# a non-empty pool. After the loop, ``last_exc`` is therefore set and
380+
# we re-raise it. The ``None`` branch exists only to give a loud,
381+
# named error if that precondition is ever violated (rather than
382+
# ``raise None``, which would surface as ``TypeError: exceptions must
383+
# derive from BaseException``) and to keep the guard live under
384+
# ``python -O``, which strips ``assert``.
386385
if last_exc is None:
387386
raise RuntimeError(
388387
"_dispatch_chunk was called with an empty facades list; "

tests/engine/test_chunked_validation.py

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -411,10 +411,10 @@ def test_system_prompt_is_forwarded_to_facade(self) -> None:
411411
assert sentinel in forwarded
412412

413413
def test_system_prompt_default_none_is_forwarded_untouched(self) -> None:
414-
# When the caller leaves ``system_prompt`` unset, nothing upstream of
415-
# the recipe should synthesize one. Whatever the recipe produces from
416-
# ``None`` is what the facade sees (today: ``None``). This test pins
417-
# that we don't accidentally inject a placeholder along the way.
414+
# The recipe maps ``None`` input to ``None`` output; this test pins
415+
# that no intermediate layer replaces it with a placeholder string
416+
# when ``ChunkedValidationParams.system_prompt`` is left at its
417+
# default.
418418
text = "Alice spoke."
419419
spans = [_entity_span("a", "Alice", "first_name", 0, 5)]
420420
candidates = _candidates_schema(("a", "Alice", "first_name"))

0 commit comments

Comments
 (0)