NVIDIA-NeMo · lipikaramaswamy · Apr 23, 2026 · Apr 20, 2026 · Apr 20, 2026 · Apr 21, 2026
@@ -45,7 +45,35 @@ config = AnonymizerConfig(
 |-------|---------|-------------|
 | `entity_labels` | `None` (all defaults) | List of labels to detect. Leave unset (or pass `None`) to use the full default set. |
 | `gliner_threshold` | `0.3` | GLiNER confidence threshold (0.0--1.0). Lower values detect more entities but may increase false positives. |
+| `validation_max_entities_per_call` | `100` | Maximum candidate entities per validator LLM call. Rows with more candidates are split into chunks. See [Chunked validation](#chunked-validation). |
+| `validation_excerpt_window_chars` | `500` | Characters of context included before and after a chunk's entity spans in the validator prompt. Bounds per-chunk prompt size; not the model's context-window limit. |
 
+---
+
+## Chunked validation
+
+When a row yields many entity candidates, validating them in a single LLM call can exceed the model's context window or the provider's rate limits (tokens-per-minute or requests-per-minute quotas that many hosted models enforce). Anonymizer splits validation for such rows automatically: candidates are grouped in position order into chunks of at most `validation_max_entities_per_call`, and each chunk is validated independently against its own bounded text excerpt (the chunk's span plus `validation_excerpt_window_chars` of context on each side). The per-chunk decisions are then merged back into a single per-row set.
+
+The chunked path is always on; if a row has no more candidates than the limit, it runs as a single call and is equivalent to the unchunked behavior. Tuning guidance:
+
+- **Raise `validation_max_entities_per_call`** if your validator has a large context window and you want fewer, larger calls.
+- **Lower it** if you hit provider rate limits or want more uniform per-call latency.
+- **Raise `validation_excerpt_window_chars`** when short windows hide the context needed to disambiguate entities (for example, whether `"John"` is a first name or a last name depends on the surrounding text).
+- **Lower it** to reduce per-chunk prompt tokens, at the risk of lower validation quality on context-sensitive labels.
+
+### Validator pools
+
+`entity_validator` can be a single alias (the default) or a list of aliases — a **pool**. When multiple aliases are configured, each chunk in a row is dispatched to the next alias in round-robin order, which lets you work around per-alias rate limits by spreading requests across equivalent endpoints.
+
+Pools also act as **failover**. If a chunk's assigned alias can't complete the call (an unrecoverable rate limit, a 5xx that didn't clear on retry, a malformed response), the same chunk is automatically retried against the other aliases in your pool before Anonymizer gives up on the row. A chunk only fails once every alias in the pool has failed for it. This is a cheap way to harden validation against any one endpoint having a bad day, on top of the load-spreading role.
+
+#### What happens when a row can't be validated
+
+If validation can't get a complete answer for a row — every alias in the pool has failed on at least one of that row's chunks — the row is **dropped from the output** rather than passed through with some entities unvalidated. This is deliberate: the alternative would be writing the original text back out with those entities still un-scrubbed, which is exactly the outcome you're trying to avoid.
+
+Dropped rows show up on `result.failed_records` with `step="detection"`, so you can tell which inputs didn't make it through by comparing input IDs against output IDs and reprocess those on a follow-up pass.
+
+See [Validator pools](models.md#validator-pools) for the YAML syntax and caveats.
 
 
 ## Entity labels

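Note on the "Chunked validation" section above: the position-ordered grouping it describes can be sketched roughly as follows. This is a minimal illustration only; `Candidate` and `chunk_candidates` are hypothetical names invented for this example and are not taken from Anonymizer's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for one detected entity candidate."""

    start: int  # inclusive character offset in the row text
    end: int  # exclusive character offset in the row text
    label: str


def chunk_candidates(candidates: list[Candidate], max_per_call: int) -> list[list[Candidate]]:
    """Group candidates in position order into chunks of at most max_per_call."""
    if max_per_call <= 0:
        raise ValueError("max_per_call must be positive")
    # Sort by position so each chunk covers a contiguous region of the text.
    ordered = sorted(candidates, key=lambda c: (c.start, c.end))
    return [ordered[i : i + max_per_call] for i in range(0, len(ordered), max_per_call)]
```

Under this sketch, a row whose candidate count is less than or equal to the limit produces exactly one chunk, which matches the single-call case the docs describe.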
@@ -109,6 +109,33 @@ Roles you don't override keep their default alias selections, but those aliases
     Use [`anonymizer.validate_config(config)`](../reference/anonymizer/interface/anonymizer.md) (or [`anonymizer validate`](../reference/anonymizer/interface/cli/main.md) from the CLI) after changing model configs to catch alias mismatches before processing data.
 
 
+### Validator pools
+
+`entity_validator` accepts either a single alias (shown above) or a list of aliases. A list forms a **validator pool** with two jobs:
+
+1. **Load spreading.** [Chunked validation](detection.md#chunked-validation) dispatches each chunk to the next alias in round-robin order, aggregating quota across equivalent endpoints when a single alias would hit the provider's rate limits (tokens-per-minute or requests-per-minute quotas).
+2. **Failover.** If a chunk's assigned alias can't complete the call (an unrecoverable rate limit, a 5xx that didn't clear on retry, a malformed response), the same chunk is automatically retried against the other aliases in your pool before Anonymizer gives up on the row. A row is dropped only when *every* alias in the pool has failed for the same chunk. Single-alias pools have nothing to fall back to, so they behave the same as not using a pool.
+
+```yaml
+selected_models:
+  detection:
+    entity_detector: gliner-pii-detector
+    entity_validator:
+      - gpt5-primary
+      - gpt5-secondary
+    entity_augmenter: gpt5-primary
+    latent_detector: claude-sonnet
+```
+
+Every alias in the pool must also appear in `model_configs`; `anonymizer validate` flags unknown aliases by index. A scalar value remains valid and is equivalent to a one-element list.
+
+!!! warning "`max_parallel_requests` is enforced per alias"
+
+    A pool with N aliases allows up to the sum of each alias's `max_parallel_requests` in concurrent validator calls per row when the row is split into chunks. Budget your provider rate limits accordingly: the whole point of pooling is to multiply in-flight requests, but the multiplication is real.
+
+    Pool aliases should target **equivalent models** (same model family, similar quality). Mixing heterogeneous models produces inconsistent validation across chunks in the same row and is almost always a misconfiguration.
+
+
 ### Choosing custom models
 
 For Anonymizer, the best overall leaderboard model is not always the best default for every role.

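Note on the "Validator pools" section above: the round-robin dispatch and failover behavior might look roughly like the sketch below. The helper names (`attempt_order`, `validate_chunk`, `ChunkFailed`) are hypothetical. Trying the remaining aliases in rotation order is also an assumption: the docs only say a failed chunk is retried against the other aliases, not in which order.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ChunkFailed(Exception):
    """Raised when every alias in the pool has failed for one chunk."""


def attempt_order(pool: list[str], chunk_index: int) -> list[str]:
    """Aliases to try for a chunk: the round-robin pick first, then the rest."""
    if not pool:
        raise ValueError("pool must contain at least one alias")
    start = chunk_index % len(pool)
    return pool[start:] + pool[:start]


def validate_chunk(pool: list[str], chunk_index: int, call: Callable[[str], T]) -> T:
    """Run call(alias) against the pool until one alias succeeds."""
    errors: dict[str, Exception] = {}
    for alias in attempt_order(pool, chunk_index):
        try:
            return call(alias)
        except Exception as exc:  # sketch only; real code would catch narrower errors
            errors[alias] = exc
    raise ChunkFailed(f"all aliases failed for chunk {chunk_index}: {sorted(errors)}")
```

In this sketch, a `ChunkFailed` for any chunk corresponds to the case where the docs say the whole row is dropped and reported in `result.failed_records`.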
@@ -82,6 +82,24 @@ class Detect(BaseModel):
     gliner_threshold: float = Field(
         default=0.3, ge=0.0, le=1.0, description="GLiNER detection confidence threshold (0.0-1.0)."
     )
+    validation_max_entities_per_call: int = Field(
+        default=100,
+        gt=0,
+        description=(
+            "Maximum number of candidate entities included in a single validator LLM call. "
+            "When a row has more candidates than this, validation is split into chunks that "
+            "are dispatched (round-robin) across the validator pool."
+        ),
+    )
+    validation_excerpt_window_chars: int = Field(
+        default=500,
+        gt=0,
+        description=(
+            "Number of characters to include before and after a chunk's entity span when "
+            "building the text excerpt sent to the validator. Bounds the prompt context the "
+            "validator sees per chunk; it is NOT the LLM's context window limit."
+        ),
+    )
 
     @field_validator("entity_labels")
     @classmethod

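Note on `validation_excerpt_window_chars`: the "before and after a chunk's entity span" wording suggests a computation along the lines of the sketch below. `excerpt_bounds` is a hypothetical helper, and clamping to the text boundaries is an assumption about how edges are handled.

```python
def excerpt_bounds(span_start: int, span_end: int, text_len: int, window_chars: int) -> tuple[int, int]:
    """Return the [start, end) slice covering a chunk's span plus context on each side.

    span_start/span_end are the offsets of the chunk's first and last entity;
    the result is clamped so it never runs past either end of the text.
    """
    if window_chars <= 0:
        raise ValueError("window_chars must be positive")
    return max(0, span_start - window_chars), min(text_len, span_end + window_chars)
```

With the default window of 500, a chunk spanning offsets 600 to 700 in a 2000-character row would get the excerpt from 100 to 1200, so the excerpt size depends on the chunk's span and the window rather than on the full row length.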
@@ -3,17 +3,76 @@
 
 from __future__ import annotations
 
-from pydantic import BaseModel
+import logging
+from typing import Any
+
+from pydantic import BaseModel, field_validator
+
+logger = logging.getLogger(__name__)
 
 
 class DetectionModelSelection(BaseModel):
-    """Model aliases for the entity detection pipeline."""
+    """Model aliases for the entity detection pipeline.
+
+    ``entity_validator`` accepts either a single alias or a list of aliases.
+    A list forms a validator *pool*: chunked validation rotates calls
+    across the pool in round-robin order, which is useful for working around
+    per-alias TPM/RPM limits. A single scalar is normalized to a
+    one-element list.
+    """
 
     entity_detector: str
-    entity_validator: str
+    entity_validator: list[str]
     entity_augmenter: str
     latent_detector: str
 
+    @field_validator("entity_validator", mode="before")
+    @classmethod
+    def normalize_entity_validator(cls, value: Any) -> list[str]:
+        """Accept a scalar alias, a list of aliases, or a tuple of aliases; return a non-empty deduplicated list.
+
+        Normalizing at parse time keeps every downstream consumer on the
+        same shape (``list[str]``) regardless of whether the user wrote
+        ``entity_validator: some-alias`` or
+        ``entity_validator: [alias-a, alias-b]``. Tuples are accepted for
+        parity with Pydantic v2's default coercion for ``list[str]`` fields,
+        which lets programmatic callers pass either
+        ``DetectionModelSelection(entity_validator=["a", "b"])`` or
+        ``DetectionModelSelection(entity_validator=("a", "b"))`` without
+        caring about the concrete sequence type. Any other input type
+        raises ``TypeError``.
+
+        Duplicate aliases are collapsed to the first occurrence (order
+        preserved) and a warning is logged. A duplicate in the pool would
+        burn a failover attempt on an already-exhausted endpoint, which
+        almost certainly isn't what the user wants.
+        """
+        if isinstance(value, str):
+            aliases: list[str] = [value]
+        elif isinstance(value, (list, tuple)):
+            aliases = [str(item) for item in value]
+        else:
+            raise TypeError(f"entity_validator must be a string or list of strings, got {type(value).__name__}")
+        cleaned = [alias.strip() for alias in aliases if alias.strip()]
+        if not cleaned:
+            raise ValueError("entity_validator must name at least one model alias.")
+        seen: set[str] = set()
+        deduped: list[str] = []
+        for alias in cleaned:
+            if alias in seen:
+                continue
+            seen.add(alias)
+            deduped.append(alias)
+        if len(deduped) != len(cleaned):
+            removed = [alias for alias in cleaned if cleaned.count(alias) > 1]
+            logger.warning(
+                "entity_validator pool contained duplicate aliases %s; collapsing to %s. "
+                "Duplicates burn a failover attempt on an already-exhausted endpoint.",
+                sorted(set(removed)),
+                deduped,
+            )
+        return deduped
+
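Note on `normalize_entity_validator`: to make the accepted inputs concrete without requiring Pydantic, the same rules can be mirrored in a plain function. `normalize_aliases` is a hypothetical stand-in written for this note. It omits the duplicate warning and uses `dict.fromkeys` for order-preserving deduplication, a simplification rather than the PR's loop.

```python
from typing import Any


def normalize_aliases(value: Any) -> list[str]:
    """Mirror of the validator's rules: scalar or list/tuple in, clean list out."""
    if isinstance(value, str):
        aliases = [value]
    elif isinstance(value, (list, tuple)):
        aliases = [str(item) for item in value]
    else:
        raise TypeError(f"expected a string or list of strings, got {type(value).__name__}")
    # Drop blank entries and surrounding whitespace.
    cleaned = [alias.strip() for alias in aliases if alias.strip()]
    if not cleaned:
        raise ValueError("at least one model alias is required")
    # dict preserves insertion order, so the first occurrence of each alias wins.
    return list(dict.fromkeys(cleaned))


print(normalize_aliases("gpt5-primary"))
print(normalize_aliases(["gpt5-primary", " gpt5-secondary ", "gpt5-primary"]))
```

Note that, as in the PR's validator, non-string items inside a list are coerced with `str()` rather than rejected, so a stray `None` would become the alias `"None"` and only be caught later as an unknown alias.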
 
 class ReplaceModelSelection(BaseModel):
     """Model aliases for the replacement pipeline."""