feat(classification): enforce valid class vocabulary with re-prompt/retry (#356)#371

Merged
rstrahan merged 1 commit into develop from feature/classification-enforce-valid-classes on Jun 22, 2026
@rstrahan

Summary

Closes #356.

Page-level classification (multimodalPageLevelClassification) can return a class label that is not in the configured vocabulary — e.g. predicting receipt when only invoice, w2, and check are valid. This is especially common with smaller/cheaper models. Previously the code detected this but only logged a warning and used the invalid label anyway.

This PR adds a deterministic validation + retry guardrail:

  1. After the model returns a class, validate it against the configured vocabulary.
  2. If invalid, re-prompt the model — re-sending the original content with an appended correction listing the allowed classes. (At temperature=0, re-sending an identical request would return the identical wrong answer, so the correction is load-bearing — this is a single-turn re-prompt, not a blind retry.)
  3. Retry up to maxValidationRetries.
  4. If exhausted, assign invalidClassFallback (default unclassified) and flag the page with a validation_error in classification metadata. The document keeps processing — no hard failure.
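
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `classify_with_validation`, `build_retry_content`, the `invoke_model` callable, the content shape (a list of `{"text": ...}` blocks), the correction wording, and the exact-match comparison are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical content shape: a list of text blocks sent to the model.
Content = List[Dict[str, str]]


def build_retry_content(original: Content, bad_label: str, allowed: List[str]) -> Content:
    """Return the original content plus a correction listing the allowed classes."""
    correction = (
        f"Your previous answer '{bad_label}' is not a valid class. "
        f"Respond with exactly one of: {', '.join(allowed)}."
    )
    # Build a new list so the caller's original content is never mutated.
    return original + [{"text": correction}]


def classify_with_validation(
    invoke_model: Callable[[Content], str],
    content: Content,
    allowed: List[str],
    max_validation_retries: int = 2,
    invalid_class_fallback: str = "unclassified",
) -> Tuple[str, Dict[str, object]]:
    """Classify once, then re-prompt with a correction while the label is invalid."""
    metadata: Dict[str, object] = {"attempts": 0}
    request = content
    label = ""
    # One initial attempt plus up to max_validation_retries re-prompts.
    for _ in range(1 + max_validation_retries):
        label = invoke_model(request).strip()
        metadata["attempts"] += 1
        if label in allowed:  # assumption: exact, case-sensitive membership
            return label, metadata
        # A changed request is needed: at temperature=0 an identical request
        # would be expected to produce the same invalid label again.
        request = build_retry_content(content, label, allowed)
    # Retries exhausted: fall back and flag the page instead of raising.
    metadata["validation_error"] = (
        f"'{label}' is not in the configured classes; used fallback"
    )
    return invalid_class_fallback, metadata


# Scripted stand-in for the model: wrong on the first call, right on the second.
answers = iter(["receipt", "invoice"])
label, meta = classify_with_validation(
    lambda request: next(answers),
    [{"text": "<page content>"}],
    ["invoice", "w2", "check"],
)
print(label, meta)  # invoice {'attempts': 2}
```

Each re-prompt here is rebuilt from the original content rather than stacked on the previous correction, which matches the "single-turn" description above; whether the actual helper does the same is not stated in this PR text.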

New config keys (under classification:)

| Key | Default | Description |
| --- | --- | --- |
| enforceValidClasses | true | Validate + retry on out-of-vocabulary classes. false = legacy "warn and use as-is". |
| maxValidationRetries | 2 | Re-prompts after the initial attempt. 0 disables retries. |
| invalidClassFallback | unclassified | Class assigned when retries are exhausted. |

All three are editable in the Configuration UI.

⚠️ Behavior change on upgrade

Enforcement is on by default, so an out-of-vocabulary prediction that previously passed through unchanged is now corrected or coerced to unclassified. Set enforceValidClasses: false to restore the prior behavior. This was a deliberate decision (stronger correctness guarantee out of the box); calling it out here for reviewer sign-off.

Design notes

  • Chose a set-membership check + single-turn re-prompt over a generic pydantic-validation framework. The classification "schema" is just an enum of strings; the broader "reuse pydantic everywhere / Discovery" idea from the issue is out of scope (the issue's author flagged it as separate too).
  • Metering is aggregated across all retry attempts so token usage isn't lost.
  • Holistic classification (textbasedHolisticClassification) is not covered by this loop yet; noted as a follow-up.
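
As an illustration of the metering note above, summing usage across attempts could look something like this. The metering shape (a context key mapping to named integer counters), the key string, and the `merge_metering` helper are assumptions for the sketch; the service's actual metering structure may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical shape: {"<context key>": {"inputTokens": int, "outputTokens": int, ...}}
Metering = Dict[str, Dict[str, int]]


def merge_metering(attempts: Iterable[Metering]) -> Metering:
    """Sum counters key by key so no attempt's token usage is dropped."""
    totals = defaultdict(lambda: defaultdict(int))
    for attempt in attempts:
        for key, counters in attempt.items():
            for name, value in counters.items():
                totals[key][name] += value
    # Convert back to plain dicts for serialization-friendly output.
    return {key: dict(counters) for key, counters in totals.items()}


first = {"classification/model-x": {"inputTokens": 1200, "outputTokens": 8}}
retry = {"classification/model-x": {"inputTokens": 1260, "outputTokens": 6}}
print(merge_metering([first, retry]))
# {'classification/model-x': {'inputTokens': 2460, 'outputTokens': 14}}
```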

Changes

  • Core: validation/retry loop + _build_validation_retry_content helper in classify_page_bedrock
  • Config model: 3 fields + string-tolerant validators on ClassificationConfig
  • Defaults: base-classification.yaml (inherited by config_library samples via merge)
  • Config UI: 3 ConfigSchema entries in patterns/unified/template.yaml with dependsOn wiring
  • Tests: 5 service tests (retry→valid, exhausted→fallback, valid-first no-retry, enforcement-off legacy, metering aggregation) + 2 config-model tests
  • Notebook: notebooks/misc/classification-valid-class-enforcement.ipynb (deterministic mock-driven demo of all 3 scenarios)
  • Docs: user (docs/classification.md), developer README, CHANGELOG with migration note
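
To make "string-tolerant validators" concrete, here is a stdlib-only sketch of parsing the three keys when values may arrive as strings (for example `"false"` or `"2"` from a UI form). It deliberately uses a plain dataclass rather than the project's config model; `ClassificationSettings`, `to_bool`, `to_non_negative_int`, and the accepted string spellings are hypothetical, and only the key names and defaults come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Mapping

_TRUE = {"true", "1", "yes", "on"}
_FALSE = {"false", "0", "no", "off"}


def to_bool(value: Any) -> bool:
    """Accept real booleans as well as common string spellings."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    raise ValueError(f"not a boolean: {value!r}")


def to_non_negative_int(value: Any) -> int:
    """Accept ints or integer strings such as '2'; reject negatives."""
    number = int(str(value).strip())
    if number < 0:
        raise ValueError(f"must be >= 0: {value!r}")
    return number


@dataclass(frozen=True)
class ClassificationSettings:
    # Defaults mirror the values listed in this PR.
    enforce_valid_classes: bool = True
    max_validation_retries: int = 2
    invalid_class_fallback: str = "unclassified"

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "ClassificationSettings":
        """Build settings from a raw config mapping, filling in defaults."""
        fallback = str(raw.get("invalidClassFallback", "unclassified")).strip()
        return cls(
            enforce_valid_classes=to_bool(raw.get("enforceValidClasses", True)),
            max_validation_retries=to_non_negative_int(raw.get("maxValidationRetries", 2)),
            # Assumption: an empty fallback string reverts to the default.
            invalid_class_fallback=fallback or "unclassified",
        )


# Legacy "warn and use as-is" behavior, with values supplied as strings.
legacy = ClassificationSettings.from_mapping(
    {"enforceValidClasses": "false", "maxValidationRetries": "0"}
)
print(legacy.enforce_valid_classes, legacy.max_validation_retries)  # False 0
```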

Testing

  • cd lib/idp_common_pkg && pytest tests/unit/classification tests/unit/config/test_config_models.py → 84 passed, 1 skipped
  • ruff check and ruff format --check clean on all changed files
  • CloudFormation template parses; all 3 schema entries well-formed
  • Notebook logic verified end-to-end against the real service

Note: 6 failures in tests/unit/config/test_configuration_sync.py are pre-existing on develop (they test extraction.temperature sync, unrelated to this change).

🤖 Generated with Claude Code

…etry (#356)

Page-level classification (multimodalPageLevelClassification) now validates the
model's predicted class against the configured class vocabulary and re-prompts
the model on out-of-vocabulary predictions, retrying up to a configurable limit.
Because classification runs at temperature=0, the retry appends a correction
message listing the allowed classes (a single-turn re-prompt) rather than
re-sending an identical request. When retries are exhausted the page is assigned
a configurable fallback class and flagged with a validation_error in metadata;
the document keeps processing (no hard failure).

Three new classification config keys control it (on by default):
- enforceValidClasses (default true)
- maxValidationRetries (default 2)
- invalidClassFallback (default unclassified)

Behavior change on upgrade: enforcement is on by default, so out-of-vocabulary
predictions that previously passed through unchanged are now corrected or
coerced to the fallback. Set enforceValidClasses: false to restore legacy
"warn and use as-is" behavior.

- Add fields + validators to ClassificationConfig
- Add defaults to base-classification.yaml (inherited by config_library samples)
- Add ConfigSchema entries to patterns/unified/template.yaml for the Config UI
- Add validation/retry loop and _build_validation_retry_content helper
- Add unit tests (retry-then-valid, exhausted-fallback, valid-first, legacy,
  metering aggregation) and config-model default/parsing tests
- Add demo notebook notebooks/misc/classification-valid-class-enforcement.ipynb
- Update user docs, developer README, and CHANGELOG

Holistic packet classification is not covered by this loop yet.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@rstrahan merged commit 9fe710c into develop on Jun 22, 2026
2 checks passed