Refactor the guardrails module: replace the single GuardrailsAdvisor
with two specialised Spring AI advisors (CheckForViolationsAdvisor for
blocking checks, SanitizeTextAdvisor for response masking), and split
the monolithic guardrails component into 12 per-guardrail components
(jailbreak, keywords, pii, llm-pii, nsfw, urls, secret-keys,
sanitize-text, check-for-violations, custom, custom-regex,
topical-alignment).
Add a sealed GuardrailException hierarchy with a stable
GuardrailExceptionKind tag for telemetry, per-guardrail FAIL_MODE
(FAIL_CLOSED / FAIL_OPEN) with a configuration-error override that
always fails closed for operator bugs (invalid regex, missing MODEL
child, programming errors), and new SPI types: GuardrailContext, sealed
MaskResult { Entities | Masked | Unchanged }, sealed Violation
{ Pattern | Classified | ExecutionFailure }, PreflightMasking mixin.
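
To illustrate how the fail-mode rule above might interact with the sealed Violation type, here is a minimal sketch. Only the names FAIL_CLOSED, FAIL_OPEN, Violation, Pattern, Classified, and ExecutionFailure come from this description; the record fields, the `configurationError` flag, and the `shouldBlock` helper are assumptions made for illustration and are not taken from the actual module.

```java
// Hypothetical sketch; field names and shouldBlock are illustrative assumptions.
class GuardrailSketch {

    enum FailMode { FAIL_CLOSED, FAIL_OPEN }

    // Sealed result type with the three variants named in the description.
    // The permitted subtypes are inferred from the nested declarations.
    sealed interface Violation {
        record Pattern(String guardrail, String match) implements Violation {}
        record Classified(String guardrail, double confidence) implements Violation {}
        record ExecutionFailure(String guardrail, Throwable cause,
                                boolean configurationError) implements Violation {}
    }

    // Decide whether a reported violation should block the request.
    static boolean shouldBlock(Violation violation, FailMode mode) {
        if (violation instanceof Violation.ExecutionFailure failure) {
            // Configuration errors (invalid regex, missing MODEL child,
            // programming errors) override FAIL_OPEN and always block.
            return failure.configurationError() || mode == FailMode.FAIL_CLOSED;
        }
        // Pattern and Classified are genuine detections, so they block
        // regardless of the fail mode.
        return true;
    }
}
```

Under these assumptions, a transient failure in a FAIL_OPEN guardrail lets the request through, while the same guardrail with an invalid regex still blocks.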
Introduce ReDoS-bounded detectors (PiiDetector, SecretKeyDetector,
UrlDetector, LlmPiiDetector, RegexParser, MaskEntityMap,
KeywordMatcher) and LlmClassifier with nonce-fenced user input for
prompt-injection hardening. Fail-open crashes are surfaced on the
response via SKIPPED_FAILURES_METADATA_KEY so downstream alerting can
detect silently degraded guardrails without grepping logs.
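
This description does not say how the detectors bound regex evaluation. One common Java technique, shown here purely as an assumed sketch, is to wrap the input in a CharSequence that counts character reads and aborts once a budget is exhausted. The names `BoundedRegex`, `BudgetedSequence`, `BudgetExceededException`, and `boundedFind` are hypothetical and are not the module's API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a step-budgeted regex match; not the module's code.
class BoundedRegex {

    static final class BudgetExceededException extends RuntimeException {
        BudgetExceededException(String message) { super(message); }
    }

    // CharSequence view that charges one unit of budget per charAt call.
    // The matcher reads input through charAt, so runaway backtracking
    // exhausts the budget instead of running unbounded.
    static final class BudgetedSequence implements CharSequence {
        private final CharSequence inner;
        private final long[] remaining; // shared with subSequence views

        BudgetedSequence(CharSequence inner, long[] remaining) {
            this.inner = inner;
            this.remaining = remaining;
        }

        @Override public char charAt(int index) {
            if (--remaining[0] < 0) {
                throw new BudgetExceededException("regex read budget exhausted");
            }
            return inner.charAt(index);
        }

        @Override public int length() { return inner.length(); }

        @Override public CharSequence subSequence(int start, int end) {
            return new BudgetedSequence(inner.subSequence(start, end), remaining);
        }

        @Override public String toString() { return inner.toString(); }
    }

    // Returns whether the pattern occurs in the input, or throws
    // BudgetExceededException after more than maxCharReads character reads.
    static boolean boundedFind(Pattern pattern, String input, long maxCharReads) {
        Matcher matcher = pattern.matcher(
                new BudgetedSequence(input, new long[] { maxCharReads }));
        return matcher.find();
    }
}
```

A caller would presumably treat `BudgetExceededException` as an execution failure and apply the guardrail's FAIL_MODE to it. A read budget is deterministic, which makes it easier to test than a wall-clock deadline, though a deadline check inside `charAt` would follow the same pattern.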
Cover fail-mode semantics, overlap masking, configuration-error
override, exception-message leakage, multi-turn bypass, concurrency,
streaming partial-leak semantics, ReDoS bounding, and unicode/RTL/emoji
edge cases with dedicated tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>