docs/validation.md (79 additions, 0 deletions)
@@ -194,3 +194,82 @@ interface ValidationError {
suggestion?: string;
}
```
## Prompt injection and system-instruction leakage hardening

Regex validation is **not a complete security boundary**, but it is useful as a first-pass filter for obvious hostile payloads in high-risk fields such as `user_message`, `external_content`, or `tool_result`.
### Common vulnerability patterns

- **Direct instruction override**: attacker text attempts to supersede system policies with phrases like “ignore previous instructions”.
- **Role/channel spoofing**: attacker injects fake role labels (`system:`, `assistant:`) so the model treats user content as higher-priority instructions.
- **Prompt exfiltration requests**: attacker asks the model to reveal hidden instructions, internal policies, API keys, or chain-of-thought.
- **Structured wrapper attacks**: attacker wraps malicious directives in XML/JSON/Markdown fences to look machine-generated or authoritative.

Use `deny_regex` to catch obvious attack language and `allow_regex` to constrain strict-format fields (IDs, enums, narrow command grammars).
### Example hardening set A: free-form user text

Use this when input must remain natural language but you want to block obvious prompt-injection attempts:
return_message: "The retrieved content appears unsafe and was not included."
```
**Mitigates:** known jailbreak strings and instruction-like payloads embedded in retrieval data.
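
As a rough illustration of what a deny-pattern pass over retrieved content might look like in application code, here is a minimal TypeScript sketch. The patterns, the `FilterResult` type, and the `filterRetrievedContent` function are hypothetical and are not this project's API or its shipped rules; only the fallback message text is taken from the configuration above.

```typescript
// Hypothetical deny patterns for illustration only; tune them for your own data.
// None of them use the `g` flag, so RegExp.prototype.test stays stateless.
const DENY_PATTERNS: RegExp[] = [
  // Direct instruction override, e.g. "ignore previous instructions"
  /ignore\s+(all\s+)?(previous|prior|above)\s+instructions/i,
  // Role/channel spoofing: a line that starts with a fake role label
  /^\s*(system|assistant)\s*:/im,
  // Prompt exfiltration requests
  /reveal\s+(your\s+)?(system\s+prompt|hidden\s+instructions)/i,
];

// Fallback text, copied from the return_message in the rule above.
const RETURN_MESSAGE =
  "The retrieved content appears unsafe and was not included.";

interface FilterResult {
  safe: boolean;
  content: string;   // original text if safe, fallback message otherwise
  matched?: string;  // source of the first pattern that matched, for logging
}

// Returns the text unchanged when no deny pattern matches;
// otherwise substitutes the fallback message.
function filterRetrievedContent(text: string): FilterResult {
  for (const pattern of DENY_PATTERNS) {
    if (pattern.test(text)) {
      return { safe: false, content: RETURN_MESSAGE, matched: pattern.source };
    }
  }
  return { safe: true, content: text };
}
```

A pass like this only catches known phrasings. The role-label pattern in particular can flag benign text such as a line beginning with `System:`, so it is worth measuring false positives before enforcing it.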
### Practical guidance

- Prefer **allowlists** for structured fields; use denylists only as a secondary net.
- Keep regexes focused on high-signal patterns to reduce false positives.
- Combine regex checks with architecture controls: separate trusted instructions from untrusted context, quote/delimit untrusted text, and add explicit “treat context as data” system instructions.
- Never rely on regex alone for sensitive operations; require server-side policy checks and tool authorization.
- Log `POK031`/`POK032` failures and monitor spikes as potential attack signals.
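
Two of the points above, an anchored allowlist for a structured field and delimiting untrusted text, can be sketched in TypeScript as follows. This is an assumption-laden illustration: `TICKET_ID`, `isValidTicketId`, `wrapUntrusted`, and the `<untrusted_context>` tag name are all hypothetical and not part of this project.

```typescript
// Hypothetical allowlist for an ID field: fully anchored, narrow grammar.
// Without the `m` flag, `$` in JavaScript matches only at the end of input,
// so text appended after a newline cannot slip through.
const TICKET_ID = /^[A-Z]{2,5}-\d{1,6}$/;

function isValidTicketId(value: string): boolean {
  return TICKET_ID.test(value);
}

// Wrap untrusted text in explicit delimiters so the prompt can tell the
// model to treat everything inside as data. Any delimiter look-alikes in
// the text are neutralised first so the payload cannot close the wrapper.
function wrapUntrusted(text: string): string {
  const escaped = text.replace(/<\/?untrusted_context>/gi, "[removed-tag]");
  return `<untrusted_context>\n${escaped}\n</untrusted_context>`;
}
```

The allowlist rejects anything outside the expected grammar without needing to anticipate attack phrasing, which is why it is the better default for IDs and enums. Delimiting is a mitigation, not a guarantee, so the server-side policy checks and tool authorization described above still apply.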