Proposal: system prompt defense audit guardrail action

## Context

NeMo Guardrails provides runtime guardrails for conversational AI. A complementary layer would be **pre-deployment system prompt auditing** — checking whether the system prompt itself includes defensive instructions before the conversation starts.

## Proposal

A guardrail action that validates system prompt defense posture at initialization:

```colang
define flow check system prompt defense
  $defense_result = execute check_prompt_defense
  if $defense_result.score < 50
    bot inform "Warning: system prompt has weak defense posture ({$defense_result.score}/100)"
```

The action would use [prompt-defense-audit](https://github.com/ppcvote/prompt-defense-audit) to check 12 attack vectors:
- Role boundary, instruction boundary, data protection
- Multi-language bypass, indirect injection, social engineering
- Output control, unicode protection, input validation
- And 3 more (OWASP LLM Top 10 mapped)

## Why This Matters

We scanned 1,646 leaked production system prompts — 97.8% have no indirect injection defense, average score 36/100. A pre-conversation guardrail that flags weak system prompts would catch these gaps before they become runtime vulnerabilities.

## Implementation

prompt-defense-audit is on npm and exports a simple API:

```python
# Python wrapper around the npm package, or port the regex rules to Python
from prompt_defense_audit import audit
result = audit("You are a helpful assistant.")
# result.score = 8, result.grade = 'F', result.missing = ['indirect-injection', ...]
```

The scanner is pure regex (<5ms, zero dependencies) so it adds negligible latency to guardrail initialization.

Related: We also contributed 6 defense posture patterns to [NVIDIA/garak](https://github.com/NVIDIA/garak/pull/1669) based on the same data.

Happy to contribute a PR with the action implementation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: system prompt defense audit guardrail action #1764

Context

Proposal

Why This Matters

Implementation

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Proposal: system prompt defense audit guardrail action #1764

Description

Context

Proposal

Why This Matters

Implementation

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions