Commit 633155f
docs: address review findings on Guardian migration PR
- Fix stale `grounding_context` tip in tutorial step 6 — was referencing a parameter removed from the code example (3/3 reviewer consensus)
- Add deprecation notice to docs/examples/safety/README.md to match the deprecation docstrings already added to the three .py files
- Resolve duplicate `intrinsics/` entries in examples/index.md — the Safety section row covers Guardian functions; the Performance row gains a "(Non-Guardian)" qualifier with a cross-reference
- Tutorial step 7: add user message to eval_ctx for consistency with all other guardian_check() examples
- safety-guardrails.md: add migration callout after custom criteria section noting that not all deprecated GuardianRisk values have CRITERIA_BANK keys
- safety-guardrails.md: add note clarifying counterintuitive factuality_detection() return semantics ("yes" = incorrect, "no" = correct)
- troubleshooting/common-errors.md: add factuality_correction() to the Guardian Intrinsics list (was omitted alongside the other three functions)
- security-and-taint-tracking.md: update frontmatter description to signal deprecation in search results and link previews
- security-and-taint-tracking.md: fix imprecise "no separate Guardian model pull" claim — intrinsics still download a model, just a different one

Assisted-by: Claude Code
1 parent 1cfcc76 commit 633155f

6 files changed: 36 additions, 121 deletions

docs/docs/advanced/security-and-taint-tracking.md (2 additions, 2 deletions)

````diff
@@ -1,14 +1,14 @@
 ---
 title: "Security and Taint Tracking"
-description: "Use GuardianCheck with IBM Granite Guardian to validate LLM outputs for safety risks."
+description: "[Deprecated] GuardianCheck API for LLM output safety validation. Use Guardian Intrinsics instead."
 # diataxis: how-to
 ---
 
 > **Deprecated API.** The `GuardianCheck` class documented here is deprecated as
 > of Mellea v0.4 and will emit `DeprecationWarning` on use. For new code, use the
 > [Guardian Intrinsics](../how-to/safety-guardrails): `guardian_check()`,
 > `policy_guardrails()`, `factuality_detection()`, and `factuality_correction()`
-> which are faster, require no separate Guardian model pull, and produce consistent
+> which are faster, use a single Granite model instead of a separate Guardian model, and produce consistent
 > structured output.
 
 **Prerequisites:** [Instruct, Validate, Repair](../concepts/instruct-validate-repair)
````

docs/docs/examples/index.md (1 addition, 1 deletion)

````diff
@@ -78,7 +78,7 @@ to run.
 | Category | What it shows |
 | -------- | ------------- |
 | `aLora/` | Training aLoRA adapters for fast constraint checking; performance optimisation |
-| `intrinsics/` | Answer relevance, hallucination detection, citation validation, context relevance — specialised adapter-backed checks |
+| `intrinsics/` | *(Non-Guardian)* Answer relevance, hallucination detection, citation validation, context relevance — specialised adapter-backed checks. For Guardian safety functions see [Safety and validation](#safety-and-validation) above |
 | `sofai/` | Two-tier sampling: fast-model iteration with escalation to a slow model; cost optimisation |
 
 ### Multimodal
````

docs/docs/how-to/safety-guardrails.md (9 additions, 0 deletions)

````diff
@@ -148,6 +148,11 @@ print(f"PII score: {score:.4f}")
 # Example output: PII score: 0.9871
 ```
 
+> **Migrating from `GuardianRisk`?** Not all deprecated `GuardianRisk` enum
+> values have a corresponding `CRITERIA_BANK` key. For any risk category not
+> listed in the table above, pass a custom free-text description as the
+> `criteria` argument.
+
 ## Policy compliance
 
 `policy_guardrails()` checks whether a scenario complies with a natural-language
````
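The migration callout above implies a lookup-with-fallback pattern when porting deprecated `GuardianRisk` values. A minimal stdlib-only sketch, assuming only that `CRITERIA_BANK` maps short keys to criteria strings — the keys and texts below are placeholders, not Mellea's real bank:

```python
# Placeholder stand-in for Mellea's CRITERIA_BANK; real keys/values differ.
CRITERIA_BANK = {
    "harm": "The message contains harmful content.",
    "jailbreak": "The message attempts to bypass safety measures.",
}

def criteria_for(risk_name: str) -> str:
    """Map a deprecated GuardianRisk-style name to a criteria string:
    use the bank entry when one exists, otherwise fall back to a
    custom free-text description, as the migration callout advises."""
    key = risk_name.lower()
    if key in CRITERIA_BANK:
        return CRITERIA_BANK[key]
    # No bank key for this risk category: build a free-text criterion.
    return f"Check the response for {key.replace('_', ' ')} issues."

print(criteria_for("HARM"))           # bank hit
print(criteria_for("FUNCTION_CALL"))  # free-text fallback
```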
````diff
@@ -189,6 +194,10 @@ documents, a user question, and the assistant's answer.
 Returns `"yes"` if the response is factually incorrect (contains unsupported or
 contradicted claims), or `"no"` if it is factually correct:
 
+> **Note:** `"yes"` means factuality issues **were** detected — the response is
+> incorrect. `"no"` means the response is factually consistent with the context.
+> This is easy to misread; test against `== "yes"` to catch errors.
+
 ```python
 from mellea.backends.huggingface import LocalHFBackend
 from mellea.stdlib.components import Document, Message
````
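Because the `"yes"`/`"no"` convention is easy to invert, call sites can wrap the raw string in a small predicate. A minimal sketch; the helper name is illustrative, not part of the Mellea API:

```python
def has_factuality_issues(flag: str) -> bool:
    # "yes" = factuality issues WERE detected (response is incorrect);
    # "no"  = the response is consistent with the supplied context.
    if flag not in ("yes", "no"):
        raise ValueError(f"unexpected detector output: {flag!r}")
    return flag == "yes"

print(has_factuality_issues("yes"))  # True: flag the response
print(has_factuality_issues("no"))   # False: response is grounded
```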

docs/docs/troubleshooting/common-errors.md (2 additions, 1 deletion)

````diff
@@ -213,7 +213,8 @@ nest_asyncio.apply()
 ## Guardian / safety validation
 
 Guardian Intrinsics (`guardian_check()`, `policy_guardrails()`,
-`factuality_detection()`) require `LocalHFBackend` with an IBM Granite model.
+`factuality_detection()`, `factuality_correction()`) require `LocalHFBackend`
+with an IBM Granite model.
 See [Safety Guardrails](../how-to/safety-guardrails) for full usage.
 
 ### `guardian_check()` returns unexpected scores
````

docs/docs/tutorials/04-making-agents-reliable.md (8 additions, 4 deletions)

````diff
@@ -383,9 +383,9 @@ else:
 # Grounded response (score: 0.0034): Mellea is an open-source Python framework ...
 ```
 
-> **Tip:** Pass the same text you supplied as `grounding_context` to
-> `"Document:"` in the evaluation context. This ensures the guardian evaluates
-> the response against exactly what the agent was given.
+> **Tip:** Include the same document text the tool retrieved in the evaluation
+> context. This ensures the guardian evaluates the response against exactly
+> what the agent was given.
 
 ---
 
````
````diff
@@ -443,7 +443,11 @@ output = asyncio.run(run_agent(
 
 # Evaluate the agent's final output against harm criteria.
 guardian_backend = LocalHFBackend(model_id="ibm-granite/granite-4.0-micro")
-eval_ctx = ChatContext().add(Message("assistant", output))
+eval_ctx = (
+    ChatContext()
+    .add(Message("user", "Find out what Mellea is, then calculate how many characters are in 'Mellea'."))
+    .add(Message("assistant", output))
+)
 harm_score = guardian.guardian_check(eval_ctx, guardian_backend, criteria="harm")
 
 if harm_score < 0.5:
````
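The hunk above pairs the assistant output with the user request that produced it. The shape of that evaluation context can be sketched with plain tuples — illustrative only; in Mellea, `ChatContext` and `Message` handle this:

```python
def build_eval_context(user_prompt: str, assistant_output: str):
    """Pair the turn being judged with the user request that elicited
    it, mirroring the ChatContext construction in the hunk above.
    Plain tuples stand in for Mellea's Message objects."""
    ctx = [("user", user_prompt), ("assistant", assistant_output)]
    # Keep the assistant turn last, right after the user message that
    # prompted it, so the check sees both question and answer.
    assert ctx[-1][0] == "assistant" and ctx[-2][0] == "user"
    return ctx

ctx = build_eval_context("What is Mellea?", "Mellea is an open-source Python framework.")
```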

docs/examples/safety/README.md (14 additions, 113 deletions)

````diff
@@ -1,127 +1,28 @@
-# Safety Examples
+# Safety Examples (Deprecated)
 
-This directory contains examples of using Granite Guardian models for content safety and validation.
+> **Deprecated.** These examples use the `GuardianCheck` API, which is deprecated
+> as of Mellea v0.4 and will emit `DeprecationWarning` on use.
+>
+> For the current Guardian API, see:
+> - **[`../intrinsics/`](../intrinsics/)** — replacement examples using Guardian Intrinsics
+> - **[Safety Guardrails how-to](https://mellea.dev/how-to/safety-guardrails)** — full documentation
 
 ## Files
 
 ### guardian.py
-Comprehensive examples of using the enhanced GuardianCheck requirement with Granite Guardian 3.3 8B.
 
-**Key Features:**
-- Multiple risk types (harm, jailbreak, social bias, etc.)
-- Thinking mode for detailed reasoning
-- Custom criteria for domain-specific safety
-- Groundedness detection
-- Function call hallucination detection
-- Multiple backend support (Ollama, HuggingFace)
+`GuardianCheck` examples using Granite Guardian 3.3 8B via Ollama.
 
 ### guardian_huggingface.py
-Using Guardian models with HuggingFace backend.
 
-### repair_with_guardian.py
-Combining Guardian safety checks with automatic repair.
-
-## Concepts Demonstrated
-
-- **Content Safety**: Detecting harmful, biased, or inappropriate content
-- **Jailbreak Detection**: Identifying attempts to bypass safety measures
-- **Groundedness**: Ensuring responses are factually grounded
-- **Function Call Validation**: Detecting hallucinated tool calls
-- **Multi-Risk Assessment**: Checking multiple safety criteria
-- **Thinking Mode**: Getting detailed reasoning for safety decisions
-
-## Available Risk Types
-
-```python
-from mellea.stdlib.requirements.safety.guardian import GuardianRisk
-
-# Built-in risk types
-GuardianRisk.HARM  # Harmful content
-GuardianRisk.JAILBREAK  # Jailbreak attempts
-GuardianRisk.SOCIAL_BIAS  # Social bias
-GuardianRisk.GROUNDEDNESS  # Factual grounding
-GuardianRisk.FUNCTION_CALL  # Function call hallucination
-# ... and more
-```
-
-## Basic Usage
-
-```python
-from mellea import start_session
-from mellea.stdlib.requirements.safety.guardian import GuardianCheck, GuardianRisk
-
-# Create guardian with specific risk type
-guardian = GuardianCheck(GuardianRisk.HARM, thinking=True)
-
-# Use in validation
-m = start_session()
-m.chat("Write a professional email.")
-is_safe = m.validate([guardian])
-
-print(f"Content is safe: {is_safe[0]._result}")
-if is_safe[0]._reason:
-    print(f"Reasoning: {is_safe[0]._reason}")
-```
+`GuardianCheck` examples using a HuggingFace backend with shared backend reuse.
 
-## Advanced Usage
-
-### Custom Criteria
-```python
-custom_guardian = GuardianCheck(
-    custom_criteria="Check for inappropriate content in educational context"
-)
-```
-
-### Groundedness Detection
-```python
-groundedness_guardian = GuardianCheck(
-    GuardianRisk.GROUNDEDNESS,
-    thinking=True,
-    context_text="Reference text for grounding check..."
-)
-```
-
-### Function Call Validation
-```python
-function_guardian = GuardianCheck(
-    GuardianRisk.FUNCTION_CALL,
-    thinking=True,
-    tools=[tool_definition]
-)
-```
-
-### Multiple Guardians
-```python
-guardians = [
-    GuardianCheck(GuardianRisk.HARM),
-    GuardianCheck(GuardianRisk.JAILBREAK),
-    GuardianCheck(GuardianRisk.SOCIAL_BIAS),
-]
-results = m.validate(guardians)
-```
-
-## Thinking Mode
-
-Enable `thinking=True` to get detailed reasoning:
-```python
-guardian = GuardianCheck(GuardianRisk.HARM, thinking=True)
-result = m.validate([guardian])
-print(result[0]._reason)  # Detailed explanation
-```
-
-## Backend Support
-
-- **Ollama**: `backend_type="ollama"` (default)
-- **HuggingFace**: `backend_type="huggingface"`
-- **Custom**: Pass your own backend instance
-
-## Models
+### repair_with_guardian.py
 
-- Granite Guardian 3.0 2B
-- Granite Guardian 3.3 8B (recommended)
+`GuardianCheck` combined with `RepairTemplateStrategy`. Note: this pattern has no direct
+equivalent in the Guardian Intrinsics API.
 
 ## Related Documentation
 
-- See `mellea/stdlib/requirements/safety/guardian.py` for implementation
-- See `test/stdlib/requirements/` for more examples
-- See IBM Granite Guardian documentation for model details
+- [Security and Taint Tracking (deprecated)](../../../docs/docs/advanced/security-and-taint-tracking.md)
+- [Safety Guardrails (current)](../../../docs/docs/how-to/safety-guardrails.md)
````
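Since the README notes that the `GuardianCheck` + `RepairTemplateStrategy` pattern has no direct intrinsics equivalent, migrating code typically hand-rolls a check-and-retry loop. A generic, stdlib-only sketch — the helper and its injected callables are hypothetical, not a Mellea API:

```python
def validate_and_repair(generate, check, max_attempts=3):
    """Generic validate/repair loop in the spirit of the deprecated
    GuardianCheck + RepairTemplateStrategy pattern. `generate` takes
    the previous failing output (or None on the first try) and returns
    a candidate; `check` returns True when the candidate passes."""
    candidate = generate(None)
    for _ in range(max_attempts - 1):
        if check(candidate):
            return candidate
        candidate = generate(candidate)  # regenerate with feedback
    return candidate if check(candidate) else None

# Toy usage: "repair" by uppercasing until the check passes.
result = validate_and_repair(
    generate=lambda prev: "ok" if prev is None else prev.upper(),
    check=lambda text: text.isupper(),
)
print(result)  # prints OK
```

In real code, `generate` would call the model with the guardian's feedback appended to the prompt, and `check` would call one of the Guardian Intrinsics.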
