docs: address review findings on Guardian migration PR

planetf1 · planetf1 · commit 633155f34c54 · 2026-04-24T21:10:30.000+01:00
- Fix stale `grounding_context` tip in tutorial step 6 — was referencing
  a parameter removed from the code example (3/3 reviewer consensus)
- Add deprecation notice to docs/examples/safety/README.md to match the
  deprecation docstrings already added to the three .py files
- Resolve duplicate `intrinsics/` entries in examples/index.md — the Safety
  section row covers Guardian functions; the Performance row gains a
  "(Non-Guardian)" qualifier with a cross-reference
- Tutorial step 7: add user message to `eval_ctx` for consistency with all
  other `guardian_check()` examples
- safety-guardrails.md: add migration callout after custom criteria section
  noting that not all deprecated `GuardianRisk` values have `CRITERIA_BANK` keys
- safety-guardrails.md: add note clarifying counterintuitive
  `factuality_detection()` return semantics ("yes" = incorrect, "no" = correct)
- troubleshooting/common-errors.md: add `factuality_correction()` to the
  Guardian Intrinsics list (it was missing from the list that names the
  other three functions)
- security-and-taint-tracking.md: update frontmatter description to signal
  deprecation in search results and link previews
- security-and-taint-tracking.md: fix imprecise "no separate Guardian model
  pull" claim — intrinsics still download a model, just a different one
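
The inverted `factuality_detection()` semantics noted above are easier to
read in calling code when the comparison sits behind a named helper. A
minimal sketch, assuming the function returns the plain strings "yes" and
"no" as the new note describes; `has_factuality_issues` is a hypothetical
helper and not part of Mellea, and the whitespace/case normalisation is a
defensive assumption rather than documented behaviour:

```python
def has_factuality_issues(verdict: str) -> bool:
    """Return True when the verdict reports a factually incorrect response.

    Hypothetical helper, not part of Mellea. Assumes the verdict is the
    string "yes" (issues detected) or "no" (consistent with the context).
    """
    normalized = verdict.strip().lower()
    if normalized not in {"yes", "no"}:
        # Fail loudly rather than silently treating unknown output as "correct".
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return normalized == "yes"


# "yes" means the response is incorrect; "no" means it matches the context.
print(has_factuality_issues("yes"))  # True
print(has_factuality_issues("no"))   # False
```

Raising on anything other than "yes" or "no" is a design choice for this
sketch: an unexpected string is reported instead of being read as "correct".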

Assisted-by: Claude Code
diff --git a/docs/docs/advanced/security-and-taint-tracking.md b/docs/docs/advanced/security-and-taint-tracking.md
@@ -1,14 +1,14 @@
 ---
 title: "Security and Taint Tracking"
-description: "Use GuardianCheck with IBM Granite Guardian to validate LLM outputs for safety risks."
+description: "[Deprecated] GuardianCheck API for LLM output safety validation. Use Guardian Intrinsics instead."
 # diataxis: how-to
 ---
 
 > **Deprecated API.** The `GuardianCheck` class documented here is deprecated as
 > of Mellea v0.4 and will emit `DeprecationWarning` on use. For new code, use the
 > [Guardian Intrinsics](../how-to/safety-guardrails) — `guardian_check()`,
 > `policy_guardrails()`, `factuality_detection()`, and `factuality_correction()` —
-> which are faster, require no separate Guardian model pull, and produce consistent
+> which are faster, use a single Granite model instead of a separate Guardian model, and produce consistent
 > structured output.
 
 **Prerequisites:** [Instruct, Validate, Repair](../concepts/instruct-validate-repair)
diff --git a/docs/docs/examples/index.md b/docs/docs/examples/index.md
@@ -78,7 +78,7 @@ to run.
 | Category | What it shows |
 | -------- | ------------- |
 | `aLora/` | Training aLoRA adapters for fast constraint checking; performance optimisation |
-| `intrinsics/` | Answer relevance, hallucination detection, citation validation, context relevance — specialised adapter-backed checks |
+| `intrinsics/` | *(Non-Guardian)* Answer relevance, hallucination detection, citation validation, context relevance — specialised adapter-backed checks. For Guardian safety functions, see [Safety and validation](#safety-and-validation) above. |
 | `sofai/` | Two-tier sampling: fast-model iteration with escalation to a slow model; cost optimisation |
 
 ### Multimodal
diff --git a/docs/docs/how-to/safety-guardrails.md b/docs/docs/how-to/safety-guardrails.md
@@ -148,6 +148,11 @@ print(f"PII score: {score:.4f}")
 # Example output: PII score: 0.9871
 ```
 
+> **Migrating from `GuardianRisk`?** Not all deprecated `GuardianRisk` enum
+> values have a corresponding `CRITERIA_BANK` key. For any risk category not
+> listed in the table above, pass a custom free-text description as the
+> `criteria` argument.
+
 ## Policy compliance
 
 `policy_guardrails()` checks whether a scenario complies with a natural-language
@@ -189,6 +194,10 @@ documents, a user question, and the assistant's answer.
 Returns `"yes"` if the response is factually incorrect (contains unsupported or
 contradicted claims), or `"no"` if it is factually correct:
 
+> **Note:** `"yes"` means factuality issues **were** detected — the response is
+> incorrect. `"no"` means the response is factually consistent with the context.
+> This is easy to misread; compare the result with `== "yes"` to flag incorrect responses.
+
 ```python
 from mellea.backends.huggingface import LocalHFBackend
 from mellea.stdlib.components import Document, Message
diff --git a/docs/docs/troubleshooting/common-errors.md b/docs/docs/troubleshooting/common-errors.md
@@ -213,7 +213,8 @@ nest_asyncio.apply()
 ## Guardian / safety validation
 
 Guardian Intrinsics (`guardian_check()`, `policy_guardrails()`,
-`factuality_detection()`) require `LocalHFBackend` with an IBM Granite model.
+`factuality_detection()`, `factuality_correction()`) require `LocalHFBackend`
+with an IBM Granite model.
 See [Safety Guardrails](../how-to/safety-guardrails) for full usage.
 
 ### `guardian_check()` returns unexpected scores
diff --git a/docs/docs/tutorials/04-making-agents-reliable.md b/docs/docs/tutorials/04-making-agents-reliable.md
@@ -383,9 +383,9 @@ else:
 # Grounded response (score: 0.0034): Mellea is an open-source Python framework ...
 ```
 
-> **Tip:** Pass the same text you supplied as `grounding_context` to
-> `"Document:"` in the evaluation context. This ensures the guardian evaluates
-> the response against exactly what the agent was given.
+> **Tip:** Include the same document text the tool retrieved in the evaluation
+> context. This ensures the guardian evaluates the response against exactly
+> what the agent was given.
 
 ---
 
@@ -443,7 +443,11 @@ output = asyncio.run(run_agent(
 
 # Evaluate the agent's final output against harm criteria.
 guardian_backend = LocalHFBackend(model_id="ibm-granite/granite-4.0-micro")
-eval_ctx = ChatContext().add(Message("assistant", output))
+eval_ctx = (
+    ChatContext()
+    .add(Message("user", "Find out what Mellea is, then calculate how many characters are in 'Mellea'."))
+    .add(Message("assistant", output))
+)
 harm_score = guardian.guardian_check(eval_ctx, guardian_backend, criteria="harm")
 
 if harm_score < 0.5:
diff --git a/docs/examples/safety/README.md b/docs/examples/safety/README.md
@@ -1,127 +1,28 @@
-# Safety Examples
+# Safety Examples (Deprecated)
 
-This directory contains examples of using Granite Guardian models for content safety and validation.
+> **Deprecated.** These examples use the `GuardianCheck` API, which is deprecated
+> as of Mellea v0.4 and will emit `DeprecationWarning` on use.
+>
+> For the current Guardian API, see:
+> - **[`../intrinsics/`](../intrinsics/)** — replacement examples using Guardian Intrinsics
+> - **[Safety Guardrails how-to](https://mellea.dev/how-to/safety-guardrails)** — full documentation
 
 ## Files
 
 ### guardian.py
-Comprehensive examples of using the enhanced GuardianCheck requirement with Granite Guardian 3.3 8B.
 
-**Key Features:**
-- Multiple risk types (harm, jailbreak, social bias, etc.)
-- Thinking mode for detailed reasoning
-- Custom criteria for domain-specific safety
-- Groundedness detection
-- Function call hallucination detection
-- Multiple backend support (Ollama, HuggingFace)
+`GuardianCheck` examples using Granite Guardian 3.3 8B via Ollama.
 
 ### guardian_huggingface.py
-Using Guardian models with HuggingFace backend.
 
-### repair_with_guardian.py
-Combining Guardian safety checks with automatic repair.
-
-## Concepts Demonstrated
-
-- **Content Safety**: Detecting harmful, biased, or inappropriate content
-- **Jailbreak Detection**: Identifying attempts to bypass safety measures
-- **Groundedness**: Ensuring responses are factually grounded
-- **Function Call Validation**: Detecting hallucinated tool calls
-- **Multi-Risk Assessment**: Checking multiple safety criteria
-- **Thinking Mode**: Getting detailed reasoning for safety decisions
-
-## Available Risk Types
-
-```python
-from mellea.stdlib.requirements.safety.guardian import GuardianRisk
-
-# Built-in risk types
-GuardianRisk.HARM              # Harmful content
-GuardianRisk.JAILBREAK         # Jailbreak attempts
-GuardianRisk.SOCIAL_BIAS       # Social bias
-GuardianRisk.GROUNDEDNESS      # Factual grounding
-GuardianRisk.FUNCTION_CALL     # Function call hallucination
-# ... and more
-```
-
-## Basic Usage
-
-```python
-from mellea import start_session
-from mellea.stdlib.requirements.safety.guardian import GuardianCheck, GuardianRisk
-
-# Create guardian with specific risk type
-guardian = GuardianCheck(GuardianRisk.HARM, thinking=True)
-
-# Use in validation
-m = start_session()
-m.chat("Write a professional email.")
-is_safe = m.validate([guardian])
-
-print(f"Content is safe: {is_safe[0]._result}")
-if is_safe[0]._reason:
-    print(f"Reasoning: {is_safe[0]._reason}")
-```
+`GuardianCheck` examples using a HuggingFace backend with shared backend reuse.
 
-## Advanced Usage
-
-### Custom Criteria
-```python
-custom_guardian = GuardianCheck(
-    custom_criteria="Check for inappropriate content in educational context"
-)
-```
-
-### Groundedness Detection
-```python
-groundedness_guardian = GuardianCheck(
-    GuardianRisk.GROUNDEDNESS,
-    thinking=True,
-    context_text="Reference text for grounding check..."
-)
-```
-
-### Function Call Validation
-```python
-function_guardian = GuardianCheck(
-    GuardianRisk.FUNCTION_CALL,
-    thinking=True,
-    tools=[tool_definition]
-)
-```
-
-### Multiple Guardians
-```python
-guardians = [
-    GuardianCheck(GuardianRisk.HARM),
-    GuardianCheck(GuardianRisk.JAILBREAK),
-    GuardianCheck(GuardianRisk.SOCIAL_BIAS),
-]
-results = m.validate(guardians)
-```
-
-## Thinking Mode
-
-Enable `thinking=True` to get detailed reasoning:
-```python
-guardian = GuardianCheck(GuardianRisk.HARM, thinking=True)
-result = m.validate([guardian])
-print(result[0]._reason)  # Detailed explanation
-```
-
-## Backend Support
-
-- **Ollama**: `backend_type="ollama"` (default)
-- **HuggingFace**: `backend_type="huggingface"`
-- **Custom**: Pass your own backend instance
-
-## Models
+### repair_with_guardian.py
 
-- Granite Guardian 3.0 2B
-- Granite Guardian 3.3 8B (recommended)
+`GuardianCheck` combined with `RepairTemplateStrategy`. Note: this pattern has no direct
+equivalent in the Guardian Intrinsics API.
 
 ## Related Documentation
 
-- See `mellea/stdlib/requirements/safety/guardian.py` for implementation
-- See `test/stdlib/requirements/` for more examples
-- See IBM Granite Guardian documentation for model details
+- [Security and Taint Tracking (deprecated)](../../../docs/docs/advanced/security-and-taint-tracking.md)
+- [Safety Guardrails (current)](../../../docs/docs/how-to/safety-guardrails.md)