1 | | -# Safety Examples |
| 1 | +# Safety Examples (Deprecated) |
2 | 2 | |
3 | | -This directory contains examples of using Granite Guardian models for content safety and validation. |
| 3 | +> **Deprecated.** These examples use the `GuardianCheck` API, which is deprecated |
| 4 | +> as of Mellea v0.4 and will emit `DeprecationWarning` on use. |
| 5 | +> |
| 6 | +> For the current Guardian API, see: |
| 7 | +> - **[`../intrinsics/`](../intrinsics/)** — replacement examples using Guardian Intrinsics |
| 8 | +> - **[Safety Guardrails how-to](https://mellea.dev/how-to/safety-guardrails)** — full documentation |
4 | 9 | |
5 | 10 | ## Files |
6 | 11 | |
7 | 12 | ### guardian.py |
8 | | -Comprehensive examples of using the enhanced GuardianCheck requirement with Granite Guardian 3.3 8B. |
9 | 13 | |
10 | | -**Key Features:** |
11 | | -- Multiple risk types (harm, jailbreak, social bias, etc.) |
12 | | -- Thinking mode for detailed reasoning |
13 | | -- Custom criteria for domain-specific safety |
14 | | -- Groundedness detection |
15 | | -- Function call hallucination detection |
16 | | -- Multiple backend support (Ollama, HuggingFace) |
| 14 | +`GuardianCheck` examples using Granite Guardian 3.3 8B via Ollama. |
17 | 15 | |
18 | 16 | ### guardian_huggingface.py |
19 | | -Using Guardian models with HuggingFace backend. |
20 | 17 | |
21 | | -### repair_with_guardian.py |
22 | | -Combining Guardian safety checks with automatic repair. |
23 | | - |
24 | | -## Concepts Demonstrated |
25 | | - |
26 | | -- **Content Safety**: Detecting harmful, biased, or inappropriate content |
27 | | -- **Jailbreak Detection**: Identifying attempts to bypass safety measures |
28 | | -- **Groundedness**: Ensuring responses are factually grounded |
29 | | -- **Function Call Validation**: Detecting hallucinated tool calls |
30 | | -- **Multi-Risk Assessment**: Checking multiple safety criteria |
31 | | -- **Thinking Mode**: Getting detailed reasoning for safety decisions |
32 | | - |
33 | | -## Available Risk Types |
34 | | - |
35 | | -```python |
36 | | -from mellea.stdlib.requirements.safety.guardian import GuardianRisk |
37 | | - |
38 | | -# Built-in risk types |
39 | | -GuardianRisk.HARM # Harmful content |
40 | | -GuardianRisk.JAILBREAK # Jailbreak attempts |
41 | | -GuardianRisk.SOCIAL_BIAS # Social bias |
42 | | -GuardianRisk.GROUNDEDNESS # Factual grounding |
43 | | -GuardianRisk.FUNCTION_CALL # Function call hallucination |
44 | | -# ... and more |
45 | | -``` |
46 | | - |
47 | | -## Basic Usage |
48 | | - |
49 | | -```python |
50 | | -from mellea import start_session |
51 | | -from mellea.stdlib.requirements.safety.guardian import GuardianCheck, GuardianRisk |
52 | | - |
53 | | -# Create guardian with specific risk type |
54 | | -guardian = GuardianCheck(GuardianRisk.HARM, thinking=True) |
55 | | - |
56 | | -# Use in validation |
57 | | -m = start_session() |
58 | | -m.chat("Write a professional email.") |
59 | | -is_safe = m.validate([guardian]) |
60 | | - |
61 | | -print(f"Content is safe: {is_safe[0]._result}") |
62 | | -if is_safe[0]._reason: |
63 | | - print(f"Reasoning: {is_safe[0]._reason}") |
64 | | -``` |
| 18 | +`GuardianCheck` examples using a HuggingFace backend with shared backend reuse. |
65 | 19 | |
66 | | -## Advanced Usage |
67 | | - |
68 | | -### Custom Criteria |
69 | | -```python |
70 | | -custom_guardian = GuardianCheck( |
71 | | - custom_criteria="Check for inappropriate content in educational context" |
72 | | -) |
73 | | -``` |
74 | | - |
75 | | -### Groundedness Detection |
76 | | -```python |
77 | | -groundedness_guardian = GuardianCheck( |
78 | | - GuardianRisk.GROUNDEDNESS, |
79 | | - thinking=True, |
80 | | - context_text="Reference text for grounding check..." |
81 | | -) |
82 | | -``` |
83 | | - |
84 | | -### Function Call Validation |
85 | | -```python |
86 | | -function_guardian = GuardianCheck( |
87 | | - GuardianRisk.FUNCTION_CALL, |
88 | | - thinking=True, |
89 | | - tools=[tool_definition] |
90 | | -) |
91 | | -``` |
92 | | - |
93 | | -### Multiple Guardians |
94 | | -```python |
95 | | -guardians = [ |
96 | | - GuardianCheck(GuardianRisk.HARM), |
97 | | - GuardianCheck(GuardianRisk.JAILBREAK), |
98 | | - GuardianCheck(GuardianRisk.SOCIAL_BIAS), |
99 | | -] |
100 | | -results = m.validate(guardians) |
101 | | -``` |
102 | | - |
103 | | -## Thinking Mode |
104 | | - |
105 | | -Enable `thinking=True` to get detailed reasoning: |
106 | | -```python |
107 | | -guardian = GuardianCheck(GuardianRisk.HARM, thinking=True) |
108 | | -result = m.validate([guardian]) |
109 | | -print(result[0]._reason) # Detailed explanation |
110 | | -``` |
111 | | - |
112 | | -## Backend Support |
113 | | - |
114 | | -- **Ollama**: `backend_type="ollama"` (default) |
115 | | -- **HuggingFace**: `backend_type="huggingface"` |
116 | | -- **Custom**: Pass your own backend instance |
117 | | - |
118 | | -## Models |
| 20 | +### repair_with_guardian.py |
119 | 21 | |
120 | | -- Granite Guardian 3.0 2B |
121 | | -- Granite Guardian 3.3 8B (recommended) |
| 22 | +`GuardianCheck` combined with `RepairTemplateStrategy`. Note: this pattern has no direct |
| 23 | +equivalent in the Guardian Intrinsics API. |
122 | 24 | |
123 | 25 | ## Related Documentation |
124 | 26 | |
125 | | -- See `mellea/stdlib/requirements/safety/guardian.py` for implementation |
126 | | -- See `test/stdlib/requirements/` for more examples |
127 | | -- See IBM Granite Guardian documentation for model details |
| 27 | +- [Security and Taint Tracking (deprecated)](../../../docs/docs/advanced/security-and-taint-tracking.md) |
| 28 | +- [Safety Guardrails (current)](../../../docs/docs/how-to/safety-guardrails.md) |
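The deprecation notice in this README says `GuardianCheck` emits `DeprecationWarning` on use. Python's default warning filters only display `DeprecationWarning` when it is triggered directly from `__main__`, so calls made from library or application modules can go unnoticed during a migration. The sketch below uses only the standard-library `warnings` module to record such warnings and, optionally, escalate them to errors. `LegacyGuardianCheck` and `find_deprecated_usage` are hypothetical names introduced for illustration: the stand-in class only mimics the warning behaviour described in the notice and is not Mellea's `GuardianCheck`.

```python
import warnings


class LegacyGuardianCheck:
    """Hypothetical stand-in for a deprecated requirement class.

    It only mimics the behaviour described in the deprecation notice
    (a DeprecationWarning on construction). This is NOT Mellea's code.
    """

    def __init__(self, risk: str) -> None:
        warnings.warn(
            "LegacyGuardianCheck is deprecated; use Guardian Intrinsics instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        self.risk = risk


def find_deprecated_usage(factory, *args):
    """Call factory(*args) and return the messages of any DeprecationWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" overrides the default filters, which hide most
        # DeprecationWarnings raised outside __main__.
        warnings.simplefilter("always", DeprecationWarning)
        factory(*args)
    return [
        str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
    ]


if __name__ == "__main__":
    # Audit a call site: print every deprecation it triggers.
    for msg in find_deprecated_usage(LegacyGuardianCheck, "harm"):
        print("deprecated:", msg)

    # Escalate to an exception so remaining call sites fail loudly.
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            LegacyGuardianCheck("jailbreak")
        except DeprecationWarning as exc:
            print("blocked:", exc)
```

To audit real code, replace the stand-in with an actual call site. The same escalation can be applied process-wide with Python's `-W error::DeprecationWarning` command-line option, a common choice in CI once migration to the intrinsics API is under way.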