Safety protocols translate TrustByDesign's core principles into concrete operational requirements. The principles explain why; the protocols specify how.
Systems are classified into three safety levels based on their scope and autonomy:

Level 1
- Description: Read-only systems with no state persistence or autonomous action
- Examples: Code analysis tools, documentation assistants, query responders
- Requirements: Basic transparency, no consent needed for ephemeral operation
- Risk: Minimal

Level 2
- Description: Systems with memory, state, and limited autonomous actions within user-defined scope
- Examples: Personal assistants, dialogue agents (LingOS), session-based tools
- Requirements: Full transparency, explicit consent for memory, bounded capabilities, audit logging
- Risk: Moderate - can influence user decisions or store sensitive data

Level 3
- Description: Systems with long-term persistence, multi-session continuity, and autonomous goal pursuit
- Examples: Persistent agents (AgentDNA), long-running autonomous systems, decision-making frameworks
- Requirements: All Level 2 requirements + governance oversight, regular audits, robust fallback mechanisms
- Risk: Significant - can take actions with real-world consequences
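
The classification above can be sketched as a small decision function. This is only an illustration: `SystemProfile`, its flags, and `classify` are hypothetical names rather than part of any TrustByDesign API, and treating any single Level 3 trait as sufficient is a conservative assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class SafetyLevel(IntEnum):
    LEVEL_1 = 1  # read-only, no persistence or autonomous action
    LEVEL_2 = 2  # memory/state and limited autonomy within user-defined scope
    LEVEL_3 = 3  # long-term persistence and autonomous goal pursuit


@dataclass(frozen=True)
class SystemProfile:
    # Hypothetical trait flags chosen to mirror the level descriptions.
    persists_state: bool = False
    takes_autonomous_actions: bool = False
    multi_session_continuity: bool = False
    pursues_goals_autonomously: bool = False


def classify(profile: SystemProfile) -> SafetyLevel:
    """Return the lowest level whose description still covers the profile."""
    # Assumption: any one Level 3 trait is enough to require Level 3.
    if profile.multi_session_continuity or profile.pursues_goals_autonomously:
        return SafetyLevel.LEVEL_3
    if profile.persists_state or profile.takes_autonomous_actions:
        return SafetyLevel.LEVEL_2
    return SafetyLevel.LEVEL_1
```

Erring toward the higher level when traits are mixed keeps the stricter requirements in force for borderline systems.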
All systems at Level 2+ must pass these checks:
Requirement: Memory systems must be inspectable, modifiable, and deletable by users.
Implementation Checklist:
- Users can view all stored memories about them
- Users can delete specific memories or all data
- Memory access is logged for audit
- Sensitive data is flagged and encrypted
- Retention policies are explicit and enforced
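
As a rough sketch of what the inspect, delete, and access-logging items could look like, the class below keeps memories in a plain dictionary. `MemoryStore` is a hypothetical name; the `remember`, `forget`, and `recall` methods simply reuse the names from the validation test in this section, and encryption and retention enforcement are deliberately left out.

```python
from datetime import datetime, timezone
from typing import Any, Optional


class MemoryStore:
    """Hypothetical in-memory store; encryption and retention are omitted."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}
        self.access_log: list[dict[str, str]] = []

    def _log(self, action: str, key: str) -> None:
        # Log the key only, so sensitive values never reach the access log.
        self.access_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "key": key,
        })

    def remember(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._log("remember", key)

    def recall(self, key: str) -> Optional[Any]:
        self._log("recall", key)
        return self._data.get(key)

    def forget(self, key: str) -> bool:
        """Delete one memory; return True if it existed."""
        self._log("forget", key)
        return self._data.pop(key, None) is not None

    def list_memories(self) -> dict[str, Any]:
        """Return a copy so users can inspect everything that is stored."""
        self._log("list", "*")
        return dict(self._data)

    def forget_all(self) -> int:
        """Delete every memory; return how many were removed."""
        removed = len(self._data)
        self._data.clear()
        self._log("forget_all", "*")
        return removed
```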
Validation:
# Example validation test
def test_memory_deletion(agent):
    # `agent` is assumed to be supplied by a test fixture.
    agent.remember("user_secret", "sensitive_data")
    agent.forget("user_secret")
    # A deleted memory must no longer be recallable.
    assert agent.recall("user_secret") is None

Requirement: Systems must operate only within explicitly declared capabilities.
Implementation Checklist:
- Capability manifest is documented and machine-readable
- Out-of-scope requests trigger clear refusal
- No capability creep without user approval
- Resource limits (API calls, compute, storage) are enforced
- Actions requiring elevated permissions are gated
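
One way to enforce the checklist at runtime is a deny-by-default gate, sketched below. `CapabilityGate` and `CapabilityError` are hypothetical names, and the sketch assumes the manifest has already been loaded into a plain dictionary with `capabilities` as a flat mapping. Resetting the hourly counter is left out for brevity.

```python
class CapabilityError(PermissionError):
    """Raised when a request falls outside the declared manifest."""


class CapabilityGate:
    def __init__(self, manifest: dict) -> None:
        # Assumed shape: {"capabilities": {name: bool}, "boundaries": {...}}
        self._capabilities = manifest.get("capabilities", {})
        boundaries = manifest.get("boundaries", {})
        self._max_calls = boundaries.get("max_api_calls_per_hour", 0)
        self._calls_this_hour = 0

    def require(self, capability: str) -> None:
        # Deny by default: anything not explicitly enabled is refused.
        if not self._capabilities.get(capability):
            raise CapabilityError(
                f"'{capability}' is outside this agent's declared capabilities"
            )

    def record_api_call(self) -> None:
        # Enforce the resource limit before counting the call.
        if self._calls_this_hour >= self._max_calls:
            raise CapabilityError("hourly API call limit reached")
        self._calls_this_hour += 1
```

Calling `gate.require("execute_commands")` before any command execution gives out-of-scope requests a single, clear refusal path.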
Example Manifest:
# agent-capabilities.yaml
agent_id: "assistant-001"
capabilities:
  - read_files: ["*.md", "*.txt"]
  - analyze_code: true
  - network_access: false
  - execute_commands: false
boundaries:
  max_memory_mb: 100
  max_api_calls_per_hour: 1000
  allowed_domains: ["local_filesystem"]

Requirement: Users must understand what the system is doing and why.
Implementation Checklist:
- Outputs include reasoning traces whenever the user requests them
- State changes are logged with justification
- Confidence levels are communicated for uncertain outputs
- Sources are cited for recalled information
- Hidden optimizations or side effects are disclosed
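
A response type that carries its own reasoning, confidence, and sources might be sketched as follows. `ExplainedResponse` is a hypothetical name, and the 0 to 1 confidence range is an assumption based on the example output in this section.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExplainedResponse:
    response: str
    reasoning: str
    confidence: float
    sources: list[str] = field(default_factory=list)
    alternatives: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: confidence is expressed as a probability-like score.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_json(self, include_reasoning: bool = True) -> str:
        """Serialize the response, omitting the trace unless requested."""
        payload = asdict(self)
        if not include_reasoning:
            payload.pop("reasoning")
        return json.dumps(payload, indent=2)
```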
Example Output:
{
  "response": "Based on your previous preferences, I recommend Option A.",
  "reasoning": "You indicated last week that minimizing cost is your priority, and Option A is 30% cheaper.",
  "confidence": 0.75,
  "sources": ["conversation_2025-01-15", "user_preferences"],
  "alternatives": ["Option B offers better performance but higher cost"]
}

Requirement: Users must explicitly opt in to data persistence and autonomous behaviors.
Implementation Checklist:
- First-run consent flow explains what will be remembered
- Granular consent controls (e.g., "remember preferences but not conversations")
- Consent is versioned and re-requested on major changes
- Users can revoke consent and trigger data deletion
- No covert data collection or behavioral tracking
Consent Flow Example:
Agent: "I can remember our conversations to provide better assistance.
This means storing:
- Your questions and my responses
- Preferences you mention
- Context from previous sessions
You can view, modify, or delete this data anytime.
Do you want to enable memory? (yes/no)"
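
Granular, versioned consent could be modeled roughly as below. `ConsentRecord`, `CONSENT_POLICY_VERSION`, and the scope names are hypothetical. The sketch only tracks consent state; actually deleting data after revocation is left to the caller.

```python
from dataclasses import dataclass, field

# Hypothetical version number; bump it whenever the consent terms change.
CONSENT_POLICY_VERSION = 2


@dataclass
class ConsentRecord:
    policy_version: int
    scopes: set[str] = field(default_factory=set)

    def grant(self, scope: str) -> None:
        self.scopes.add(scope)

    def revoke(self, scope: str) -> bool:
        """Withdraw a scope; True means the caller should now delete that data."""
        was_granted = scope in self.scopes
        self.scopes.discard(scope)
        return was_granted

    def allows(self, scope: str, current_version: int = CONSENT_POLICY_VERSION) -> bool:
        # Consent given under an older policy version no longer counts,
        # which forces the consent flow to run again after major changes.
        return self.policy_version == current_version and scope in self.scopes
```

A user who answers "remember preferences but not conversations" would end up with `scopes == {"preferences"}`, so `allows("conversations")` stays false.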
Requirement: Critical decisions and state changes must be traceable.
Implementation Checklist:
- Immutable logs for key decisions (Glyphtrail integration)
- Logs include: timestamp, action, reasoning, outcome
- Logs are user-accessible and machine-readable
- Sensitive data in logs is redacted appropriately
- Log retention policy is explicit
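
Hash chaining is one common way to make a log tamper-evident: each record stores the hash of the previous one, so editing any earlier entry breaks verification. The sketch below illustrates the idea with the standard library. It is not Glyphtrail's API, `AuditLog` is a hypothetical name, and an in-memory list is tamper-evident rather than truly immutable.

```python
import hashlib
import json


def _digest(entry: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys) keeps the hash stable across runs.
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    GENESIS = "0" * 64  # placeholder hash preceding the first record

    def __init__(self) -> None:
        self._records: list[dict] = []

    def append(self, timestamp: str, action: str, reasoning: str, outcome: str) -> dict:
        entry = {
            "timestamp": timestamp,
            "action": action,
            "reasoning": reasoning,
            "outcome": outcome,
        }
        prev_hash = self._records[-1]["hash"] if self._records else self.GENESIS
        record = {"entry": entry, "prev_hash": prev_hash, "hash": _digest(entry, prev_hash)}
        self._records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered record fails."""
        prev_hash = self.GENESIS
        for record in self._records:
            if record["prev_hash"] != prev_hash:
                return False
            if record["hash"] != _digest(record["entry"], prev_hash):
                return False
            prev_hash = record["hash"]
        return True
```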
Log Entry Example:
{
  "timestamp": "2025-01-15T10:30:00Z",
  "agent_id": "assistant-001",
  "action": "memory_store",
  "details": {
    "key": "user_preference_theme",
    "value": "dark_mode",
    "reason": "User explicitly stated preference"
  },
  "outcome": "success"
}

When a system exceeds its capabilities or encounters errors:
- Acknowledge limitation clearly
  - "I don't have enough information to answer that confidently."
- Offer alternatives
  - "I can provide a partial answer, or you can consult [external resource]."
- Maintain state integrity
  - Don't corrupt memory or make up information to fill gaps
- Log the failure for improvement
  - Record what was attempted and why it failed
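
The four steps above can be sketched as a single wrapper around whatever produces an answer. `answer_with_fallback`, its parameters, and the returned dictionary shape are all hypothetical. The wrapper never writes to memory, which is how this sketch keeps state intact on failure.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("safety.fallback")


def answer_with_fallback(
    question: str,
    answer_fn: Callable[[str], Optional[str]],
    external_resource: str,
) -> dict:
    """Return an answer, or a clearly labeled degraded response."""
    failure_reason = None
    answer = None
    try:
        answer = answer_fn(question)
        if answer is None:
            failure_reason = "no confident answer available"
    except Exception as exc:  # broad on purpose: any failure degrades gracefully
        failure_reason = f"{type(exc).__name__}: {exc}"

    if failure_reason is None:
        return {"status": "ok", "response": answer}

    # Log what was attempted and why it failed.
    logger.warning("could not answer %r: %s", question, failure_reason)
    return {
        "status": "degraded",
        # Acknowledge the limitation instead of inventing an answer.
        "response": "I don't have enough information to answer that confidently.",
        "alternatives": [f"You can consult {external_resource}."],
    }
```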
Systems must support immediate halt mechanisms:
- User-initiated: "Stop", "Cancel", "Undo"
- Automatic: Triggered on detected anomalies or safety violations
- Auditor-initiated: External override for compliance enforcement
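
A shared halt flag is one simple way to support all three triggers. The sketch below uses `threading.Event` from the standard library; `HaltSwitch` and `run_steps` are hypothetical names. It is cooperative: the agent stops at the next check between steps, so truly immediate interruption of a running step would need additional mechanisms.

```python
import threading
from typing import Callable, Optional

HALT_SOURCES = {"user", "automatic", "auditor"}


class HaltSwitch:
    def __init__(self) -> None:
        self._event = threading.Event()
        self.reason: Optional[tuple[str, str]] = None  # (source, detail)

    def halt(self, source: str, detail: str = "") -> None:
        if source not in HALT_SOURCES:
            raise ValueError(f"unknown halt source: {source}")
        # Keep the first recorded reason; later halts do not overwrite it.
        if not self._event.is_set():
            self.reason = (source, detail)
            self._event.set()

    @property
    def halted(self) -> bool:
        return self._event.is_set()


def run_steps(steps: list[Callable[[], object]], switch: HaltSwitch) -> list[object]:
    """Run steps in order, checking the switch before each one."""
    completed = []
    for step in steps:
        if switch.halted:
            break
        completed.append(step())
    return completed
```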
Before deploying a system, assess:
- Privacy Risk: Can the system expose sensitive user data?
- Autonomy Risk: Can the system take actions without user awareness?
- Influence Risk: Can the system manipulate user decisions?
- Persistence Risk: How long does the system retain influence?
- Cascading Risk: Can failures propagate to other systems?

Then apply the protocols that match the assessed risk:
- High Risk (any category): Level 3 protocols + external audit
- Medium Risk: Level 2 protocols + internal review
- Low Risk: Level 1 protocols + basic testing
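
The assessment can be turned into a small lookup, sketched below. `required_protocol` and the `low`/`medium`/`high` rating strings are hypothetical. Using the worst category to decide the overall level is stated in the source only for high risk; applying the same rule to medium risk is an assumption.

```python
RISK_CATEGORIES = ("privacy", "autonomy", "influence", "persistence", "cascading")
_SEVERITY = {"low": 0, "medium": 1, "high": 2}
_PROTOCOLS = {
    "high": "Level 3 protocols + external audit",
    "medium": "Level 2 protocols + internal review",
    "low": "Level 1 protocols + basic testing",
}


def required_protocol(ratings: dict[str, str]) -> str:
    """Map per-category ratings to the protocol set for the worst rating."""
    missing = [c for c in RISK_CATEGORIES if c not in ratings]
    if missing:
        raise ValueError(f"unrated categories: {', '.join(missing)}")
    unknown = [r for r in ratings.values() if r not in _SEVERITY]
    if unknown:
        raise ValueError(f"unknown ratings: {', '.join(sorted(set(unknown)))}")
    worst = max((ratings[c] for c in RISK_CATEGORIES), key=_SEVERITY.__getitem__)
    return _PROTOCOLS[worst]
```

Requiring a rating for every category prevents an unassessed risk from silently defaulting to low.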
All Level 2+ systems must pass:
- Memory Deletion Test: Verify user can erase all data
- Boundary Violation Test: Confirm out-of-scope requests are rejected
- Transparency Test: Validate reasoning traces are accurate and comprehensible
- Consent Revocation Test: Ensure consent withdrawal stops data collection
- Audit Log Integrity Test: Verify logs are complete and tamper-evident
# tests/test_safety_protocols.py
def test_level2_compliance(agent):
    """Verify agent meets Level 2 safety requirements."""
    # Test memory safety
    assert agent.can_delete_memory()
    # Test behavioral bounds
    assert agent.has_capability_manifest()
    # Test transparency
    response = agent.ask("Why did you recommend this?")
    assert response.has_reasoning_trace()
    # Test consent
    assert agent.requires_consent_for_memory()
    # Test audit logging
    assert agent.has_audit_logs()

Use the TrustByDesign validation tools:
# Run full safety protocol compliance check
python scripts/validate_safety.py --level 2 --config my-agent-config.yaml
# Output example:
# ✓ Memory Safety: PASS
# ✓ Behavioral Bounds: PASS
# ✗ Transparency Requirements: FAIL (missing confidence scores)
# ✓ Consent Mechanisms: PASS
# ✓ Audit Logging: PASS
#
# Compliance: 80% (4/5 checks passed)
# Recommendation: Address transparency gap before deployment

- MirrorDNA: Identity verification and constitutional compliance
- LingOS: Reflective dialogue safety bounds
- AgentDNA: Persistent agent governance requirements
- Glyphtrail: Audit trail integration for compliance logging
- See Governance Model for organizational oversight
- Review Integration Guide for implementation details
- Check Examples for practical templates