Status: IMPLEMENTATION_READY
Date: 2026-04-12
Package: @agent-assistant/policy
Version target: v0.1.0 (pre-1.0, provisional)
Roadmap stage: v1.4 (after core, sessions, surfaces, memory, connectivity, routing, coordination, proactive land)
Scope reference: docs/architecture/v1-policy-scope.md
@agent-assistant/policy provides the classification, gating, and audit contract for assistant actions — the layer between "the assistant decided to act" and "the action actually executes."
Owns:
- `PolicyEngine` — the central evaluator for action risk, rule matching, and decisions
- `Action` — the typed unit of policy evaluation; products produce Actions before executing external operations
- `RiskLevel` — four-level enum (low | medium | high | critical) used across the engine and rules
- `RiskClassifier` — interface for assigning a risk level to an action; `defaultRiskClassifier` ships in-box
- `PolicyRule` — product-supplied rule definition with priority, evaluate function, and description
- `PolicyDecision` — output of rule evaluation: `allow | deny | require_approval | escalate` plus supporting metadata
- `PolicyEvaluationContext` — session-scoped context made available to every rule during evaluation
- `ApprovalHint` — data contract describing what approval is needed (approver role, timeout, prompt text)
- `ApprovalResolution` — data contract recording how an approval was resolved (approved/rejected, by whom)
- `AuditEvent` — structured record of every evaluation: action, risk level, decision, approval resolution
- `AuditSink` — pluggable interface for routing audit events to a persistence backend
- `InMemoryAuditSink` — test adapter that accumulates events in an accessible array
- Rule management methods — `registerRule`, `removeRule`, `listRules`
- Fallback decision — configurable per engine instance; defaults to `require_approval`
- Error types — `PolicyError`, `RuleNotFoundError`, `ClassificationError`
Does NOT own:
- Product-specific action type catalogs (e.g., "PR merge rules") — product repos
- Commercial tier logic or pricing enforcement — product repos
- Customer-specific escalation chains or notification flows — product repos
- User authentication and identity management — relay foundation (relayauth)
- Fleet-level rate limiting across assistant instances — relay foundation / cloud infra
- Content moderation and safety filtering — external services / product repos
- Approval UX (modals, Slack buttons, email confirmations) — product repos
- Hosted audit pipelines and audit storage — AgentWorkforce/cloud
- Scheduler integration for time-based auto-escalation — deferred to v1.1
- This package does not implement approval workflows. It defines the data contract; products own the UX and resolution flow.
- This package does not persist rules. v1 rule storage is in-memory only.
- This package does not register a capability handler. Products call `policyEngine.evaluate()` inside their own handlers.
- This package does not emit connectivity signals or interact with surfaces directly.
- This package does not own session lifecycle. Session IDs and user IDs are passed in by the caller.
- This package does not rate-limit across sessions, workspaces, or users. That is infrastructure scope.
An Action is a proposed assistant operation with consequences beyond generating text. It is the unit of policy evaluation.
interface Action {
/** Unique identifier for this action instance. Generated by the product before calling evaluate(). */
id: string;
/**
* Action type string. Products define their own action types (e.g., 'send_email', 'merge_pr').
* The package does not define a fixed type catalog.
*/
type: string;
/** Human-readable description for audit logs and approval UX. */
description: string;
/** The session in which this action was proposed. */
sessionId: string;
/** The user on whose behalf the action is being taken. */
userId: string;
/**
* Whether this action was initiated proactively (no user message in the current turn).
* Required — callers must be explicit. Policy rules use this to gate proactive actions more strictly.
*/
proactive: boolean;
/** Product-supplied metadata. Available to classifiers and rule evaluate functions. */
metadata?: Record<string, unknown>;
}

type RiskLevel = 'low' | 'medium' | 'high' | 'critical';

| Level | Meaning | Default gating |
|---|---|---|
| `low` | Reversible, internal, no external side effects | Auto-approve |
| `medium` | External but limited blast radius; reversible with effort | Auto-approve with audit |
| `high` | Significant external consequences; hard to reverse | Require human approval |
| `critical` | Irreversible, broad impact, or affects shared state | Escalate or deny |

These are defaults. Products override gating through registered policy rules.
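
One way a product might encode the default gating column is as a single catch-all rule that runs after every more specific rule. The sketch below is illustrative only: the rule id `default-gating` and the priority value 1000 are assumptions, the types are trimmed local copies so the snippet stands alone, and because the table leaves "Escalate or deny" open for `critical`, the sketch arbitrarily picks `escalate`.

```typescript
type RiskLevel = 'low' | 'medium' | 'high' | 'critical';
type Outcome = 'allow' | 'deny' | 'require_approval' | 'escalate';

// Trimmed local copy of PolicyDecision (approvalHint omitted for brevity).
interface PolicyDecision {
  action: Outcome;
  ruleId: string;
  riskLevel: RiskLevel;
  reason?: string;
}

// Default gating per risk level. 'critical' maps to 'escalate' here;
// a product could equally choose 'deny'.
const DEFAULT_GATING: Record<RiskLevel, Outcome> = {
  low: 'allow',
  medium: 'allow',
  high: 'require_approval',
  critical: 'escalate',
};

// Hypothetical catch-all rule. The large priority number makes it evaluate
// last, so any earlier rule that returns a decision takes precedence.
const defaultGatingRule = {
  id: 'default-gating',
  priority: 1000,
  description: 'Apply default gating when no earlier rule decides',
  evaluate(_action: unknown, riskLevel: RiskLevel): PolicyDecision {
    return {
      action: DEFAULT_GATING[riskLevel],
      ruleId: 'default-gating',
      riskLevel,
      reason: `Default gating for ${riskLevel} risk.`,
    };
  },
};
```

Because this rule never returns null, registering it means the engine's fallback decision would no longer be reached.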
interface RiskClassifier {
classify(action: Action): RiskLevel | Promise<RiskLevel>;
}

The package ships `defaultRiskClassifier`, which returns `medium` for all actions. This ensures unclassified actions are not silently auto-approved.
Products supply classifiers that inspect action.type and action.metadata to assign appropriate risk levels:
const myClassifier: RiskClassifier = {
classify(action) {
if (action.type === 'merge_pr') return 'high';
if (action.type === 'post_comment') return 'medium';
if (action.type === 'read_pr_list') return 'low';
return 'medium'; // fallthrough to default
},
};

Classifiers may be async — useful when external context (e.g., PR size, target branch) informs the risk level.
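
As an illustration of an async classifier, the sketch below looks up pull request details before assigning a risk level. Everything product-specific here is an assumption made for the example: the `PrLookup` function, the `prId` metadata key, the 500-line threshold, and the special treatment of the `main` branch. The interfaces are local copies so the snippet stands alone.

```typescript
type RiskLevel = 'low' | 'medium' | 'high' | 'critical';

interface Action {
  id: string;
  type: string;
  description: string;
  sessionId: string;
  userId: string;
  proactive: boolean;
  metadata?: Record<string, unknown>;
}

interface RiskClassifier {
  classify(action: Action): RiskLevel | Promise<RiskLevel>;
}

// Hypothetical lookup; a real product would call its own VCS API here.
type PrInfo = { changedLines: number; targetBranch: string };
type PrLookup = (prId: string) => Promise<PrInfo>;

function createPrClassifier(lookup: PrLookup): RiskClassifier {
  return {
    async classify(action) {
      if (action.type !== 'merge_pr') return 'medium';
      const prId = action.metadata?.prId;
      // Missing or malformed metadata: stay conservative.
      if (typeof prId !== 'string') return 'high';
      const pr = await lookup(prId);
      if (pr.targetBranch === 'main') return 'critical';
      return pr.changedLines > 500 ? 'high' : 'medium';
    },
  };
}
```

If `lookup` rejects, the rejection propagates out of `classify`, which per this spec the engine wraps in a `ClassificationError`.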
interface PolicyRule {
id: string;
/**
* Evaluation priority. Lower numbers evaluate first. Default: 100.
* When two rules share the same priority, registration order is used (FIFO).
*/
priority?: number;
/**
* Evaluate the action and return a decision or null.
* Returning null defers to the next rule in priority order.
* If no rule returns a non-null decision, the engine applies the fallback decision.
*/
evaluate(
action: Action,
riskLevel: RiskLevel,
context: PolicyEvaluationContext
): PolicyDecision | null | Promise<PolicyDecision | null>;
/** Human-readable description for observability and audit. */
description?: string;
}

interface PolicyEvaluationContext {
sessionId: string;
userId: string;
workspaceId?: string;
/**
* Whether the action originates from a proactive engine (no user message in current turn).
* Rules should apply stricter gating when this is true.
*/
proactive: boolean;
/** Product-supplied context values for rule logic. */
metadata?: Record<string, unknown>;
}

interface PolicyDecision {
/** The decision outcome. */
action: 'allow' | 'deny' | 'require_approval' | 'escalate';
/** The rule that produced this decision. Use 'fallback' when the fallback decision applies. */
ruleId: string;
/** The risk level that was in effect at evaluation time. */
riskLevel: RiskLevel;
/** Human-readable reason for observability and audit. */
reason?: string;
/**
* Present when action is 'require_approval'. Hints for the approval UX.
* The package does not implement approval UX — products use this to drive their UI.
*/
approvalHint?: ApprovalHint;
}

| Decision | Meaning | Caller behavior |
|---|---|---|
| `allow` | The action may proceed | Execute the action |
| `deny` | The action is prohibited | Do not execute; surface a denial reason to the user |
| `require_approval` | A human must approve before execution | Block execution; enter approval flow using `ApprovalHint` |
| `escalate` | The action exceeds local authority; route to a higher authority | Block execution; notify configured escalation target |

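
The caller behavior for each decision could be dispatched with an exhaustive switch, so that adding a new outcome later fails type-checking until it is handled. This is a sketch under assumptions: the `DecisionHandlers` hooks are hypothetical product-side functions, not package exports, and `DecisionLike` is a trimmed stand-in for `PolicyDecision`.

```typescript
type DecisionOutcome = 'allow' | 'deny' | 'require_approval' | 'escalate';

interface DecisionLike {
  action: DecisionOutcome;
  reason?: string;
}

// Hypothetical product-side hooks; none of these are exported by the package.
interface DecisionHandlers {
  execute(): Promise<void>;
  notifyDenied(reason: string): Promise<void>;
  startApproval(): Promise<void>;
  escalate(): Promise<void>;
}

async function applyDecision(
  decision: DecisionLike,
  handlers: DecisionHandlers,
): Promise<DecisionOutcome> {
  const outcome = decision.action;
  switch (outcome) {
    case 'allow':
      await handlers.execute();
      break;
    case 'deny':
      await handlers.notifyDenied(decision.reason ?? 'Denied by policy.');
      break;
    case 'require_approval':
      await handlers.startApproval();
      break;
    case 'escalate':
      await handlers.escalate();
      break;
    default: {
      // Compile-time exhaustiveness check: this assignment stops
      // type-checking if a new outcome is added without a case.
      const unreachable: never = outcome;
      throw new Error(`Unhandled decision: ${String(unreachable)}`);
    }
  }
  return outcome;
}
```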
interface ApprovalHint {
/** Suggested approver role or identity (e.g., 'workspace_admin', 'user', 'team_lead'). */
approver?: string;
/** Suggested timeout before auto-escalating, in milliseconds. */
timeoutMs?: number;
/** Prompt text to present to the approver. */
prompt?: string;
}

`ApprovalHint` is informational. The engine does not enforce timeouts or send notifications. Products own the approval UX and resolution flow.
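
Since the engine does not enforce `timeoutMs`, a product that wants a timeout has to apply it itself. A minimal sketch of one approach follows, racing the approver's response against a timer. The `waitForApprover` hook and the `ApprovalOutcome` type are hypothetical names introduced for this example; what happens after `timed_out` (for instance, escalation) is left to the product.

```typescript
interface ApprovalHint {
  approver?: string;
  timeoutMs?: number;
  prompt?: string;
}

type ApprovalOutcome = 'approved' | 'rejected' | 'timed_out';

// `waitForApprover` is a hypothetical product hook that resolves to
// true (approved) or false (rejected) once a human responds.
async function awaitApproval(
  hint: ApprovalHint,
  waitForApprover: (hint: ApprovalHint) => Promise<boolean>,
): Promise<ApprovalOutcome> {
  const response = waitForApprover(hint).then(
    (approved): ApprovalOutcome => (approved ? 'approved' : 'rejected'),
  );
  // No timeout suggested: wait for the approver indefinitely.
  if (hint.timeoutMs === undefined) return response;

  const timeout = new Promise<ApprovalOutcome>((resolve) => {
    const timer = setTimeout(() => resolve('timed_out'), hint.timeoutMs);
    // Cancel the timer once the approver has answered (or the hook failed),
    // so it does not keep the process alive.
    const clear = () => clearTimeout(timer);
    response.then(clear, clear);
  });
  return Promise.race([response, timeout]);
}
```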
interface ApprovalResolution {
/** Whether the approver approved (true) or rejected (false) the action. */
approved: boolean;
/** Identity of the approver, if known. */
approvedBy?: string;
/** ISO-8601 timestamp of resolution. */
resolvedAt: string;
/** Optional comment from the approver. */
comment?: string;
}

The engine does not own approval workflow state. After the approval flow completes, products record the outcome by writing the original audit event back to the audit sink with its approval field set to an ApprovalResolution:
// After evaluate() returns require_approval and the product resolves the approval:
const resolution: ApprovalResolution = {
approved: true,
approvedBy: 'khaliq@example.com',
resolvedAt: new Date().toISOString(),
comment: 'Reviewed and confirmed',
};
await auditSink.record({
...originalAuditEvent,
approval: resolution,
});

This is the product's responsibility. The package provides the type contract.
interface PolicyEngineConfig {
/**
* Risk classifier. Defaults to defaultRiskClassifier (always returns 'medium').
*/
classifier?: RiskClassifier;
/**
* Fallback decision when no rule produces a non-null result.
* Defaults to 'require_approval'.
*/
fallbackDecision?: PolicyDecision['action'];
/**
* Audit sink for recording evaluation events.
* Defaults to a no-op sink. Pass InMemoryAuditSink for tests.
*/
auditSink?: AuditSink;
}
function createActionPolicy(config?: PolicyEngineConfig): PolicyEngine;

interface PolicyEngine {
/**
* Evaluate an action:
* 1. Classify risk using the configured classifier.
* 2. Build a PolicyEvaluationContext from the action.
* 3. Evaluate rules in priority order until one returns a non-null decision.
* 4. If no rule matches, apply the fallback decision.
* 5. Record an AuditEvent to the configured AuditSink.
* 6. Return the PolicyDecision.
*/
evaluate(action: Action): Promise<PolicyDecision>;
/** Register a policy rule. Throws PolicyError if a rule with the same id is already registered. */
registerRule(rule: PolicyRule): void;
/** Remove a rule by id. Throws RuleNotFoundError if not found. */
removeRule(ruleId: string): void;
/** List registered rules, sorted by priority then registration order. */
listRules(): PolicyRule[];
}

evaluate(action):
1. riskLevel = await classifier.classify(action)
- If classify throws, wrap in ClassificationError and re-throw
2. context = buildContext(action)
3. sortedRules = rules sorted by (priority ASC, registrationOrder ASC)
4. for rule in sortedRules:
decision = await rule.evaluate(action, riskLevel, context)
if decision is not null:
break → use this decision
5. if no decision found:
decision = { action: fallbackDecision, ruleId: 'fallback', riskLevel, reason: 'No rule matched.' }
6. auditEvent = buildAuditEvent(action, riskLevel, decision)
7. await auditSink.record(auditEvent)
8. return decision
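
Step 3 depends on a stable ordering: priority ascending, with registration order breaking ties. One way to get that is to stamp each rule with a counter at registration time, as in the sketch below. `RuleRegistry` is a hypothetical helper, not the package's implementation, and it throws plain `Error` where the package would presumably use its own error types.

```typescript
interface RuleLike {
  id: string;
  priority?: number;
}

// Default priority as documented on PolicyRule.
const DEFAULT_PRIORITY = 100;

class RuleRegistry<R extends RuleLike> {
  private readonly entries: { rule: R; order: number }[] = [];
  private nextOrder = 0;

  register(rule: R): void {
    if (this.entries.some((e) => e.rule.id === rule.id)) {
      throw new Error(`Rule already registered: ${rule.id}`);
    }
    // Record a monotonically increasing counter as the FIFO tie-breaker.
    this.entries.push({ rule, order: this.nextOrder++ });
  }

  remove(ruleId: string): void {
    const index = this.entries.findIndex((e) => e.rule.id === ruleId);
    if (index === -1) throw new Error(`Rule not found: ${ruleId}`);
    this.entries.splice(index, 1);
  }

  // Sorted by priority ascending, then by registration order.
  list(): R[] {
    return [...this.entries]
      .sort(
        (a, b) =>
          (a.rule.priority ?? DEFAULT_PRIORITY) - (b.rule.priority ?? DEFAULT_PRIORITY) ||
          a.order - b.order,
      )
      .map((e) => e.rule);
  }
}
```

Sorting a copy keeps the stored registration order intact, so removing and listing rules never disturbs the tie-break.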
The evaluation loop is fail-fast: if a rule's evaluate function throws, the error propagates as a PolicyError and no audit event is recorded for that call. Products may wrap evaluate() in a try/catch to handle policy errors gracefully.
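
A product that prefers to fail closed could wrap `evaluate()` so that any thrown error is treated as a denial. The sketch below is one possible shape, not package behavior: `evaluateOrDeny`, the `SafeDecision` type, and the `policy-error` rule id are all hypothetical. `SafeDecision` omits `riskLevel` because no risk level is known when classification itself fails.

```typescript
type Outcome = 'allow' | 'deny' | 'require_approval' | 'escalate';

// Trimmed decision shape for the wrapper's return value.
interface SafeDecision {
  action: Outcome;
  ruleId: string;
  reason?: string;
}

// Minimal view of the engine: anything with an async evaluate().
interface Evaluator<A> {
  evaluate(action: A): Promise<SafeDecision>;
}

// Fail closed: if evaluation throws, return a synthetic 'deny'.
async function evaluateOrDeny<A>(
  engine: Evaluator<A>,
  action: A,
): Promise<SafeDecision> {
  try {
    return await engine.evaluate(action);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    // The engine records no audit event on this path, so a product
    // may want to log the failure itself here.
    return {
      action: 'deny',
      ruleId: 'policy-error',
      reason: `Policy evaluation failed: ${message}`,
    };
  }
}
```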
interface AuditEvent {
/** Unique event ID generated by the engine. */
id: string;
/** The action that was evaluated. Snapshot of the action at evaluation time. */
action: Action;
/** The risk level assigned by the classifier. */
riskLevel: RiskLevel;
/** The decision reached (from a rule or the fallback). */
decision: PolicyDecision;
/** ISO-8601 timestamp of evaluation. */
evaluatedAt: string;
/**
* Approval resolution, if the decision was 'require_approval' and the product has
* recorded the outcome. Not set by the engine — products write this after resolution.
*/
approval?: ApprovalResolution;
}

interface AuditSink {
record(event: AuditEvent): Promise<void>;
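}

Audit is always-on: every evaluate() call that reaches a decision records an event (a call that throws records nothing). Products that do not need audit can omit auditSink, which defaults to a no-op sink equivalent to:
const noOpSink: AuditSink = { record: async () => {} };

class InMemoryAuditSink implements AuditSink {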
/** All recorded events in order. */
readonly events: AuditEvent[];
async record(event: AuditEvent): Promise<void>;
/** Clear all recorded events. */
clear(): void;
}

Used in tests and local development. No external infrastructure required.
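
For readers who want to see how little such a sink needs, here is a rough sketch of an in-memory implementation. It is not the package's source: the class is deliberately named `SketchInMemoryAuditSink`, and `AuditEventLike` is a trimmed stand-in for `AuditEvent`. Copying the event on record is a design choice of this sketch, not something the spec requires.

```typescript
// Trimmed stand-in for AuditEvent, enough to demonstrate the sink.
interface AuditEventLike {
  id: string;
  evaluatedAt: string;
  decision: { action: string; ruleId: string };
}

interface AuditSink {
  record(event: AuditEventLike): Promise<void>;
}

class SketchInMemoryAuditSink implements AuditSink {
  /** All recorded events in order. */
  readonly events: AuditEventLike[] = [];

  async record(event: AuditEventLike): Promise<void> {
    // Store a copy so later mutation of the caller's object
    // does not change what was recorded.
    this.events.push({ ...event, decision: { ...event.decision } });
  }

  /** Clear all recorded events (the array identity is preserved). */
  clear(): void {
    this.events.length = 0;
  }
}
```

In a test, a product would pass the sink to the engine, call `evaluate()`, and then assert on the contents of `events`.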
The proactive: boolean field on Action and PolicyEvaluationContext is required and non-optional. Callers must be explicit about action origin.
Rationale: proactive actions originate without a user request in the current turn. They carry higher risk because the user has not expressed intent and may be unaware the action is being taken. Policy rules that differentiate proactive vs. interactive actions are a core use case.
Pattern for proactive-stricter rules:
const proactiveGatingRule: PolicyRule = {
id: 'proactive-require-approval',
priority: 10,
description: 'Require approval for all proactive high or critical actions',
evaluate(action, riskLevel, context) {
if (context.proactive && (riskLevel === 'high' || riskLevel === 'critical')) {
return {
action: 'require_approval',
ruleId: 'proactive-require-approval',
riskLevel,
reason: 'Proactive high-risk actions require explicit human approval.',
approvalHint: {
approver: 'user',
prompt: `The assistant is about to perform a proactive action: ${action.description}. Approve?`,
},
};
}
return null;
},
};

/** Base class for all policy errors. */
class PolicyError extends Error {
constructor(message: string, public readonly cause?: unknown);
}
/** Thrown when a rule is not found by the given id. */
class RuleNotFoundError extends PolicyError {
constructor(public readonly ruleId: string);
}
/** Thrown when the risk classifier throws or returns an invalid value. */
class ClassificationError extends PolicyError {
constructor(message: string, cause?: unknown);
}

import { createActionPolicy, InMemoryAuditSink } from '@agent-assistant/policy';
import type { PolicyRule, RiskClassifier } from '@agent-assistant/policy';
const auditSink = new InMemoryAuditSink();
const classifier: RiskClassifier = {
classify(action) {
switch (action.type) {
case 'send_email': return 'high';
case 'create_draft': return 'medium';
case 'read_inbox': return 'low';
default: return 'medium';
}
},
};
const policyEngine = createActionPolicy({ classifier, auditSink });
// Register rules
policyEngine.registerRule({
id: 'deny-critical',
priority: 1,
description: 'Deny all critical-risk actions in v1',
evaluate(action, riskLevel) {
if (riskLevel === 'critical') {
return { action: 'deny', ruleId: 'deny-critical', riskLevel, reason: 'Critical actions are not permitted.' };
}
return null;
},
});

// In product capability handler (not in the policy package):
capabilities: {
proactive: async (message, context) => {
const followUpDecisions = await proactiveEngine.evaluateFollowUp(/* ... */);
for (const decision of followUpDecisions) {
if (decision.action !== 'fire') continue;
const action: Action = {
id: nanoid(),
type: 'proactive_follow_up',
description: decision.messageTemplate ?? 'Proactive follow-up',
sessionId: decision.sessionId,
userId: sessionUserId,
proactive: true,
};
const policyDecision = await policyEngine.evaluate(action);
if (policyDecision.action === 'allow') {
await context.runtime.emit({ sessionId: action.sessionId, text: action.description });
} else if (policyDecision.action === 'require_approval') {
// Product handles approval UX using policyDecision.approvalHint
}
}
},
}

// Products map trait values to policy config — the policy package does not read traits:
const policyEngine = createActionPolicy({
fallbackDecision: traits.riskTolerance === 'cautious' ? 'deny' : 'require_approval',
classifier: buildClassifierFromTraits(traits),
});

| Capability | Detail |
|---|---|
| `createActionPolicy(config)` factory | Returns a `PolicyEngine` with all v1 methods |
| `defaultRiskClassifier` | Returns `medium` for all actions; products override |
| `RiskClassifier` interface | Products supply; async supported |
| `PolicyRule` registration | `engine.registerRule(rule)` with priority ordering |
| `PolicyRule` management | `registerRule`, `removeRule`, `listRules` |
| Policy evaluation | `engine.evaluate(action)` — classify, evaluate rules, fallback, audit |
| Fallback decision | Configurable; defaults to `require_approval` |
| `ApprovalHint` and `ApprovalResolution` contracts | Types only; no workflow implementation |
| `AuditSink` interface | Pluggable; always called on every `evaluate()` |
| `InMemoryAuditSink` | Test adapter with accessible `events` array |
| Proactive flag | Required field on `Action` and `PolicyEvaluationContext` |
| Error types | `PolicyError`, `RuleNotFoundError`, `ClassificationError` |
| 42+ tests | Per project DoD standard |

| Capability | Target | Reason |
|---|---|---|
| Persistent rule storage adapter | v1.1 | v1 is in-memory only |
| Approval workflow engine (timeouts, notifications) | v1.1 | Products own approval UX in v1 |
| Time-based auto-escalation | v1.1 | Requires scheduler binding integration |
| Cumulative risk budgets per session | v1.1 | Requires action history tracking |
| Cross-session / org-level policy | v2 | Broader scoping model not yet designed |
| Policy inheritance and override chains | v2 | v1 uses flat priority-ordered rules |
| ML-based risk classification | v2+ | Function-based classifiers in v1 |
| Distributed policy evaluation | v2+ | Single-process in v1 |
| Action rollback contracts | v2+ | Requires undo semantics not yet designed |
| Compliance framework mappings (SOC2, GDPR) | v3+ | Enterprise concern; not v1 SDK scope |
packages/policy/
package.json — no runtime dependencies other than nanoid (for IDs)
tsconfig.json
src/
types.ts — Action, RiskLevel, PolicyRule, PolicyDecision, AuditEvent,
ApprovalHint, ApprovalResolution, PolicyEvaluationContext,
AuditSink, InMemoryAuditSink, error classes
policy.ts — createActionPolicy factory, PolicyEngine implementation,
rule registration, evaluation loop, classification, audit recording
index.ts — public re-exports
policy.test.ts — 42+ tests
README.md
| Category | Count | Coverage |
|---|---|---|
| Type structural tests | 4 | All required types and interfaces exist and export correctly |
| Risk classification | 5 | Default classifier, custom classifier, async classifier, classification error, unclassified action default |
| Policy rule registration | 5 | Register, list, remove, duplicate rejection, priority ordering |
| Policy evaluation — basic | 6 | Single rule allow, deny, require_approval, escalate, null-deferral, fallback |
| Policy evaluation — priority | 4 | Priority ordering, first-match-wins, same-priority stability, rule removal mid-evaluation |
| Proactive action gating | 4 | Proactive flag respected, stricter rule for proactive, proactive+high risk, proactive+low risk |
| Approval contract | 3 | ApprovalHint on require_approval, ApprovalResolution structure, approval metadata fields |
| Audit sink | 5 | InMemoryAuditSink recording, event structure, allow/deny/require_approval audit, clear() |
| Error handling | 3 | PolicyError, RuleNotFoundError, ClassificationError |
| Fallback decision | 3 | Default fallback is require_approval, configurable fallback, fallback with audit |
| Allow/deny/escalate outcomes | 3 | End-to-end allow, end-to-end deny, end-to-end escalate |
| Total | 45 | |
V1_POLICY_SPEC_READY