v1 Policy Spec — `@agent-assistant/policy`

Status: IMPLEMENTATION_READY
Date: 2026-04-12
Package: @agent-assistant/policy
Version target: v0.1.0 (pre-1.0, provisional)
Roadmap stage: v1.4 (after core, sessions, surfaces, memory, connectivity, routing, coordination, proactive land)
Scope reference: docs/architecture/v1-policy-scope.md

1. Responsibilities

@agent-assistant/policy provides the classification, gating, and audit contract for assistant actions — the layer between "the assistant decided to act" and "the action actually executes."

Owns:

PolicyEngine — the central evaluator for action risk, rule matching, and decisions
Action — the typed unit of policy evaluation; products produce Actions before executing external operations
RiskLevel — four-level enum (low | medium | high | critical) used across the engine and rules
RiskClassifier — interface for assigning a risk level to an action; defaultRiskClassifier ships in-box
PolicyRule — product-supplied rule definition with priority, evaluate function, and description
PolicyDecision — output of rule evaluation: allow | deny | require_approval | escalate plus supporting metadata
PolicyEvaluationContext — session-scoped context made available to every rule during evaluation
ApprovalHint — data contract describing what approval is needed (approver role, timeout, prompt text)
ApprovalResolution — data contract recording how an approval was resolved (approved/rejected, by whom)
AuditEvent — structured record of every evaluation: action, risk level, decision, approval resolution
AuditSink — pluggable interface for routing audit events to a persistence backend
InMemoryAuditSink — test adapter that accumulates events in an accessible array
Rule management methods — registerRule, removeRule, listRules
Fallback decision — configurable per engine instance; defaults to require_approval
Error types — PolicyError, RuleNotFoundError, ClassificationError

Does NOT own:

Product-specific action type catalogs (e.g., "PR merge rules") — product repos
Commercial tier logic or pricing enforcement — product repos
Customer-specific escalation chains or notification flows — product repos
User authentication and identity management — relay foundation (relayauth)
Fleet-level rate limiting across assistant instances — relay foundation / cloud infra
Content moderation and safety filtering — external services / product repos
Approval UX (modals, Slack buttons, email confirmations) — product repos
Hosted audit pipelines and audit storage — AgentWorkforce/cloud
Scheduler integration for time-based auto-escalation — deferred to v1.1

2. Non-Goals

This package does not implement approval workflows. It defines the data contract; products own the UX and resolution flow.
This package does not persist rules. v1 rule storage is in-memory only.
This package does not register a capability handler. Products call policyEngine.evaluate() inside their own handlers.
This package does not emit connectivity signals or interact with surfaces directly.
This package does not own session lifecycle. Session IDs and user IDs are passed in by the caller.
This package does not rate-limit across sessions, workspaces, or users. That is infrastructure scope.

3. Action Classification

3.1 Action

An Action is a proposed assistant operation with consequences beyond generating text. It is the unit of policy evaluation.

interface Action {
  /** Unique identifier for this action instance. Generated by the product before calling evaluate(). */
  id: string;

  /**
   * Action type string. Products define their own action types (e.g., 'send_email', 'merge_pr').
   * The package does not define a fixed type catalog.
   */
  type: string;

  /** Human-readable description for audit logs and approval UX. */
  description: string;

  /** The session in which this action was proposed. */
  sessionId: string;

  /** The user on whose behalf the action is being taken. */
  userId: string;

  /**
   * Whether this action was initiated proactively (no user message in the current turn).
   * Required — callers must be explicit. Policy rules use this to gate proactive actions more strictly.
   */
  proactive: boolean;

  /** Product-supplied metadata. Available to classifiers and rule evaluate functions. */
  metadata?: Record<string, unknown>;
}

3.2 Risk Level

type RiskLevel = 'low' | 'medium' | 'high' | 'critical';

| Level | Meaning | Default gating |
| --- | --- | --- |
| `low` | Reversible, internal, no external side effects | Auto-approve |
| `medium` | External but limited blast radius; reversible with effort | Auto-approve with audit |
| `high` | Significant external consequences; hard to reverse | Require human approval |
| `critical` | Irreversible, broad impact, or affects shared state | Escalate or deny |

These are defaults. Products override gating through registered policy rules.
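
The default gating column can be read as a simple lookup from risk level to decision outcome. The sketch below is illustrative only: `DEFAULT_GATING` and `defaultGatingFor` are hypothetical names, not package exports, and mapping `critical` to `escalate` is one of the two options the table allows.

```typescript
type RiskLevel = 'low' | 'medium' | 'high' | 'critical';
type DecisionAction = 'allow' | 'deny' | 'require_approval' | 'escalate';

// Hypothetical lookup mirroring the "Default gating" column.
// 'critical' maps to 'escalate' here; the table also permits 'deny'.
const DEFAULT_GATING: Record<RiskLevel, DecisionAction> = {
  low: 'allow',
  medium: 'allow', // auto-approved, but still audited
  high: 'require_approval',
  critical: 'escalate',
};

function defaultGatingFor(riskLevel: RiskLevel): DecisionAction {
  return DEFAULT_GATING[riskLevel];
}
```

A product could wrap a lookup like this in a low-priority PolicyRule so that more specific rules still take precedence.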

3.3 Risk Classifier

interface RiskClassifier {
  classify(action: Action): RiskLevel | Promise<RiskLevel>;
}

The package ships defaultRiskClassifier, which returns medium for all actions. This ensures unclassified actions are not silently auto-approved.

Products supply classifiers that inspect action.type and action.metadata to assign appropriate risk levels:

const myClassifier: RiskClassifier = {
  classify(action) {
    if (action.type === 'merge_pr') return 'high';
    if (action.type === 'post_comment') return 'medium';
    if (action.type === 'read_pr_list') return 'low';
    return 'medium'; // fallthrough to default
  },
};

Classifiers may be async — useful when external context (e.g., PR size, target branch) informs the risk level.
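
A minimal sketch of an async classifier follows, assuming the product stores a PR identifier under `metadata.prId`. `fetchTargetBranch` is a hypothetical stand-in for a real lookup against a code host, and the branch-to-risk mapping is an example policy, not something this spec prescribes.

```typescript
type RiskLevel = 'low' | 'medium' | 'high' | 'critical';

interface Action {
  id: string;
  type: string;
  description: string;
  sessionId: string;
  userId: string;
  proactive: boolean;
  metadata?: Record<string, unknown>;
}

interface RiskClassifier {
  classify(action: Action): RiskLevel | Promise<RiskLevel>;
}

// Hypothetical lookup standing in for a real API call.
const TARGET_BRANCHES = new Map<string, string>([
  ['pr-1', 'main'],
  ['pr-2', 'feature/docs'],
]);

async function fetchTargetBranch(prId: string): Promise<string> {
  return TARGET_BRANCHES.get(prId) ?? 'unknown';
}

const branchAwareClassifier: RiskClassifier = {
  async classify(action: Action): Promise<RiskLevel> {
    if (action.type !== 'merge_pr') return 'medium';
    const prId = action.metadata?.prId;
    // Missing or malformed metadata: stay conservative rather than guess.
    if (typeof prId !== 'string') return 'high';
    const branch = await fetchTargetBranch(prId);
    return branch === 'main' ? 'critical' : 'high';
  },
};
```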

4. Policy Rules

4.1 PolicyRule

interface PolicyRule {
  id: string;

  /**
   * Evaluation priority. Lower numbers evaluate first. Default: 100.
   * When two rules share the same priority, registration order is used (FIFO).
   */
  priority?: number;

  /**
   * Evaluate the action and return a decision or null.
   * Returning null defers to the next rule in priority order.
   * If no rule returns a non-null decision, the engine applies the fallback decision.
   */
  evaluate(
    action: Action,
    riskLevel: RiskLevel,
    context: PolicyEvaluationContext
  ): PolicyDecision | null | Promise<PolicyDecision | null>;

  /** Human-readable description for observability and audit. */
  description?: string;
}

4.2 PolicyEvaluationContext

interface PolicyEvaluationContext {
  sessionId: string;
  userId: string;
  workspaceId?: string;

  /**
   * Whether the action originates from a proactive engine (no user message in current turn).
   * Rules should apply stricter gating when this is true.
   */
  proactive: boolean;

  /** Product-supplied context values for rule logic. */
  metadata?: Record<string, unknown>;
}

4.3 PolicyDecision

interface PolicyDecision {
  /** The decision outcome. */
  action: 'allow' | 'deny' | 'require_approval' | 'escalate';

  /** The rule that produced this decision. Use 'fallback' when the fallback decision applies. */
  ruleId: string;

  /** The risk level that was in effect at evaluation time. */
  riskLevel: RiskLevel;

  /** Human-readable reason for observability and audit. */
  reason?: string;

  /**
   * Present when action is 'require_approval'. Hints for the approval UX.
   * The package does not implement approval UX — products use this to drive their UI.
   */
  approvalHint?: ApprovalHint;
}

4.4 Decision Semantics

| Decision | Meaning | Caller behavior |
| --- | --- | --- |
| `allow` | The action may proceed | Execute the action |
| `deny` | The action is prohibited | Do not execute; surface a denial reason to the user |
| `require_approval` | A human must approve before execution | Block execution; enter approval flow using `ApprovalHint` |
| `escalate` | The action exceeds local authority; route to a higher authority | Block execution; notify configured escalation target |
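
On the caller side, the four outcomes map naturally onto an exhaustive switch. The following is a sketch under stated assumptions: `CallerStep` and `nextStep` are hypothetical product-side names, and the decision type is trimmed to the fields used here.

```typescript
type DecisionAction = 'allow' | 'deny' | 'require_approval' | 'escalate';

interface PolicyDecisionLike {
  action: DecisionAction;
  reason?: string;
}

// Hypothetical product-side outcome type; not part of the package.
type CallerStep =
  | { kind: 'execute' }
  | { kind: 'reject'; message: string }
  | { kind: 'await_approval' }
  | { kind: 'notify_escalation' };

function nextStep(decision: PolicyDecisionLike): CallerStep {
  switch (decision.action) {
    case 'allow':
      return { kind: 'execute' };
    case 'deny':
      return { kind: 'reject', message: decision.reason ?? 'Action denied by policy.' };
    case 'require_approval':
      return { kind: 'await_approval' };
    case 'escalate':
      return { kind: 'notify_escalation' };
    default: {
      // Compile-time exhaustiveness check: stops type-checking if a new outcome is added.
      const unreachable: never = decision.action;
      throw new Error(`Unhandled decision: ${String(unreachable)}`);
    }
  }
}
```

Only `allow` leads to execution; every other branch blocks the action.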

5. Approval Contract

5.1 ApprovalHint

interface ApprovalHint {
  /** Suggested approver role or identity (e.g., 'workspace_admin', 'user', 'team_lead'). */
  approver?: string;

  /** Suggested timeout before auto-escalating, in milliseconds. */
  timeoutMs?: number;

  /** Prompt text to present to the approver. */
  prompt?: string;
}

ApprovalHint is informational. The engine does not enforce timeouts or send notifications. Products own the approval UX and resolution flow.
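
Because the engine never enforces `timeoutMs`, a product that wants a timeout has to apply it itself. One possible shape is sketched below; `awaitApproval`, `ApprovalOutcome`, and the 60-second default are assumptions for illustration, and what happens after a timeout (escalate, deny, retry) remains a product decision.

```typescript
interface ApprovalHint {
  approver?: string;
  timeoutMs?: number;
  prompt?: string;
}

type ApprovalOutcome = 'approved' | 'rejected' | 'timed_out';

// Hypothetical product-side helper; the engine itself never reads timeoutMs.
function awaitApproval(
  hint: ApprovalHint,
  pending: Promise<boolean>, // resolves true (approved) or false (rejected)
  defaultTimeoutMs = 60000,
): Promise<ApprovalOutcome> {
  const timeoutMs = hint.timeoutMs ?? defaultTimeoutMs;
  return new Promise<ApprovalOutcome>((resolve, reject) => {
    const timer = setTimeout(() => resolve('timed_out'), timeoutMs);
    pending.then(
      (approved) => {
        clearTimeout(timer); // avoid a dangling timer once the approver responds
        resolve(approved ? 'approved' : 'rejected');
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```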

5.2 ApprovalResolution

interface ApprovalResolution {
  /** Whether the approver approved (true) or rejected (false) the action. */
  approved: boolean;

  /** Identity of the approver, if known. */
  approvedBy?: string;

  /** ISO-8601 timestamp of resolution. */
  resolvedAt: string;

  /** Optional comment from the approver. */
  comment?: string;
}

Products record the resolution by sending the audit event, with its approval field set to an ApprovalResolution, to the audit sink after the approval flow completes. The engine does not own the approval workflow state.

5.3 Recording Approval Outcomes

After an approval is resolved, products record the outcome by re-recording the original audit event with its approval field set:

// After evaluate() returns require_approval and the product resolves the approval:
const resolution: ApprovalResolution = {
  approved: true,
  approvedBy: 'khaliq@example.com',
  resolvedAt: new Date().toISOString(),
  comment: 'Reviewed and confirmed',
};

await auditSink.record({
  ...originalAuditEvent,
  approval: resolution,
});

This is the product's responsibility. The package provides the type contract.

6. Policy Engine

6.1 Factory

interface PolicyEngineConfig {
  /**
   * Risk classifier. Defaults to defaultRiskClassifier (always returns 'medium').
   */
  classifier?: RiskClassifier;

  /**
   * Fallback decision when no rule produces a non-null result.
   * Defaults to 'require_approval'.
   */
  fallbackDecision?: PolicyDecision['action'];

  /**
   * Audit sink for recording evaluation events.
   * Defaults to a no-op sink. Pass InMemoryAuditSink for tests.
   */
  auditSink?: AuditSink;
}

function createActionPolicy(config?: PolicyEngineConfig): PolicyEngine;

6.2 PolicyEngine Interface

interface PolicyEngine {
  /**
   * Evaluate an action:
   * 1. Classify risk using the configured classifier.
   * 2. Build a PolicyEvaluationContext from the action.
   * 3. Evaluate rules in priority order until one returns a non-null decision.
   * 4. If no rule matches, apply the fallback decision.
   * 5. Record an AuditEvent to the configured AuditSink.
   * 6. Return the PolicyDecision.
   */
  evaluate(action: Action): Promise<PolicyDecision>;

  /** Register a policy rule. Throws PolicyError if a rule with the same id is already registered. */
  registerRule(rule: PolicyRule): void;

  /** Remove a rule by id. Throws RuleNotFoundError if not found. */
  removeRule(ruleId: string): void;

  /** List registered rules, sorted by priority then registration order. */
  listRules(): PolicyRule[];
}
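
The rule management methods can be pictured as a small in-memory registry. This is a sketch only: `RuleRegistry` is a hypothetical name, plain `Error` stands in for the package's error classes, and the real engine's internals may differ. The default priority of 100 and the FIFO tie-break come from section 4.1.

```typescript
interface RuleLike {
  id: string;
  priority?: number;
}

const DEFAULT_PRIORITY = 100;

// Hypothetical in-memory registry illustrating registerRule / removeRule / listRules.
class RuleRegistry<R extends RuleLike> {
  private readonly rules = new Map<string, { rule: R; order: number }>();
  private nextOrder = 0;

  register(rule: R): void {
    if (this.rules.has(rule.id)) {
      throw new Error(`Rule already registered: ${rule.id}`);
    }
    this.rules.set(rule.id, { rule, order: this.nextOrder++ });
  }

  remove(ruleId: string): void {
    if (!this.rules.delete(ruleId)) {
      throw new Error(`Rule not found: ${ruleId}`);
    }
  }

  // Sorted by priority ascending, then by registration order (FIFO).
  list(): R[] {
    return Array.from(this.rules.values())
      .sort(
        (a, b) =>
          (a.rule.priority ?? DEFAULT_PRIORITY) - (b.rule.priority ?? DEFAULT_PRIORITY) ||
          a.order - b.order,
      )
      .map((entry) => entry.rule);
  }
}
```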

6.3 Evaluation Algorithm

evaluate(action):
  1. riskLevel = await classifier.classify(action)
     - If classify throws, wrap in ClassificationError and re-throw
  2. context = buildContext(action)
  3. sortedRules = rules sorted by (priority ASC, registrationOrder ASC)
  4. for rule in sortedRules:
       decision = await rule.evaluate(action, riskLevel, context)
       if decision is not null:
         break → use this decision
  5. if no decision found:
       decision = { action: fallbackDecision, ruleId: 'fallback', riskLevel, reason: 'No rule matched.' }
  6. auditEvent = buildAuditEvent(action, riskLevel, decision)
  7. await auditSink.record(auditEvent)
  8. return decision

The evaluation loop is fail-fast: if a rule's evaluate function throws, the error propagates as a PolicyError and no audit event is recorded for that call. Products may wrap evaluate() in a try/catch to handle policy errors gracefully.
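
The algorithm above can be sketched as a standalone function. This is a simplified illustration, not the package implementation: `evaluateSketch` and the trimmed types are hypothetical, error wrapping (ClassificationError, PolicyError) and event IDs are omitted, and dependencies are passed as parameters to keep the example self-contained.

```typescript
type RiskLevel = 'low' | 'medium' | 'high' | 'critical';
type DecisionAction = 'allow' | 'deny' | 'require_approval' | 'escalate';

interface Action { id: string; type: string; sessionId: string; userId: string; proactive: boolean; }
interface Context { sessionId: string; userId: string; proactive: boolean; }
interface Decision { action: DecisionAction; ruleId: string; riskLevel: RiskLevel; reason?: string; }
interface Rule {
  id: string;
  priority?: number;
  evaluate(action: Action, riskLevel: RiskLevel, context: Context): Decision | null | Promise<Decision | null>;
}
interface AuditRecord { action: Action; riskLevel: RiskLevel; decision: Decision; evaluatedAt: string; }

async function evaluateSketch(
  action: Action,
  rules: Rule[], // assumed to be in registration order
  classify: (a: Action) => RiskLevel | Promise<RiskLevel>,
  fallback: DecisionAction,
  record: (event: AuditRecord) => Promise<void>,
): Promise<Decision> {
  const riskLevel = await classify(action);
  const context: Context = {
    sessionId: action.sessionId,
    userId: action.userId,
    proactive: action.proactive,
  };

  // Array.prototype.sort is stable (ES2019), so equal priorities keep registration order.
  const sorted = [...rules].sort((a, b) => (a.priority ?? 100) - (b.priority ?? 100));

  let matched: Decision | null = null;
  for (const rule of sorted) {
    // A throw here propagates immediately, so the audit step below is skipped.
    matched = await rule.evaluate(action, riskLevel, context);
    if (matched !== null) break;
  }

  const decision: Decision =
    matched ?? { action: fallback, ruleId: 'fallback', riskLevel, reason: 'No rule matched.' };

  await record({ action, riskLevel, decision, evaluatedAt: new Date().toISOString() });
  return decision;
}
```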

7. Audit Hooks

7.1 AuditEvent

interface AuditEvent {
  /** Unique event ID generated by the engine. */
  id: string;

  /** The action that was evaluated. Snapshot of the action at evaluation time. */
  action: Action;

  /** The risk level assigned by the classifier. */
  riskLevel: RiskLevel;

  /** The decision reached (from a rule or the fallback). */
  decision: PolicyDecision;

  /** ISO-8601 timestamp of evaluation. */
  evaluatedAt: string;

  /**
   * Approval resolution, if the decision was 'require_approval' and the product has
   * recorded the outcome. Not set by the engine — products write this after resolution.
   */
  approval?: ApprovalResolution;
}

7.2 AuditSink

interface AuditSink {
  record(event: AuditEvent): Promise<void>;
}

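Audit is always-on: every evaluate() call that returns a decision records an event (a call that throws records none; see 6.3). Products that do not need audit can omit auditSink, which defaults to a no-op sink, or pass one explicitly: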

const noOpSink: AuditSink = { record: async () => {} };
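
For products that do want persistence, a custom sink only has to implement `record`. The sketch below serializes each event as one JSON line through an injected writer; `createJsonLinesSink` is a hypothetical helper, the event type is trimmed to a few fields, and the choice of JSON Lines is an assumption rather than a format this spec requires.

```typescript
interface AuditEvent {
  id: string;
  evaluatedAt: string;
  decision: { action: string; ruleId: string };
}

interface AuditSink {
  record(event: AuditEvent): Promise<void>;
}

// Hypothetical sink: writes one JSON document per line via an injected writer,
// so the destination (file, stdout, queue) stays a product concern.
function createJsonLinesSink(write: (line: string) => void | Promise<void>): AuditSink {
  return {
    async record(event: AuditEvent): Promise<void> {
      await write(JSON.stringify(event) + '\n');
    },
  };
}
```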

7.3 InMemoryAuditSink

class InMemoryAuditSink implements AuditSink {
  /** All recorded events in order. */
  readonly events: AuditEvent[];

  async record(event: AuditEvent): Promise<void>;

  /** Clear all recorded events. */
  clear(): void;
}

Used in tests and local development. No external infrastructure required.
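
The class above is shown as a signature only. One way it could be implemented is sketched here; `InMemoryAuditSinkSketch` is a hypothetical name and the event type is trimmed, so treat this as an illustration of the contract rather than the shipped code.

```typescript
interface AuditEvent {
  id: string;
  evaluatedAt: string;
}

interface AuditSink {
  record(event: AuditEvent): Promise<void>;
}

class InMemoryAuditSinkSketch implements AuditSink {
  /** All recorded events in order. */
  readonly events: AuditEvent[] = [];

  async record(event: AuditEvent): Promise<void> {
    this.events.push(event);
  }

  /** Clear in place so existing references to `events` stay valid. */
  clear(): void {
    this.events.length = 0;
  }
}
```

In a test, the sink is passed to the engine and `events` is inspected after each `evaluate()` call.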

8. Proactive Action Gating

The proactive: boolean field is required on both Action and PolicyEvaluationContext. Callers must be explicit about action origin.

Rationale: proactive actions originate without a user request in the current turn. They carry higher risk because the user has not expressed intent and may be unaware the action is being taken. Policy rules that differentiate proactive vs. interactive actions are a core use case.

Pattern for proactive-stricter rules:

const proactiveGatingRule: PolicyRule = {
  id: 'proactive-require-approval',
  priority: 10,
  description: 'Require approval for all proactive high or critical actions',
  evaluate(action, riskLevel, context) {
    if (context.proactive && (riskLevel === 'high' || riskLevel === 'critical')) {
      return {
        action: 'require_approval',
        ruleId: 'proactive-require-approval',
        riskLevel,
        reason: 'Proactive high-risk actions require explicit human approval.',
        approvalHint: {
          approver: 'user',
          prompt: `The assistant is about to perform a proactive action: ${action.description}. Approve?`,
        },
      };
    }
    return null;
  },
};

9. Error Types

/** Base class for all policy errors. */
class PolicyError extends Error {
  constructor(message: string, public readonly cause?: unknown);
}

/** Thrown when a rule is not found by the given id. */
class RuleNotFoundError extends PolicyError {
  constructor(public readonly ruleId: string);
}

/** Thrown when the risk classifier throws or returns an invalid value. */
class ClassificationError extends PolicyError {
  constructor(message: string, cause?: unknown);
}
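
The declarations above give constructor signatures only. A runnable sketch with bodies filled in follows; the message text and the `name` / prototype handling are assumptions added so that `instanceof` and stack traces behave well across compile targets.

```typescript
/** Base class for all policy errors. */
class PolicyError extends Error {
  constructor(message: string, public readonly cause?: unknown) {
    super(message);
    this.name = new.target.name;
    // Keeps instanceof working when compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

/** Thrown when a rule is not found by the given id. */
class RuleNotFoundError extends PolicyError {
  constructor(public readonly ruleId: string) {
    super(`Policy rule not found: ${ruleId}`); // message wording is an assumption
  }
}

/** Thrown when the risk classifier throws or returns an invalid value. */
class ClassificationError extends PolicyError {
  constructor(message: string, cause?: unknown) {
    super(message, cause);
  }
}
```

Because both subclasses extend PolicyError, a single `catch` that checks `instanceof PolicyError` covers all three.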

10. Integration Patterns

10.1 Basic Setup

import { createActionPolicy, InMemoryAuditSink } from '@agent-assistant/policy';
import type { PolicyRule, RiskClassifier } from '@agent-assistant/policy';

const auditSink = new InMemoryAuditSink();

const classifier: RiskClassifier = {
  classify(action) {
    switch (action.type) {
      case 'send_email': return 'high';
      case 'create_draft': return 'medium';
      case 'read_inbox': return 'low';
      default: return 'medium';
    }
  },
};

const policyEngine = createActionPolicy({ classifier, auditSink });

// Register rules
policyEngine.registerRule({
  id: 'deny-critical',
  priority: 1,
  description: 'Deny all critical-risk actions in v1',
  evaluate(action, riskLevel) {
    if (riskLevel === 'critical') {
      return { action: 'deny', ruleId: 'deny-critical', riskLevel, reason: 'Critical actions are not permitted.' };
    }
    return null;
  },
});

10.2 Wiring to a Proactive Capability Handler

// In product capability handler (not in the policy package):
capabilities: {
  proactive: async (message, context) => {
    const followUpDecisions = await proactiveEngine.evaluateFollowUp(/* ... */);

    for (const decision of followUpDecisions) {
      if (decision.action !== 'fire') continue;

      const action: Action = {
        id: nanoid(),
        type: 'proactive_follow_up',
        description: decision.messageTemplate ?? 'Proactive follow-up',
        sessionId: decision.sessionId,
        userId: sessionUserId,
        proactive: true,
      };

      const policyDecision = await policyEngine.evaluate(action);

      if (policyDecision.action === 'allow') {
        await context.runtime.emit({ sessionId: action.sessionId, text: decision.messageTemplate });
      } else if (policyDecision.action === 'require_approval') {
        // Product handles approval UX using policyDecision.approvalHint
      }
    }
  },
}

10.3 Wiring Traits to Policy Configuration

// Products map trait values to policy config — the policy package does not read traits:
const policyEngine = createActionPolicy({
  fallbackDecision: traits.riskTolerance === 'cautious' ? 'deny' : 'require_approval',
  classifier: buildClassifierFromTraits(traits),
});
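
`buildClassifierFromTraits` is referenced above but not defined in this spec. A possible product-side shape is sketched below, assuming a `Traits` type with a `riskTolerance` field and a base classifier keyed on action type; the "cautious bumps risk by one level" behavior is an invented example policy.

```typescript
type RiskLevel = 'low' | 'medium' | 'high' | 'critical';

// Hypothetical traits shape; only 'cautious' appears in this spec.
interface Traits {
  riskTolerance: 'cautious' | 'balanced';
}

// One level stricter, capped at 'critical'.
const BUMPED: Record<RiskLevel, RiskLevel> = {
  low: 'medium',
  medium: 'high',
  high: 'critical',
  critical: 'critical',
};

function buildClassifierFromTraits(
  traits: Traits,
  base: (actionType: string) => RiskLevel,
): { classify(action: { type: string }): RiskLevel } {
  return {
    classify(action: { type: string }): RiskLevel {
      const level = base(action.type);
      // Cautious assistants treat every action as one level riskier.
      return traits.riskTolerance === 'cautious' ? BUMPED[level] : level;
    },
  };
}
```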

11. v1 Scope vs. Deferred

In Scope for v1

| Capability | Detail |
| --- | --- |
| `createActionPolicy(config)` factory | Returns a `PolicyEngine` with all v1 methods |
| `defaultRiskClassifier` | Returns `medium` for all actions; products override |
| `RiskClassifier` interface | Products supply; async supported |
| `PolicyRule` registration | `engine.registerRule(rule)` with priority ordering |
| `PolicyRule` management | `registerRule`, `removeRule`, `listRules` |
| Policy evaluation | `engine.evaluate(action)`: classify, evaluate rules, fallback, audit |
| Fallback decision | Configurable; defaults to `require_approval` |
| `ApprovalHint` and `ApprovalResolution` contracts | Types only; no workflow implementation |
| `AuditSink` interface | Pluggable; called on every `evaluate()` that returns a decision |
| `InMemoryAuditSink` | Test adapter with accessible `events` array |
| Proactive flag | Required field on `Action` and `PolicyEvaluationContext` |
| Error types | `PolicyError`, `RuleNotFoundError`, `ClassificationError` |
| 42+ tests | Per project DoD standard |

Deferred

| Capability | Target | Reason |
| --- | --- | --- |
| Persistent rule storage adapter | v1.1 | v1 is in-memory only |
| Approval workflow engine (timeouts, notifications) | v1.1 | Products own approval UX in v1 |
| Time-based auto-escalation | v1.1 | Requires scheduler binding integration |
| Cumulative risk budgets per session | v1.1 | Requires action history tracking |
| Cross-session / org-level policy | v2 | Broader scoping model not yet designed |
| Policy inheritance and override chains | v2 | v1 uses flat priority-ordered rules |
| ML-based risk classification | v2+ | Function-based classifiers in v1 |
| Distributed policy evaluation | v2+ | Single-process in v1 |
| Action rollback contracts | v2+ | Requires undo semantics not yet designed |
| Compliance framework mappings (SOC2, GDPR) | v3+ | Enterprise concern; not v1 SDK scope |

12. Package Structure

packages/policy/
  package.json          — no runtime dependencies beyond nanoid (for IDs)
  tsconfig.json
  src/
    types.ts            — Action, RiskLevel, PolicyRule, PolicyDecision, AuditEvent,
                          ApprovalHint, ApprovalResolution, PolicyEvaluationContext,
                          AuditSink, InMemoryAuditSink, error classes
    policy.ts           — createActionPolicy factory, PolicyEngine implementation,
                          rule registration, evaluation loop, classification, audit recording
    index.ts            — public re-exports
    policy.test.ts      — 42+ tests
  README.md

13. Test Categories (Minimum 42)

| Category | Count | Coverage |
| --- | --- | --- |
| Type structural tests | 4 | All required types and interfaces exist and export correctly |
| Risk classification | 5 | Default classifier, custom classifier, async classifier, classification error, unclassified action default |
| Policy rule registration | 5 | Register, list, remove, duplicate rejection, priority ordering |
| Policy evaluation (basic) | 6 | Single rule allow, deny, require_approval, escalate, null-deferral, fallback |
| Policy evaluation (priority) | 4 | Priority ordering, first-match-wins, same-priority stability, rule removal mid-evaluation |
| Proactive action gating | 4 | Proactive flag respected, stricter rule for proactive, proactive+high risk, proactive+low risk |
| Approval contract | 3 | ApprovalHint on require_approval, ApprovalResolution structure, approval metadata fields |
| Audit sink | 5 | InMemoryAuditSink recording, event structure, allow/deny/require_approval audit, clear() |
| Error handling | 3 | PolicyError, RuleNotFoundError, ClassificationError |
| Fallback decision | 3 | Default fallback is require_approval, configurable fallback, fallback with audit |
| Allow/deny/escalate outcomes | 3 | End-to-end allow, end-to-end deny, end-to-end escalate |
| Total | 45 | |

V1_POLICY_SPEC_READY
