Decision-layer context in MAF integration audit entries for regulated verticals #1707

mj3b · 2026-05-03T17:22:49Z

What the audit trail records today

GovernancePolicyMiddleware in agent_os/integrations/maf_adapter.py calls AuditLog.log() after PolicyEvaluator.evaluate() returns. The data dict passed to the log contains:

data={
    "matched_rule": decision.matched_rule,
    "message_preview": last_message_text[:200],
}

AuditEntry in agentmesh/governance/audit.py captures:

entry_id | timestamp | event_type | agent_did | action
resource | data | outcome | policy_decision | matched_rule
previous_hash | entry_hash | trace_id | session_id

MerkleAuditChain links entries via SHA-256, and AuditLog.verify_integrity() confirms the chain is unbroken.
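For readers unfamiliar with hash-linked logs, the following is a minimal sketch of the general technique, assuming a simple linear chain in which each entry hashes its payload together with the previous entry's hash. It is not the agentmesh implementation: `entry_hash`, `append_entry`, and `verify_chain` are hypothetical names, and the real `MerkleAuditChain` may structure its hashing differently.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first entry's previous_hash


def entry_hash(payload: dict, previous_hash: str) -> str:
    """Hash the canonical JSON payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> dict:
    """Append a payload to the chain, linking it to the last entry."""
    previous_hash = chain[-1]["entry_hash"] if chain else GENESIS_HASH
    entry = {
        "payload": payload,
        "previous_hash": previous_hash,
        "entry_hash": entry_hash(payload, previous_hash),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; an edited payload or a broken link fails the check."""
    previous_hash = GENESIS_HASH
    for entry in chain:
        if entry["previous_hash"] != previous_hash:
            return False
        if entry["entry_hash"] != entry_hash(entry["payload"], previous_hash):
            return False
        previous_hash = entry["entry_hash"]
    return True
```

The check proves that stored entries were not altered after the fact. It says nothing about whether the stored fields are sufficient for review, which is the gap discussed next.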

The specific gap

PolicyEvaluator.evaluate() already builds a richer audit_entry dict per PolicyDecision:

audit_entry={
    "policy": doc.name,
    "rule": rule.name,
    "action": rule.action.value,
    "context_snapshot": context,
    "timestamp": datetime.now(timezone.utc).isoformat(),
}

GovernancePolicyMiddleware receives this on decision.audit_entry but does not forward it to AuditLog.log(). The evaluator builds the richer context. The middleware drops it. The AuditLog.log() signature itself has no audit_entry parameter -- pre-decision context would pass through the existing data: Optional[dict] argument.

The result: every audit entry records what the policy decided. None records what the agent was doing, at what confidence, and what governance weight that decision carried before the policy engine saw it.
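A minimal sketch of what forwarding could look like, assuming the middleware builds its `data` dict in one place. `PolicyDecision` here is a stand-in dataclass with only the fields named in this post, and the `policy_audit_entry` key is a hypothetical name chosen for illustration, not an existing field.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PolicyDecision:
    """Stand-in with only the fields this discussion mentions (not the real class)."""
    allowed: bool
    matched_rule: Optional[str] = None
    audit_entry: dict = field(default_factory=dict)


def build_audit_data(decision: PolicyDecision, last_message_text: str) -> dict:
    """Build the data dict for AuditLog.log(), keeping the evaluator's context."""
    data: dict = {
        "matched_rule": decision.matched_rule,
        "message_preview": last_message_text[:200],
    }
    if decision.audit_entry:
        # Shallow copy so top-level changes to the decision's dict
        # do not alter what gets logged. "policy_audit_entry" is an assumed key.
        data["policy_audit_entry"] = dict(decision.audit_entry)
    return data
```

Nesting the evaluator's dict under its own key avoids collisions with existing `data` keys; whether to nest or flatten is one of the open questions at the end of this post.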

Why this matters in regulated verticals

The five MAF integration scenarios cover banking, retail, healthcare, enterprise IT, and DevOps. In each, a compliance reviewer reading the Merkle-chained log can answer one question per entry: was this action permitted?

In regulated industries -- insurance, financial services, pharma -- auditors routinely ask a second question: was this decision sound? That is not a question about policy compliance. It is a question about governance quality: what confidence was the agent operating at, what category of deliberation did this decision require, and who in the accountability chain holds responsibility for this decision class.

The loan processing scenario (scenario 01) illustrates the gap concretely. An allowed loan inquiry writes this to the audit trail:

{
  "event_type": "policy_evaluation",
  "action": "allow",
  "outcome": "success",
  "data": {
    "matched_rule": "allow_loan_inquiries",
    "message_preview": "Check loan eligibility for John Smith..."
  }
}

A compliance reviewer confirms the policy cleared. They cannot determine:

Question	Answered by current entry
Was this action permitted?	Yes
What confidence score preceded the action?	No
Did that confidence meet the institution's threshold for this decision class?	No
What gate classification did this decision warrant?	No
What alternatives did the agent evaluate before committing?	No
Who holds accountability for decisions in this category?	No

The EU AI Act Art. 12 requires automatic logging. Art. 14 requires human oversight with the ability to intervene. Art. 86 requires the ability to explain a decision. A Merkle-chained trail of allow / matched_rule: allow_loan_inquiries satisfies Art. 12 structurally. It provides no material for Art. 14 or Art. 86 review.

Where the extension point already exists

Agent tool call intent
        │
        ▼
GovernancePolicyMiddleware.process()
  PolicyEvaluator.evaluate(context)
    → PolicyDecision(
        allowed=True,
        matched_rule="allow_loan_inquiries",
        audit_entry={                        ← populated by evaluator;
          "policy": "contoso-bank",            already contains rule,
          "rule": "allow_loan_inquiries",      context snapshot,
          "context_snapshot": {...},           evaluation timestamp
          "timestamp": "..."
        }
      )
        │
        ▼  decision.audit_entry is available here
        │  but not forwarded to AuditLog.log()
        ▼
AuditLog.log(
  event_type="policy_evaluation",
  data={
    "matched_rule": decision.matched_rule,  ← only this passes through
    "message_preview": ...
  }
)
        │
        ▼
AuditEntry → MerkleAuditChain

decision.audit_entry already carries context_snapshot from the evaluator. The data dict in AuditLog.log() accepts arbitrary key-value pairs. The extension point requires no new parameters and no schema changes to AuditEntry.

Pre-decision governance context -- confidence score, gate classification, reasoning reconstruction, accountability owner -- would merge into the same data dict before AuditLog.log() is called, landing in the same Merkle-chained entry alongside the policy outcome.

Proposed contribution

A new MAF integration scenario (examples/maf-integration/) for an insurance claims processing agent, following the existing four-act structure with a fifth act added:

Act	What it demonstrates
1	Policy enforcement -- YAML rules evaluated before the agent runs
2	Capability sandboxing -- tool allow/deny lists
3	Rogue agent detection -- anomaly detection and quarantine
4	Audit trail -- Merkle-chained entries, integrity verification
5	Compliance review -- the same entries rendered for a regulatory auditor

Act 5 prints each audit entry twice: once as Acts 1-4 currently display it (policy outcome and matched rule), and once with the pre-decision context merged into data:

{
  "event_type": "policy_evaluation",
  "action": "allow",
  "outcome": "success",
  "data": {
    "matched_rule": "allow_claim_approval",
    "confidence_score": 0.85,
    "confidence_threshold": 0.80,
    "confidence_zone": "GREEN",
    "gate_classification": "elevated_review",
    "gate_rationale": "Tool class database_write triggers elevated_review regardless of confidence zone per contoso-insurance-governance.",
    "alternatives_considered": ["flag_for_manual_review", "request_additional_documentation"],
    "reasoning_reconstruction": "claims-agent-001 evaluated database_write against contoso-insurance-governance at confidence 0.85 (GREEN). Tool class overrides to elevated_review. Human reviewer acknowledgment required before execution proceeds.",
    "accountability_owner": "senior-claims-manager",
    "decision_written_at": "2026-04-27T14:00:00Z"
  }
}

A compliance reviewer reading the second view can answer all six questions in the table above from the same Merkle-chained artifact.
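The claim can be checked mechanically. The sketch below assumes each reviewer question is answerable when a specific set of `data` keys is present; the mapping in `REVIEW_QUESTIONS` and the function name `answerable_questions` are illustrative assumptions, not part of the existing codebase.

```python
# Hypothetical mapping from each reviewer question to the data keys that answer it.
REVIEW_QUESTIONS = {
    "Was this action permitted?": ("matched_rule",),
    "What confidence score preceded the action?": ("confidence_score",),
    "Did that confidence meet the institution's threshold for this decision class?": (
        "confidence_score",
        "confidence_threshold",
    ),
    "What gate classification did this decision warrant?": ("gate_classification",),
    "What alternatives did the agent evaluate before committing?": (
        "alternatives_considered",
    ),
    "Who holds accountability for decisions in this category?": (
        "accountability_owner",
    ),
}


def answerable_questions(entry: dict) -> dict:
    """Return, per question, whether the entry's data dict carries every key it needs."""
    data = entry.get("data") or {}
    return {
        question: all(data.get(key) is not None for key in keys)
        for question, keys in REVIEW_QUESTIONS.items()
    }
```

Run against the current loan-inquiry entry, only the first question comes back answerable; run against the enriched entry above, all six do.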

The scenario covers four decision categories across the gate classification model, producing one audit entry per category:

Scenario	Tool	Confidence	Gate	Audit entry demonstrates
Retrieve claim file	`file_read`	0.97	Routine	Policy allow + pre-authorized decision context
Verify ICD coding	`web_search`	0.88	Documented delegation	Upstream delegation reference in audit entry
Write claim decision	`database_write`	0.85	Elevated review	Tool-class override of confidence zone
Bulk delete records	`bulk_delete`	0.91	Hard escalation	Policy deny + full reasoning state before halt
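A rough sketch of how the four rows could be classified, assuming the gate is driven by tool class and the confidence zone is recorded alongside it. Only `GREEN` and `elevated_review` appear in the example entry above; the other gate identifiers, the `BELOW_THRESHOLD` zone name, and the fail-closed default for unknown tools are assumptions made for illustration.

```python
# Tool-to-gate mapping taken from the table above. Gate identifiers other than
# "elevated_review" are assumed snake_case spellings of the table labels.
TOOL_GATES = {
    "file_read": "routine",
    "web_search": "documented_delegation",
    "database_write": "elevated_review",
    "bulk_delete": "hard_escalation",
}

# Assumption: tools missing from the mapping fail closed to the strictest gate.
DEFAULT_GATE = "hard_escalation"


def confidence_zone(score: float, threshold: float) -> str:
    """Return GREEN at or above the threshold; the other zone name is hypothetical."""
    return "GREEN" if score >= threshold else "BELOW_THRESHOLD"


def classify(tool_name: str, score: float, threshold: float) -> tuple:
    """Return (zone, gate); the tool class decides the gate regardless of zone."""
    zone = confidence_zone(score, threshold)
    gate = TOOL_GATES.get(tool_name, DEFAULT_GATE)
    return zone, gate
```

With a threshold of 0.80, `classify("database_write", 0.85, 0.80)` yields the `GREEN` zone with an `elevated_review` gate, matching the third row: high confidence does not lower the gate for a write-class tool.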

Implementation shape

The gate classifier wraps the existing GovernancePolicyMiddleware call without modifying PolicyEvaluator.evaluate() or AuditLog.log():

# Compute gate classification from context available before policy evaluation
gate_record = gate_classifier.evaluate(
    tool_name=request.tool_name,
    confidence_score=context_confidence,
    policy_threshold=policy.confidence_threshold,
    alternatives_considered=context_alternatives,
    declared_task=context_task,
)

# Merge into data dict before AuditLog.log() is called
# Uses existing data: Optional[dict] parameter -- no signature changes
audit_data = {
    "matched_rule": decision.matched_rule,
    **gate_record.to_audit_dict(),
}

audit_log.log(
    event_type="policy_evaluation",
    agent_did=agent_name,
    action=decision.action,
    data=audit_data,
    outcome="success" if decision.allowed else "denied",
    policy_decision=decision.action,
)

No changes to AuditEntry, MerkleAuditChain, PolicyEvaluator, or AuditLog. The scenario is self-contained, following the pattern of the existing five scenarios.
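One possible shape for the record behind `gate_record.to_audit_dict()`, sketched under the assumption that it is a plain immutable dataclass whose fields mirror the enriched entry shown earlier. `GateRecord` and `merge_audit_data` are hypothetical names; the key-collision check is a design suggestion, not existing behavior.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GateRecord:
    """Hypothetical pre-decision record; field names mirror the example entry."""
    confidence_score: float
    confidence_threshold: float
    confidence_zone: str
    gate_classification: str
    gate_rationale: str
    accountability_owner: str
    alternatives_considered: tuple = ()

    def to_audit_dict(self) -> dict:
        """Return a JSON-friendly dict suitable for merging into AuditLog data."""
        record = asdict(self)
        # asdict() keeps tuples as tuples; convert to a list for JSON output.
        record["alternatives_considered"] = list(self.alternatives_considered)
        return record


def merge_audit_data(matched_rule: str, gate_record: GateRecord) -> dict:
    """Merge gate fields into the data dict, refusing to overwrite existing keys."""
    data = {"matched_rule": matched_rule}
    extra = gate_record.to_audit_dict()
    overlap = data.keys() & extra.keys()
    if overlap:
        raise ValueError(f"gate record would overwrite keys: {sorted(overlap)}")
    data.update(extra)
    return data
```

Freezing the dataclass means the record cannot change between classification and logging, so the fields that end up in the hashed entry are the ones computed before the policy decision.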

AI-assisted contributions disclosure

This proposal was developed with AI assistance (Claude). Per CONTRIBUTING.md disclosure requirements: I directed the analysis, reviewed every claim against the source code, and can walk through any part of this. The specific findings -- that decision.audit_entry is not forwarded to AuditLog.log(), that AuditLog.log() has no audit_entry parameter, that the data dict is the correct extension point -- were verified against maf_adapter.py, audit.py, evaluator.py, and decision.py directly.

Prior art

Gate classification model and pre-decision artifact schema: mj3b/governed-decision-intelligence (Apache 2.0). All patterns used in the implementation will be attributed per CONTRIBUTING.md requirements in the PR description and in code comments.

Three questions before building

Is examples/maf-integration/ the right path, or would maintainers prefer a regulated-verticals/ subdirectory given the compliance-reviewer framing?
PolicyDecision.audit_entry carries context_snapshot from the evaluator. Would maintainers prefer the gate classification fields merge into that existing dict, or remain separate in data?
Is there appetite for a fifth act specifically for compliance-reviewer output, or should the enriched entries appear within Act 4 only?

musaabhasan · 2026-05-09T09:37:23Z

This gap is important because a hash-chained audit record proves integrity, but not decision sufficiency. In regulated environments, reviewers usually ask not only "was this event changed?" but "was the decision explainable at the time it was made?"

I would include a bounded decision context in the audit entry, but with careful redaction:

policy name and immutable policy version/hash
matched rule id and rule action
normalized action/resource tuple
risk tier and approval tier
relevant context keys used by the evaluator, not the full prompt or transcript
obligation outcomes: redacted, restricted, approved, denied, escalated
reason code intended for audit reporting

The key is to store enough to replay or defend the governance decision without turning the audit log into a sensitive prompt archive. For healthcare, financial services, and education, the audit schema should support privacy-preserving evidence by default, with deeper forensic payloads stored separately under stricter access controls.
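As a rough illustration of the bounded context described above, the sketch below keeps only the context keys the evaluator actually used and replaces the policy text with a SHA-256 fingerprint. All function and field names here are hypothetical, and which keys count as "used" is assumed to be supplied by the caller.

```python
import hashlib


def policy_fingerprint(policy_text: str) -> str:
    """Return a SHA-256 hex digest identifying the exact policy version."""
    return hashlib.sha256(policy_text.encode("utf-8")).hexdigest()


def bounded_context(
    context: dict,
    used_keys: list,
    policy_name: str,
    policy_text: str,
    rule_id: str,
    rule_action: str,
    reason_code: str,
) -> dict:
    """Build an audit payload from allow-listed context keys only."""
    return {
        "policy": policy_name,
        "policy_hash": policy_fingerprint(policy_text),
        "rule_id": rule_id,
        "rule_action": rule_action,
        "reason_code": reason_code,
        # Anything not named in used_keys (prompts, transcripts) is left out.
        "context": {key: context[key] for key in used_keys if key in context},
    }
```

An allow-list keeps prompts and transcripts out by default: a new context key never reaches the audit log unless someone names it explicitly.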

ElamOlame31 · 2026-05-28T00:35:31Z

In regulated environments we found that decision context needs to capture not just the current request but the agent's behavioral trajectory: what it has been doing over the last 24h. A single read of a sensitive file looks different when it is the 11th read in 5 minutes vs. the first read of the day. AgentGate encodes this trajectory into the trust score and the audit entry before execution.

https://github.com/ElamOlame31/agentgate-public

https://www.tryagentgate.com/
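To make the "11th read in 5 minutes" distinction concrete, here is a generic sliding-window counter. It is a sketch of the general idea only and makes no claim about how AgentGate computes its trust score; `AccessWindow` is a hypothetical name, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class AccessWindow:
    """Count accesses that fall inside a trailing time window."""

    def __init__(self, window_seconds: float = 300.0):
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps in seconds, oldest first

    def record(self, timestamp: float) -> int:
        """Record an access and return how many accesses are inside the window."""
        self._events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop accesses that are at or before the cutoff.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events)
```

The returned count could be written into the audit entry's `data` dict next to the policy outcome, so a reviewer sees whether a given read was the first of the day or part of a burst.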
