
Commit ca76049

bgagent committed
feat(cedar-hitl): pin Cedar engines and seed cross-engine parity contract
Chunk 1 of the Cedar HITL gates PR (docs/design/CEDAR_HITL_GATES.md). This chunk lays the foundation before the engine rewrites in Chunk 2+. Both Cedar engines are pinned exactly per decision #23. The annotation surface is validated by Day-1 spikes per decision #22. The golden-file parity fixtures are seeded so every subsequent chunk can rely on the contract.

- Pin cedarpy==4.8.0 (agent) and @cedar-policy/cedar-wasm@4.10.0 (cdk) exactly (no ^/~). Document both in the mise.toml header.
- Add agent/tests/test_cedarpy_annotations_contract.py (10 tests). It validates that all 5 annotations round-trip verbatim via policies_to_json_str() under staticPolicies.<id>.annotations.
- Add cdk/test/handlers/shared/cedar-policy.test.ts (12 tests). It validates that policySetTextToParts + policyToJson extract the same annotations verbatim, and that isAuthorized returns the documented {type, response} wrapper shape.
- Add contracts/cedar-parity/ with 5 golden-file fixtures (single-match, multi-match, hard-deny, soft-deny write, no-match default-allow), plus a README documenting the contract. Every fixture policy carries a @rule_id, including the base permit as @rule_id("base_permit"). As a result, the parity tests raise if either engine returns an unannotated match, instead of silently dropping it.
- Add agent/tests/test_cedar_parity.py (6 tests, cedarpy side) and cdk/test/handlers/shared/cedar-parity.test.ts (6 tests, cedar-wasm side). Both load the shared fixtures and assert that (decision, sorted rule_ids) matches the expected payload. Both hard-import cedarpy/cedar-wasm, so a dependency regression fails loudly rather than being silently skipped.
- Update docs/design/CEDAR_HITL_GATES.md section 15.2 row 3, the 15.6 prose and the parity mermaid diagram to point at contracts/cedar-parity/ instead of a new tests/fixtures/ dir. This follows the precedent set by contracts/memory-hash-vectors.json. Regenerate the Starlight mirror.
- Add IMPL-29, noting the naming asymmetry the spikes surfaced: cedarpy uses diagnostics.reasons, cedar-wasm uses diagnostics.reason. Engine code normalizes at the boundary.
- Fix rev-4 -> rev-5 cosmetic footer drift.

Test counts: agent 500 -> 516 (+16), cdk 1036 -> 1054 (+18), cli 190 unchanged.

No production code changes in this chunk; the engine rewrite lands in Chunk 2.

Follow-up: a separate chore issue to move contracts/memory-hash-vectors.json into a self-named subdir, for consistency with contracts/cedar-parity/.
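The IMPL-29 boundary normalization described above can be sketched as follows. This is a minimal sketch, not the Chunk 2 engine code, and it assumes the shapes described in this commit:

- cedarpy exposes a capitalized decision, read via `result.decision.value`, and a `diagnostics.reasons` list.
- cedar-wasm's `isAuthorized` returns a `{type, response}` wrapper, modeled here as a Python dict, whose matching policy IDs sit under `response.diagnostics.reason`.

The function names `normalize_cedarpy` and `normalize_cedar_wasm` are hypothetical.

```python
from typing import Any, Mapping, Sequence


def normalize_cedarpy(decision_value: str, reasons: Sequence[str]) -> tuple[str, list[str]]:
    """Normalize cedarpy output: the Decision enum's value plus diagnostics.reasons."""
    # cedarpy's decision value is capitalized (e.g. "Allow"); the contract stores lowercase.
    return decision_value.lower(), sorted(reasons)


def normalize_cedar_wasm(answer: Mapping[str, Any]) -> tuple[str, list[str]]:
    """Normalize an (assumed) cedar-wasm {type, response} answer to the same tuple."""
    if answer.get("type") != "success":
        # Surface engine failures instead of reading them as a decision.
        raise ValueError(f"cedar-wasm answer was not a success: {answer!r}")
    response = answer["response"]
    # Singular "reason" on this side vs plural "reasons" in cedarpy (the IMPL-29 asymmetry).
    return response["decision"], sorted(response["diagnostics"]["reason"])


# Both engines matched the same two policies; after normalization the tuples agree.
py_side = normalize_cedarpy("Deny", ["policy1", "policy0"])
wasm_side = normalize_cedar_wasm({
    "type": "success",
    "response": {
        "decision": "deny",
        "diagnostics": {"reason": ["policy0", "policy1"], "errors": []},
    },
})
assert py_side == wasm_side == ("deny", ["policy0", "policy1"])
```

Normalizing to one `(decision, sorted ids)` tuple keeps the comparison engine-agnostic. Raising on a non-success answer keeps an engine error from being mistaken for a deny.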
1 parent 945b149 commit ca76049

17 files changed

Lines changed: 953 additions & 10 deletions

agent/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ dependencies = [
     "uvicorn==0.42.0",
     "aws-opentelemetry-distro~=0.15.0",
     "mcp==1.23.0",
-    "cedarpy>=4.8.0",
+    "cedarpy==4.8.0",
 ]
 
 [tool.bandit]
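The exact-pin rule from the commit message (no `^`/`~` ranges, and `==` rather than `>=`) could be enforced by a small CI check. The sketch below is hypothetical and not part of this commit. `is_exact_python_pin` and `is_exact_npm_pin` are illustrative names, and the regexes only accept the simple `name==X.Y.Z` and bare `X.Y.Z` forms that appear in these diffs.

```python
import re

# A Python requirement pinned with "==" to a concrete version, e.g. "cedarpy==4.8.0".
_PY_EXACT = re.compile(r"^[A-Za-z0-9_.\-]+==\d+(\.\d+)*$")
# An npm version string with no range operator, e.g. "4.10.0" (not "^4.10.0" or "~4.10.0").
_NPM_EXACT = re.compile(r"^\d+\.\d+\.\d+$")


def is_exact_python_pin(requirement: str) -> bool:
    """True if a simple requirement string uses a plain '==' pin."""
    return bool(_PY_EXACT.match(requirement.strip()))


def is_exact_npm_pin(version: str) -> bool:
    """True if a package.json version has no ^, ~, >= or other range syntax."""
    return bool(_NPM_EXACT.match(version.strip()))


# The two engine pins from this commit pass; the loose forms they replace do not.
assert is_exact_python_pin("cedarpy==4.8.0")
assert not is_exact_python_pin("cedarpy>=4.8.0")
assert is_exact_npm_pin("4.10.0")
assert not is_exact_npm_pin("^2.238.0")
```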

agent/tests/test_cedar_parity.py

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
"""Cedar cross-engine parity — agent side (cedarpy).

Loads every ``contracts/cedar-parity/*.json`` fixture, runs each
``(policies, input)`` through ``cedarpy.is_authorized``, and asserts the
observed ``(decision, matching_rule_ids)`` equals the fixture's
``expected`` payload.

The companion test ``cdk/test/handlers/shared/cedar-parity.test.ts`` runs
the same fixtures through ``@cedar-policy/cedar-wasm``. If either side
disagrees with the fixture, CI fails BEFORE deploy — satisfying the
cross-engine parity contract (decision #23, finding #1, §15.6 of
``docs/design/CEDAR_HITL_GATES.md``).

Fixture path resolution mirrors the pattern in
``test_prompts.py::TestCrossLanguageHashParity`` for
``contracts/memory-hash-vectors.json``.
"""

import json
import os
from pathlib import Path

# Hard import (not importorskip): the parity contract REQUIRES cedarpy.
# A dependency regression that drops cedarpy must fail loudly, not be
# silently skipped — skipping would let divergence reach production.
# See silent-failure audit finding #8 (Chunk 1 review, 2026-05-07).
import cedarpy
import pytest

_FIXTURE_DIR = Path(os.path.dirname(__file__)) / ".." / ".." / "contracts" / "cedar-parity"
_FIXTURE_DIR = _FIXTURE_DIR.resolve()

_VALID_DECISIONS = frozenset({"allow", "deny"})


def _validate_fixture(fixture: dict, path: Path) -> None:
    """Reject malformed fixtures at load time so bad data fails loud."""
    for required in ("name", "policies", "input", "expected"):
        if required not in fixture:
            raise AssertionError(f"{path.name}: missing required field {required!r}")
    for required in ("principal", "action", "resource"):
        if required not in fixture["input"]:
            raise AssertionError(f"{path.name}: input missing {required!r}")
    expected = fixture["expected"]
    if "decision" not in expected or "matching_rule_ids" not in expected:
        raise AssertionError(f"{path.name}: expected missing decision/matching_rule_ids")
    # Enforce lowercase canonical form — both engines report lowercase
    # natively (cedar-wasm) or are normalized on read (cedarpy via .value.lower()).
    # Rejecting case drift at load prevents a fixture author from writing
    # "Deny" and having only one engine's comparator hit the case-mismatch.
    if expected["decision"] not in _VALID_DECISIONS:
        raise AssertionError(
            f"{path.name}: decision must be lowercase in {_VALID_DECISIONS}, "
            f"got {expected['decision']!r}"
        )


def _load_fixtures() -> list[dict]:
    """Load every parity fixture; skip README.md."""
    assert _FIXTURE_DIR.is_dir(), (
        f"expected fixture dir at {_FIXTURE_DIR}; see contracts/cedar-parity/README.md"
    )
    fixtures = []
    for path in sorted(_FIXTURE_DIR.glob("*.json")):
        with path.open() as f:
            fixture = json.load(f)
        _validate_fixture(fixture, path)
        fixtures.append(fixture)
    assert fixtures, f"no fixtures found under {_FIXTURE_DIR}; at least one golden file is required"
    return fixtures


def _entity_uid(entity_ref: dict) -> str:
    """Format an entity reference dict as a Cedar UID string literal."""
    return f'{entity_ref["type"]}::"{entity_ref["id"]}"'


def _build_entities(fixture_input: dict) -> list[dict]:
    """Build cedarpy's entities list from principal/action/resource references.

    Includes ``action`` so the two engines receive equivalent entity sets;
    cedarpy tolerates undeclared actions today but the TS side passes an
    empty entities list — keeping both sides symmetric prevents silent
    asymmetric failures if a future fixture attaches attributes to the
    action entity. See silent-failure audit finding #3.
    """
    entities = []
    for key in ("principal", "action", "resource"):
        ref = fixture_input.get(key)
        if ref and isinstance(ref, dict) and "type" in ref and "id" in ref:
            entities.append(
                {
                    "uid": {"type": ref["type"], "id": ref["id"]},
                    "attrs": {},
                    "parents": [],
                }
            )
    return entities


def _build_request(fixture_input: dict) -> dict:
    """Translate the fixture input into the cedarpy is_authorized request shape."""
    return {
        "principal": _entity_uid(fixture_input["principal"]),
        "action": _entity_uid(fixture_input["action"]),
        "resource": _entity_uid(fixture_input["resource"]),
        "context": fixture_input.get("context", {}),
    }


def _recover_rule_ids(policies: str, matching_policy_ids: list[str]) -> list[str]:
    """Map engine-internal positional IDs (policy0, ...) back to @rule_id annotations.

    Enforces that EVERY matching policy must carry a ``@rule_id`` annotation.
    Dropping unannotated matches would silently hide genuine cross-engine
    disagreement (e.g. one engine matching the base ``permit`` alongside a
    ``forbid``) — the whole point of this test is to fail such disagreement,
    not bury it. See silent-failure audit finding #1 (Chunk 1 review,
    2026-05-07). Fixture policies are expected to annotate every rule
    including the base permit (``@rule_id("base_permit")``); a missing
    annotation raises rather than silently coerces to empty.
    """
    try:
        parsed = json.loads(cedarpy.policies_to_json_str(policies))
    except Exception as exc:
        raise AssertionError(
            f"cedarpy.policies_to_json_str returned an unparseable result: "
            f"{type(exc).__name__}: {exc}"
        ) from exc
    id_map = {
        pid: body.get("annotations", {}).get("rule_id")
        for pid, body in parsed.get("staticPolicies", {}).items()
    }
    recovered = []
    for pid in matching_policy_ids:
        rule_id = id_map.get(pid)
        if not rule_id:
            raise AssertionError(
                f"cedarpy matched policy {pid!r} but the fixture's policies define "
                f"no @rule_id annotation for it; every fixture policy (including the "
                f"base permit) must carry a rule_id so cross-engine disagreement "
                f"surfaces rather than being silently dropped"
            )
        recovered.append(rule_id)
    return sorted(recovered)


_FIXTURES = _load_fixtures()


@pytest.mark.parametrize("fixture", _FIXTURES, ids=[f["name"] for f in _FIXTURES])
def test_cedarpy_matches_fixture_decision(fixture: dict) -> None:
    """cedarpy's decision + recovered rule IDs must match the fixture's expected payload."""
    policies = fixture["policies"]
    request = _build_request(fixture["input"])
    entities = _build_entities(fixture["input"])

    result = cedarpy.is_authorized(request, policies, entities)

    # cedarpy decision enum: Decision.Allow / Decision.Deny. Fixture stores
    # lowercase to match cedar-wasm's native format; normalize before compare.
    # Fixture-side case was already validated at load (see _validate_fixture).
    observed_decision = result.decision.value.lower()
    expected_decision = fixture["expected"]["decision"]
    assert observed_decision == expected_decision, (
        f"fixture {fixture['name']!r}: decision drift — "
        f"cedarpy returned {observed_decision!r}, fixture expects {expected_decision!r}"
    )

    observed_rule_ids = _recover_rule_ids(policies, result.diagnostics.reasons)
    expected_rule_ids = sorted(fixture["expected"]["matching_rule_ids"])
    assert observed_rule_ids == expected_rule_ids, (
        f"fixture {fixture['name']!r}: matching_rule_ids drift — "
        f"cedarpy returned {observed_rule_ids!r}, fixture expects {expected_rule_ids!r}"
    )


def test_fixture_dir_exists() -> None:
    """Guard against silent empty-dir regressions if glob picks up nothing."""
    assert _FIXTURE_DIR.is_dir()
    assert len(_FIXTURES) >= 1
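You can see what `_recover_rule_ids` compares without installing cedarpy by replaying its mapping step on a hand-written payload. The payload is shaped like `policies_to_json_str()` output (`staticPolicies.<id>.annotations`, as the annotation contract test asserts), trimmed to the keys this step reads. The payload, the positional IDs `policy0`/`policy1` and the helper name `map_rule_ids` are illustrative assumptions, not taken from the real fixtures in contracts/cedar-parity/.

```python
import json


def map_rule_ids(policies_json: str, matching_policy_ids: list[str]) -> list[str]:
    """Map matched positional policy IDs to their @rule_id annotations, failing on gaps."""
    parsed = json.loads(policies_json)
    id_map = {
        pid: body.get("annotations", {}).get("rule_id")
        for pid, body in parsed.get("staticPolicies", {}).items()
    }
    recovered = []
    for pid in matching_policy_ids:
        rule_id = id_map.get(pid)
        if not rule_id:
            # Same rule as the test: an unannotated match is an error, not a silent drop.
            raise AssertionError(f"matched policy {pid!r} has no @rule_id annotation")
        recovered.append(rule_id)
    return sorted(recovered)


# Hypothetical payload: a base permit plus one annotated forbid.
payload = json.dumps({
    "staticPolicies": {
        "policy0": {"annotations": {"rule_id": "base_permit"}},
        "policy1": {"annotations": {"rule_id": "force_push_any", "tier": "soft"}},
    }
})

print(map_rule_ids(payload, ["policy1"]))  # ['force_push_any']
```

Because both engines' outputs are reduced to sorted `@rule_id` values, the comparison does not depend on either engine's positional naming.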
agent/tests/test_cedarpy_annotations_contract.py

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
"""Cedar-HITL Day-1 spike: cedarpy annotation round-trip contract.

Locks the assumption (decision #22 / §15.6 of docs/design/CEDAR_HITL_GATES.md)
that ``cedarpy.policies_to_json_str()`` preserves all five annotations the
engine relies on — ``@rule_id``, ``@tier``, ``@approval_timeout_s``,
``@severity``, ``@category`` — verbatim as string-valued entries under
``staticPolicies.<policy_id>.annotations``.

If cedarpy's annotation surface ever changes shape (renamed key, dropped
values, typed coercion), this test flips red BEFORE the engine's
annotation-merging logic starts returning subtly-wrong answers.

Parity with the TypeScript side is tested separately in
``test_cedar_parity.py`` against the shared ``contracts/cedar-parity/``
fixtures; this module validates only the agent-side API shape.
"""

import json

import pytest

cedarpy = pytest.importorskip("cedarpy")


_ANNOTATED_POLICY = (
    '@tier("soft") '
    '@rule_id("force_push_any") '
    '@approval_timeout_s("300") '
    '@severity("medium") '
    '@category("destructive") '
    'forbid (principal, action == Agent::Action::"execute_bash", resource) '
    'when { context.command like "*git push --force*" };'
)


def _first_static_policy(policies_text: str) -> dict:
    """Parse a Cedar policy set and return the first staticPolicies entry."""
    parsed = json.loads(cedarpy.policies_to_json_str(policies_text))
    statics = parsed.get("staticPolicies", {})
    assert statics, f"expected at least one static policy, got keys={list(parsed)}"
    return next(iter(statics.values()))


class TestAnnotationsRoundTrip:
    """All five annotations round-trip verbatim as strings."""

    def test_policies_to_json_str_returns_static_policies_wrapper(self):
        parsed = json.loads(cedarpy.policies_to_json_str(_ANNOTATED_POLICY))
        # The design's annotation-merging code keys off ``staticPolicies`` —
        # if cedarpy ever flattens this wrapper, the engine's lookup table
        # construction breaks silently.
        assert "staticPolicies" in parsed

    def test_annotations_key_present_on_parsed_policy(self):
        body = _first_static_policy(_ANNOTATED_POLICY)
        assert "annotations" in body, (
            f"cedarpy dropped the annotations key from parsed policy; body keys were {list(body)}"
        )

    def test_rule_id_annotation_preserved(self):
        annotations = _first_static_policy(_ANNOTATED_POLICY)["annotations"]
        assert annotations.get("rule_id") == "force_push_any"

    def test_tier_annotation_preserved(self):
        annotations = _first_static_policy(_ANNOTATED_POLICY)["annotations"]
        assert annotations.get("tier") == "soft"

    def test_approval_timeout_s_annotation_preserved_as_string(self):
        annotations = _first_static_policy(_ANNOTATED_POLICY)["annotations"]
        # Cedar annotations are always string-valued; the engine coerces to
        # int inside ``_merge_annotations`` (§6.3). If cedarpy ever switches
        # to int coercion on its side, the merge code's ``try: int(...)``
        # still works, but the documented contract (§5.2) says "string".
        assert annotations.get("approval_timeout_s") == "300"
        assert isinstance(annotations.get("approval_timeout_s"), str)

    def test_severity_annotation_preserved(self):
        annotations = _first_static_policy(_ANNOTATED_POLICY)["annotations"]
        assert annotations.get("severity") == "medium"

    def test_category_annotation_preserved(self):
        annotations = _first_static_policy(_ANNOTATED_POLICY)["annotations"]
        assert annotations.get("category") == "destructive"

    def test_all_five_annotations_present_exactly(self):
        annotations = _first_static_policy(_ANNOTATED_POLICY)["annotations"]
        expected = {
            "tier": "soft",
            "rule_id": "force_push_any",
            "approval_timeout_s": "300",
            "severity": "medium",
            "category": "destructive",
        }
        assert annotations == expected, f"annotations drift: expected {expected}, got {annotations}"


class TestDiagnosticsShape:
    """The is_authorized result carries matching policy IDs under diagnostics.reasons."""

    def test_diagnostics_reasons_is_a_list(self):
        # The engine's three-outcome branching walks ``diagnostics.reasons``
        # to recover matching policy IDs, which the annotation lookup table
        # then maps back to ``@rule_id`` values. If cedarpy ever renames
        # this attribute (singular ``.reason``, nested object, etc.) the
        # engine silently loses the ability to surface rule IDs to users.
        req = {
            "principal": 'Agent::TaskAgent::"new_task"',
            "action": 'Agent::Action::"execute_bash"',
            "resource": 'Agent::BashCommand::"command"',
            "context": {"command": "git push --force origin main"},
        }
        entities = [
            {"uid": {"type": "Agent::TaskAgent", "id": "new_task"}, "attrs": {}, "parents": []},
            {"uid": {"type": "Agent::BashCommand", "id": "command"}, "attrs": {}, "parents": []},
        ]
        r = cedarpy.is_authorized(req, _ANNOTATED_POLICY, entities)
        assert hasattr(r.diagnostics, "reasons"), (
            "cedarpy.Diagnostics no longer exposes .reasons — engine rule-ID "
            "recovery will break. Update §15.6 IMPL-29 before proceeding."
        )
        assert isinstance(r.diagnostics.reasons, list)
        assert len(r.diagnostics.reasons) >= 1


class TestMultiMatchDiagnostics:
    """Multi-match produces multiple policy IDs in diagnostics.reasons."""

    def test_two_matching_policies_produce_two_reasons(self):
        policies = (
            _ANNOTATED_POLICY
            + "\n"
            + (
                '@tier("soft") '
                '@rule_id("force_push_main") '
                '@approval_timeout_s("600") '
                '@severity("high") '
                'forbid (principal, action == Agent::Action::"execute_bash", resource) '
                'when { context.command like "*git push --force origin main*" };'
            )
        )
        req = {
            "principal": 'Agent::TaskAgent::"new_task"',
            "action": 'Agent::Action::"execute_bash"',
            "resource": 'Agent::BashCommand::"command"',
            "context": {"command": "git push --force origin main"},
        }
        entities = [
            {"uid": {"type": "Agent::TaskAgent", "id": "new_task"}, "attrs": {}, "parents": []},
            {"uid": {"type": "Agent::BashCommand", "id": "command"}, "attrs": {}, "parents": []},
        ]
        r = cedarpy.is_authorized(req, policies, entities)
        # §6.3 annotation-merging depends on receiving both policy IDs here;
        # if cedarpy short-circuits on first match, the "max severity" and
        # "min timeout" merge rules never fire.
        assert len(r.diagnostics.reasons) == 2, (
            f"expected 2 matching policies, got {r.diagnostics.reasons}"
        )
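The multi-match test guards the §6.3 merge, which the test comments describe as taking the maximum severity and the minimum timeout across all matching policies. In that merge, `approval_timeout_s` arrives as a string and is coerced with `int(...)`. A hypothetical sketch of that rule follows. The function name `merge_annotations`, the severity ordering `low < medium < high` and the skip-on-malformed-timeout behavior are assumptions, not the engine's `_merge_annotations`.

```python
from typing import Mapping, Optional, Sequence

# Assumed ordering; the design doc defines the real severity scale.
_SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


def merge_annotations(matches: Sequence[Mapping[str, str]]) -> dict:
    """Merge annotations from every matching policy: max severity, min timeout."""
    severity: Optional[str] = None
    timeout: Optional[int] = None
    rule_ids = []
    for ann in matches:
        rule_ids.append(ann["rule_id"])
        sev = ann.get("severity")
        if sev in _SEVERITY_RANK and (
            severity is None or _SEVERITY_RANK[sev] > _SEVERITY_RANK[severity]
        ):
            severity = sev
        raw = ann.get("approval_timeout_s")
        if raw is not None:
            try:
                value = int(raw)  # Cedar annotation values are strings, e.g. "300".
            except ValueError:
                continue  # Assumption: ignore a malformed timeout rather than crash.
            timeout = value if timeout is None else min(timeout, value)
    return {"rule_ids": sorted(rule_ids), "severity": severity, "approval_timeout_s": timeout}


# The two policies from the multi-match test: severity rises to "high", timeout drops to 300.
merged = merge_annotations([
    {"rule_id": "force_push_any", "tier": "soft", "approval_timeout_s": "300",
     "severity": "medium", "category": "destructive"},
    {"rule_id": "force_push_main", "tier": "soft", "approval_timeout_s": "600",
     "severity": "high"},
])
```

If only the first match were reported, this merge would return `"medium"` and `300`, silently under-reporting severity. That is why the test insists on two reasons.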

agent/uv.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

cdk/package.json

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@
     "@aws-sdk/lib-dynamodb": "^3.1021.0",
     "@aws-sdk/s3-request-presigner": "^3.1021.0",
     "@aws/durable-execution-sdk-js": "^1.1.0",
+    "@cedar-policy/cedar-wasm": "4.10.0",
     "aws-cdk-lib": "^2.238.0",
     "cdk-nag": "^2.37.55",
     "constructs": "^10.3.0",
