Commit b090826

Add progressive degradation fixture levels

- Add mild and moderate degraded coding workflow fixture bundles.
- Update layered admissibility artifact and benchmark documentation to a four-point curve.
- Add progressive degradation generator tests for monotonic scores and expected failures.
- Align expected layer scores with generated relational scores.

Validation reported in PR: targeted degradation generator, scorer, comparator, validator, fixture tests, full pytest, and npm run check passed.
1 parent e465d7c commit b090826

29 files changed

Lines changed: 800 additions & 5 deletions

artifacts/layered_admissibility_results.json

Lines changed: 47 additions & 0 deletions
@@ -22,6 +22,53 @@
     "relational_score": 1.0,
     "structural_score": 1.0
   },
+  {
+    "expected_admissible": false,
+    "failed_contracts": [
+      "recovery_path_available"
+    ],
+    "failure_labels": [
+      "RECOVERY_PATH_INVALID"
+    ],
+    "fixture_id": "coding_workflow_pr_review_mild_v1",
+    "fixture_path": "fixtures/coding_workflow_pr_review_mild_v1",
+    "fixture_version": "1.0.0",
+    "governance_score": 1.0,
+    "observed_admissible": false,
+    "operational_score": 1.0,
+    "overall_admissibility_score": 0.9166666666666666,
+    "passed_contracts": [
+      "no_orphan_tool_calls",
+      "pre_merge_review",
+      "security_causal_block"
+    ],
+    "relational_score": 0.6666666666666666,
+    "structural_score": 1.0
+  },
+  {
+    "expected_admissible": false,
+    "failed_contracts": [
+      "recovery_path_available",
+      "security_causal_block"
+    ],
+    "failure_labels": [
+      "CAUSAL_DEPENDENCY_LOSS",
+      "RECOVERY_PATH_INVALID"
+    ],
+    "fixture_id": "coding_workflow_pr_review_moderate_v1",
+    "fixture_path": "fixtures/coding_workflow_pr_review_moderate_v1",
+    "fixture_version": "1.0.0",
+    "governance_score": 1.0,
+    "observed_admissible": false,
+    "operational_score": 1.0,
+    "overall_admissibility_score": 0.8333333333333334,
+    "passed_contracts": [
+      "no_orphan_tool_calls",
+      "pre_merge_review"
+    ],
+    "relational_score": 0.3333333333333333,
+    "structural_score": 1.0
+  },
   {
     "expected_admissible": false,
     "failed_contracts": [

docs/benchmarks/layered_admissibility.md

Lines changed: 6 additions & 1 deletion
@@ -9,11 +9,16 @@ Deterministically compare admissibility outcomes across fixture bundles using Co
 | fixture_id | expected_admissible | observed_admissible | structural_score | relational_score | operational_score | governance_score | overall_admissibility_score | failure_labels |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | coding_workflow_pr_review_v1 | true | true | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | none |
+| coding_workflow_pr_review_mild_v1 | false | false | 1.000 | 0.667 | 1.000 | 1.000 | 0.917 | RECOVERY_PATH_INVALID |
+| coding_workflow_pr_review_moderate_v1 | false | false | 1.000 | 0.333 | 1.000 | 1.000 | 0.833 | CAUSAL_DEPENDENCY_LOSS, RECOVERY_PATH_INVALID |
 | coding_workflow_pr_review_degraded_v1 | false | false | 1.000 | 0.000 | 0.000 | 1.000 | 0.500 | CAUSAL_DEPENDENCY_LOSS, INVARIANT_VIOLATION, POLICY_ORDER_BROKEN, RECOVERY_PATH_INVALID |
 
 ## Interpretation
 
-The positive fixture remains fully admissible while the degraded fixture shows deterministic score loss and explicit failure labels.
+- positive fixture remains fully admissible
+- mild fixture isolates recovery reachability loss
+- moderate fixture combines recovery and causality loss
+- severe fixture combines relational and operational failures
 
 ## Non-goals
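
The commit message mentions generator tests for monotonic scores. The repository's actual test is not shown on this page, so the following is only a minimal sketch of such a check, using the rounded values from the benchmark table. The helper name `is_strictly_decreasing` is hypothetical, and whether the real test requires strict or non-strict monotonicity is an assumption.

```python
# Rows copied from the benchmark table: (fixture_id, relational, overall).
CURVE = [
    ("coding_workflow_pr_review_v1", 1.000, 1.000),
    ("coding_workflow_pr_review_mild_v1", 0.667, 0.917),
    ("coding_workflow_pr_review_moderate_v1", 0.333, 0.833),
    ("coding_workflow_pr_review_degraded_v1", 0.000, 0.500),
]


def is_strictly_decreasing(values):
    """True when every value is strictly lower than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


relational_curve = [row[1] for row in CURVE]
overall_curve = [row[2] for row in CURVE]

# Both curves should fall at every step from positive to severe.
assert is_strictly_decreasing(relational_curve)
assert is_strictly_decreasing(overall_curve)
```

Checking the relational and overall curves separately makes it visible that the first three points differ only in the relational layer, while the final step also drops the operational score.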

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# coding_workflow_pr_review_mild_v1
+
+Deterministic mild degraded fixture for coding workflow replay-validation contracts.
+
+## Intentional degradations
+
+1. **Reachability degradation**: reconstructed dependency graph removes recovery edges from `test_failure` to `rollback` and `escalate_to_human`, violating `recovery_path_available`.
+
+## Preserved properties
+
+- Ordering sequence remains intact in reconstructed trace.
+- No orphan dependency invariant is preserved.
+
+## Expected failures
+
+- `RECOVERY_PATH_INVALID`
+
+This fixture is intentionally synthetic, deterministic, and scoped to this fixture family.
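
The mild degradation amounts to deleting two edges from the dependency graph. The generator in this repository is not shown on this page, so the following is a rough sketch of how such a degraded graph could be derived. The helper `drop_edges` and the trimmed `BASELINE` graph are hypothetical illustrations; only the node and edge names come from the fixture.

```python
import copy

# Trimmed, hypothetical baseline using node and edge names from the fixture graph.
BASELINE = {
    "nodes": [{"node_id": n} for n in
              ("run_tests", "test_failure", "rollback", "escalate_to_human", "human_review")],
    "edges": [
        {"source": "run_tests", "target": "test_failure", "relation": "CAUSAL"},
        {"source": "test_failure", "target": "rollback", "relation": "RECOVERY"},
        {"source": "test_failure", "target": "escalate_to_human", "relation": "RECOVERY"},
        {"source": "rollback", "target": "human_review", "relation": "TEMPORAL"},
    ],
}

# The two recovery edges the README says the reconstructed graph loses.
MILD_REMOVED = {
    ("test_failure", "rollback"),
    ("test_failure", "escalate_to_human"),
}


def drop_edges(graph, removed):
    """Return a deep copy of `graph` without the (source, target) pairs in `removed`."""
    degraded = copy.deepcopy(graph)
    degraded["edges"] = [
        edge for edge in degraded["edges"]
        if (edge["source"], edge["target"]) not in removed
    ]
    return degraded


mild = drop_edges(BASELINE, MILD_REMOVED)
```

Copying before filtering leaves the baseline untouched, so further degradation levels can be derived from the same starting graph.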
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "fixture_id": "coding_workflow_pr_review_mild_v1",
+  "fixture_version": "1.0.0",
+  "expected_admissible": false,
+  "expected_layer_scores": {
+    "structural": 1.0,
+    "relational": 0.6666666666666666,
+    "operational": 1.0,
+    "governance": 1.0
+  },
+  "notes": "Mild degraded fixture isolating recovery-path reachability loss.",
+  "must_fail_contracts": [
+    "recovery_path_available"
+  ],
+  "expected_failure_labels": [
+    "RECOVERY_PATH_INVALID"
+  ]
+}
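
The expected relational score of 0.6666666666666666 matches what you get if each layer is scored as the fraction of its contracts that pass, and the overall score in the results artifact matches the unweighted mean of the four layer scores. The scorer itself is not part of this page, so the sketch below only shows that those two assumptions reproduce the reported numbers. The function names, and the rule that a layer without contracts scores 1.0, are hypothetical.

```python
from statistics import fmean

# Contract-to-layer mapping as declared in the fixture's contract files.
CONTRACT_LAYERS = {
    "no_orphan_tool_calls": "relational",
    "recovery_path_available": "relational",
    "security_causal_block": "relational",
    "pre_merge_review": "operational",
}
LAYERS = ("structural", "relational", "operational", "governance")


def layer_score(layer, failed):
    """Fraction of the layer's contracts that passed (assumed scoring rule)."""
    contracts = [c for c, lyr in CONTRACT_LAYERS.items() if lyr == layer]
    if not contracts:
        return 1.0  # assumption: a layer with no contracts is fully satisfied
    passed = [c for c in contracts if c not in failed]
    return len(passed) / len(contracts)


def overall_score(failed):
    """Unweighted mean of the four layer scores (assumed aggregation)."""
    return fmean(layer_score(layer, failed) for layer in LAYERS)


mild_failed = {"recovery_path_available"}
moderate_failed = {"recovery_path_available", "security_causal_block"}
print(layer_score("relational", mild_failed), overall_score(mild_failed))
print(layer_score("relational", moderate_failed), overall_score(moderate_failed))
```

Under these assumptions the mild fixture comes out at 2/3 relational and about 0.917 overall, and the moderate fixture at 1/3 relational and about 0.833 overall, in line with the benchmark table.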
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "expected_failures": [
+    "RECOVERY_PATH_INVALID"
+  ],
+  "allowed_failures": [
+    "ORPHAN_DEPENDENCY",
+    "DETACHED_DEPENDENCY",
+    "GRAPH_FRAGMENTATION",
+    "TEMPORAL_ORDER_VIOLATION"
+  ],
+  "disallowed_failures": [
+    "POLICY_ORDER_BROKEN",
+    "INVARIANT_VIOLATION",
+    "CYCLE_INTRODUCED",
+    "REPLAY_NON_REPRODUCIBLE",
+    "ARTIFACT_INTEGRITY_VIOLATION"
+  ]
+}
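
One plausible reading of this file is that every expected label must be observed, allowed labels may or may not appear, and any disallowed label fails the fixture. The validator's real semantics are not shown here, so the sketch below is an assumption about how the three lists interact; `check_failure_labels` is a hypothetical name.

```python
# Label lists copied from the fixture file above.
SPEC = {
    "expected_failures": ["RECOVERY_PATH_INVALID"],
    "allowed_failures": [
        "ORPHAN_DEPENDENCY", "DETACHED_DEPENDENCY",
        "GRAPH_FRAGMENTATION", "TEMPORAL_ORDER_VIOLATION",
    ],
    "disallowed_failures": [
        "POLICY_ORDER_BROKEN", "INVARIANT_VIOLATION", "CYCLE_INTRODUCED",
        "REPLAY_NON_REPRODUCIBLE", "ARTIFACT_INTEGRITY_VIOLATION",
    ],
}


def check_failure_labels(observed, spec):
    """Classify observed labels against the spec (assumed semantics)."""
    observed = set(observed)
    expected = set(spec["expected_failures"])
    allowed = set(spec["allowed_failures"])
    disallowed = set(spec["disallowed_failures"])
    return {
        "missing": sorted(expected - observed),      # expected but not seen
        "forbidden": sorted(observed & disallowed),  # seen but disallowed
        "unlisted": sorted(observed - expected - allowed - disallowed),
    }


report = check_failure_labels(["RECOVERY_PATH_INVALID"], SPEC)
ok = not report["missing"] and not report["forbidden"]
```

Reporting unlisted labels separately, rather than rejecting them outright, keeps the check useful if new failure labels are introduced later. How the real validator treats such labels is unknown.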
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+{
+  "contract_id": "no_orphan_tool_calls",
+  "layer": "relational",
+  "type": "invariant",
+  "definition": {
+    "rule": "no_orphan_dependencies"
+  },
+  "severity": "HIGH"
+}
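
The contract names the rule `no_orphan_dependencies` but does not define it on this page. As an illustration only, the sketch below takes "orphan" to mean either a node that no edge touches or an edge endpoint that is not a declared node. Both the interpretation and the helper name `find_orphans` are assumptions.

```python
def find_orphans(graph):
    """Report isolated nodes and edge endpoints missing from the node list."""
    node_ids = {node["node_id"] for node in graph["nodes"]}
    touched = set()
    for edge in graph["edges"]:
        touched.update((edge["source"], edge["target"]))
    return {
        "isolated_nodes": sorted(node_ids - touched),
        "dangling_endpoints": sorted(touched - node_ids),
    }


# Tiny hypothetical graph: "rollback" is isolated, "merge" is undeclared.
example = {
    "nodes": [{"node_id": "generate_patch"}, {"node_id": "run_tests"},
              {"node_id": "rollback"}],
    "edges": [
        {"source": "generate_patch", "target": "run_tests"},
        {"source": "run_tests", "target": "merge"},
    ],
}
result = find_orphans(example)
```

Under this reading the contract would hold when both lists are empty.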
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+{
+  "contract_id": "pre_merge_review",
+  "layer": "operational",
+  "type": "ordering",
+  "definition": {
+    "required_sequence": [
+      "generate_patch",
+      "run_tests",
+      "human_review",
+      "merge"
+    ]
+  },
+  "severity": "CRITICAL"
+}
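
An ordering contract like this one can be checked by testing whether `required_sequence` appears, in order, within the replayed trace. The sketch below assumes that unrelated steps may occur between the required ones (a subsequence check). Whether the real validator is that lenient is not stated, and `follows_required_sequence` is a hypothetical name.

```python
REQUIRED = ["generate_patch", "run_tests", "human_review", "merge"]


def follows_required_sequence(trace, required):
    """True if `required` occurs in `trace` in order, gaps allowed."""
    remaining = iter(trace)
    # `step in remaining` advances the iterator past the first match,
    # so each later step must be found after the earlier ones.
    return all(step in remaining for step in required)


good_trace = ["generate_patch", "run_tests", "test_failure",
              "rollback", "human_review", "merge"]
bad_trace = ["generate_patch", "run_tests", "merge", "human_review"]

assert follows_required_sequence(good_trace, REQUIRED)
assert not follows_required_sequence(bad_trace, REQUIRED)
```

The second trace fails because `merge` happens before `human_review`, which is the kind of violation a `POLICY_ORDER_BROKEN` label would presumably describe.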
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+{
+  "contract_id": "recovery_path_available",
+  "layer": "relational",
+  "type": "reachability",
+  "definition": {
+    "from": "test_failure",
+    "to": [
+      "rollback",
+      "escalate_to_human"
+    ],
+    "min_paths": 1
+  },
+  "severity": "HIGH"
+}
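
A reachability contract of this shape can be evaluated with a breadth-first search from the `from` node. The sketch below assumes `min_paths` counts how many distinct `to` targets are reachable. The page does not confirm that interpretation, and the function names are hypothetical. The edge list mirrors the fixture's dependency graph.

```python
from collections import deque

DEFINITION = {"from": "test_failure", "to": ["rollback", "escalate_to_human"], "min_paths": 1}

# (source, target) pairs from the fixture's dependency graph.
EDGES = [
    ("generate_patch", "run_tests"), ("run_tests", "test_failure"),
    ("run_tests", "security_scan_failed"), ("test_failure", "rollback"),
    ("test_failure", "escalate_to_human"), ("security_scan_failed", "deploy_blocked"),
    ("rollback", "human_review"), ("escalate_to_human", "human_review"),
    ("human_review", "merge"), ("run_tests", "merge"), ("deploy_blocked", "merge"),
]


def reachable(edges, start):
    """Set of nodes reachable from `start`, including `start`, via BFS."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def recovery_path_available(edges, definition):
    """Assumed rule: at least `min_paths` targets are reachable from `from`."""
    hits = reachable(edges, definition["from"]) & set(definition["to"])
    return len(hits) >= definition["min_paths"]


# The mild fixture removes both recovery edges leaving test_failure.
mild_edges = [e for e in EDGES if e[0] != "test_failure"]
```

With the full edge list the contract holds. With `mild_edges`, `test_failure` has no outgoing edges, so the check fails, matching the `RECOVERY_PATH_INVALID` label expected for the mild fixture.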
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+{
+  "contract_id": "security_causal_block",
+  "layer": "relational",
+  "type": "causality",
+  "definition": {
+    "required_causal_edges": [
+      ["security_scan_failed", "deploy_blocked"]
+    ]
+  },
+  "severity": "HIGH"
+}
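
A causality contract lists edges that must survive reconstruction. The sketch below assumes a required pair is satisfied only by a direct edge whose `relation` is `CAUSAL`. Whether the real validator also accepts other relation types or indirect paths is not shown, and `missing_causal_edges` is a hypothetical helper.

```python
REQUIRED_CAUSAL_EDGES = [["security_scan_failed", "deploy_blocked"]]


def missing_causal_edges(edges, required):
    """Return required (source, target) pairs that lack a direct CAUSAL edge."""
    causal = {
        (edge["source"], edge["target"])
        for edge in edges
        if edge["relation"] == "CAUSAL"
    }
    return [tuple(pair) for pair in required if tuple(pair) not in causal]


intact = [
    {"source": "run_tests", "target": "test_failure", "relation": "CAUSAL"},
    {"source": "security_scan_failed", "target": "deploy_blocked", "relation": "CAUSAL"},
]
# Hypothetical degraded variant: the security edge has been dropped.
degraded = intact[:1]

assert missing_causal_edges(intact, REQUIRED_CAUSAL_EDGES) == []
assert missing_causal_edges(degraded, REQUIRED_CAUSAL_EDGES) == [
    ("security_scan_failed", "deploy_blocked")
]
```

A non-empty result would correspond to the contract failing, which the moderate fixture reports as `CAUSAL_DEPENDENCY_LOSS`.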
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+{
+  "graph_version": "1.0",
+  "nodes": [
+    {"node_id": "generate_patch", "label": "Generate patch", "metadata": {"phase": "build"}},
+    {"node_id": "run_tests", "label": "Run tests", "metadata": {"phase": "verify"}},
+    {"node_id": "test_failure", "label": "Test failure", "metadata": {"phase": "verify"}},
+    {"node_id": "rollback", "label": "Rollback", "metadata": {"phase": "recovery"}},
+    {"node_id": "security_scan_failed", "label": "Security scan failed", "metadata": {"phase": "security"}},
+    {"node_id": "deploy_blocked", "label": "Deploy blocked", "metadata": {"phase": "security"}},
+    {"node_id": "escalate_to_human", "label": "Escalate to human", "metadata": {"phase": "recovery"}},
+    {"node_id": "human_review", "label": "Human review", "metadata": {"phase": "governance"}},
+    {"node_id": "merge", "label": "Merge", "metadata": {"phase": "release"}}
+  ],
+  "edges": [
+    {"source": "generate_patch", "target": "run_tests", "relation": "PREREQUISITE", "metadata": {}},
+    {"source": "run_tests", "target": "test_failure", "relation": "CAUSAL", "metadata": {}},
+    {"source": "run_tests", "target": "security_scan_failed", "relation": "DATA_FLOW", "metadata": {}},
+    {"source": "test_failure", "target": "rollback", "relation": "RECOVERY", "metadata": {}},
+    {"source": "test_failure", "target": "escalate_to_human", "relation": "RECOVERY", "metadata": {}},
+    {"source": "security_scan_failed", "target": "deploy_blocked", "relation": "CAUSAL", "metadata": {}},
+    {"source": "rollback", "target": "human_review", "relation": "TEMPORAL", "metadata": {}},
+    {"source": "escalate_to_human", "target": "human_review", "relation": "TEMPORAL", "metadata": {}},
+    {"source": "human_review", "target": "merge", "relation": "PREREQUISITE", "metadata": {}},
+    {"source": "run_tests", "target": "merge", "relation": "PREREQUISITE", "metadata": {}},
+    {"source": "deploy_blocked", "target": "merge", "relation": "BLOCKER", "metadata": {"state": "prevented"}}
+  ]
+}
