Commit b7955b9

Merge pull request #6 from dstackai/feat/sandbox-verdict-gate-cli-output
Improve sandbox verdict reliability and streamline CLI output
2 parents fb2c7c9 + c1ae806 commit b7955b9

11 files changed: +1473 -357 lines changed

Cargo.lock

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ members = [
 resolver = "2"

 [workspace.package]
-version = "0.0.6"
+version = "0.0.7"
 edition = "2021"
 license = "MIT"
 authors = ["Clawform Contributors"]

README.md

Lines changed: 8 additions & 5 deletions
@@ -2,7 +2,7 @@

 Clawform executes agentic programs from markdown files.

-You keep instructions in repo files and run `cf -f program.md` (equivalent to `cf apply -f program.md`). Each run uses `.clawform/agent_*.json|md` protocol files, writes session data under `.clawform/programs/<program_id>/sessions/<session_id>/`, and appends `.clawform/history/index.jsonl`.
+You keep instructions in repo files and run `cf -f program.md` (equivalent to `cf apply -f program.md`). Each run writes session data under `.clawform/programs/<program_id>/sessions/<session_id>/` and appends `.clawform/history/index.jsonl`.

 ## Why It Exists


@@ -34,13 +34,13 @@ Before execution, Clawform previews:

 Then it asks for confirmation and executes the program with the configured provider.

-During execution, Clawform streams agent progress events to the terminal and writes per-session `commands/*` and `messages/*` files for clickable `out`/`msg` links.
+During execution, Clawform streams progress events to the terminal.

 After execution, Clawform stores:

 - run outcome and summary (`outcome.json`, `output.md`)
 - per-session snapshots for next-run diff (`program.md`, `variables.json`)
-- changed files reported for this session (from `agent_outputs.json`)
+- changed files reported for this session

 ## Install

@@ -53,7 +53,7 @@ curl -fsSL https://raw.githubusercontent.com/dstackai/clawform/main/install.sh |
 Install a specific version:

 ```bash
-CLAWFORM_VERSION=v0.0.6 curl -fsSL https://raw.githubusercontent.com/dstackai/clawform/main/install.sh | sh
+CLAWFORM_VERSION=v0.0.7 curl -fsSL https://raw.githubusercontent.com/dstackai/clawform/main/install.sh | sh
 ```

 ## Quick Start
@@ -98,10 +98,11 @@ Last session: 019d5843-eb2d-70b1-b49a-343033117944 (success, 43m ago)
 program: examples/smoke.md unchanged
 changes: 0 files
 Proceed? [y/N] y
-session 019d586b-aa65-78b2-8a0d-27b5543c59bb
+🧵 019d586b-aa65-78b2-8a0d-27b5543c59bb | workspace-write
 ✔ cat examples/smoke.md | 1ms | out
 💬 Verified `example-data/output-smoke.txt:1` already contains the required `SMOKE_OK` line with trailing newline. | msg
 turn 1 | tokens: in=117k out=1.6k cached=107k
+✅ Verified `example-data/output-smoke.txt` already contains `SMOKE_OK`. | file
 total | tokens: in=117k out=1.6k cached=107k
 changes: 0 files
 ```
@@ -116,6 +117,8 @@ Clawform keeps local state under `.clawform/`:

 Session folders keep `program.md`, `variables.json`, `output.md`, `outcome.json`, plus `commands/*` and `messages/*` used by interactive `out`/`msg` links.

+For internal protocol details and strict agent result schema, see `contrib/ARCHITECTURE.md`.
+
 ## Commands

 ```bash

contrib/ARCHITECTURE.md

Lines changed: 51 additions & 11 deletions
@@ -87,13 +87,15 @@ Variable rules:
 - program diff vs last session snapshot (if available)
 - variable diff vs last session variable snapshot (if available)
 5. Ask for confirmation (interactive default; skipped by `--yes`).
-6. Write runtime variables file (`.clawform/agent_variables.json`) when variables are present.
-7. Run provider in the current workspace (no temp workspace copy).
-8. Stream provider events to terminal; during the run write session `commands/*` and `messages/*` for clickable `out`/`msg` links.
-9. Read agent status from `.clawform/agent_result.json` (required).
-10. Collect reported changed files from `.clawform/agent_outputs.json` when that file exists and was updated in this run.
-11. Persist run-end records (`output.md`, `outcome.json`) and append `.clawform/history/index.jsonl`.
-12. Persist program snapshot (`program.md`) and variable snapshot (`variables.json`) on success.
+6. Clear prior run protocol files in `.clawform/` and write runtime variables file (`.clawform/agent_variables.json`) when variables are present.
+7. Build runtime prompt; in sandboxed modes (`sandboxed`/`auto`) include explicit verdict-gate rules for sandbox-vs-program blocking.
+8. Run provider in the current workspace (no temp workspace copy).
+9. Stream provider events to terminal; during the run write session `commands/*` and `messages/*`.
+10. In `auto` sandbox mode, allow at most one unsandboxed retry only when current-run `.clawform/agent_result.json` reports `status=partial|failure` and `reason=sandbox_blocked` (no stdout/stderr heuristic fallback).
+11. Read agent status from `.clawform/agent_result.json` (required) and validate strict status/reason schema.
+12. Collect reported changed files from `.clawform/agent_outputs.json` when that file exists and was updated in this run.
+13. Persist run-end records (`output.md`, `outcome.json`) and append `.clawform/history/index.jsonl`.
+14. Persist program snapshot (`program.md`) and variable snapshot (`variables.json`) on success.
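
Reviewer note: the new step 6 (clear stale protocol files, then write variables only when present) can be illustrated with a small Python sketch. This is not Clawform's implementation, which lives in a Rust workspace. The function name `prepare_protocol_root`, the exact set of files cleared, and the flat name-to-value layout of `agent_variables.json` are assumptions made for illustration.

```python
import json
from pathlib import Path

# File names are the protocol files named in ARCHITECTURE.md; treating exactly
# these four as "prior run protocol files" is an assumption of this sketch.
PROTOCOL_FILES = (
    "agent_variables.json",
    "agent_result.json",
    "agent_outputs.json",
    "agent_output.md",
)


def prepare_protocol_root(protocol_root: Path, variables: dict) -> None:
    """Remove leftovers from a previous apply, then write variables if any."""
    protocol_root.mkdir(parents=True, exist_ok=True)
    for name in PROTOCOL_FILES:
        # missing_ok avoids an error when the file was never written.
        (protocol_root / name).unlink(missing_ok=True)
    if variables:
        # Hypothetical layout: a flat JSON object of variable name -> value.
        target = protocol_root / "agent_variables.json"
        target.write_text(json.dumps(variables, indent=2))
```

Clearing before the run means a verdict left behind by an earlier apply cannot be mistaken for the current one, which matters once `agent_result.json` becomes the only retry signal.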

 ## 5) State and Storage Layout

@@ -119,7 +121,7 @@ Path aliases used in this section:
 During the current run:

 - Write `<protocol_root>/agent_variables.json` (when variables exist); the agent reads this file for resolved `${{ var.NAME }}` values.
-- Write `<session_root>/commands/*` and `<session_root>/messages/*` for clickable `out`/`msg` links in live progress output.
+- Write `<session_root>/commands/*` and `<session_root>/messages/*` as per-session execution artifacts.
 - Read `<protocol_root>/agent_result.json`, `<protocol_root>/agent_outputs.json`, and optional `<protocol_root>/agent_output.md` at run end to determine status, changed files, and summary.

 On the next run of the same program:
@@ -138,11 +140,11 @@ For audit/debug visibility:
 | Data path | Scope | Why we store it | When it is used |
 |---|---|---|---|
 | `<protocol_root>/agent_variables.json` | Workspace-global scratch file for the currently running apply (overwritten on each apply) | Provide resolved runtime variables to the agent | Read by the agent during that same apply run |
-| `<protocol_root>/agent_result.json` | Workspace-global scratch file for the currently running apply (overwritten on each apply) | Receive final structured run status (`success/partial/failure`) and short reason/message from the agent | Read by Clawform at run end (and sandbox-retry logic), only if file mtime is from this run |
+| `<protocol_root>/agent_result.json` | Workspace-global scratch file for the currently running apply (overwritten on each apply) | Receive final structured run verdict (`status`, optional `reason`, `message`) where `reason` is strict enum (`sandbox_blocked` or `program_blocked`) | Read by Clawform at run end; in sandbox auto mode also used as the only retry signal source, only if file mtime is from this run |
 | `<protocol_root>/agent_outputs.json` | Workspace-global scratch file for the currently running apply (overwritten on each apply) | Receive changed-file list from the agent | Read by Clawform at run end for file summary/history, only if file mtime is from this run |
 | `<protocol_root>/agent_output.md` | Workspace-global scratch file for the currently running apply (optional; overwritten on each apply) | Receive agent-written summary text | Read by Clawform at run end; then copied into session `output.md` |
-| `<session_root>/commands/*.txt` | Per-session (`<program_id>/<session_id>`) | Preserve command output behind clickable `out` links | Used immediately in terminal progress output |
-| `<session_root>/messages/*.md` | Per-session (`<program_id>/<session_id>`) | Preserve assistant/message output behind clickable `msg` links | Used immediately in terminal progress output; also fallback summary source |
+| `<session_root>/commands/*.txt` | Per-session (`<program_id>/<session_id>`) | Preserve command output artifacts for this session | Used for progress drilldown and debugging |
+| `<session_root>/messages/*.md` | Per-session (`<program_id>/<session_id>`) | Preserve assistant/message artifacts for this session | Used for progress drilldown and fallback summary source |
 | `<session_root>/output.md` | Per-session (`<program_id>/<session_id>`) | Store stable summary artifact for this session | Used on next run of same `program_id` for preview/prompt reference |
 | `<session_root>/program.md` | Per-session (`<program_id>/<session_id>`) | Snapshot program text that produced this session | Used on next run of same `program_id` to compute program diff |
 | `<session_root>/variables.json` | Per-session (`<program_id>/<session_id>`) | Snapshot resolved variables for this session | Used on next run of same `program_id` to compute variables diff |
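
Reviewer note: several rows above say a protocol file is read "only if file mtime is from this run". One way to read that condition is sketched below in Python. The helper name `read_if_from_this_run` and the choice to compare against a timestamp captured at run start are assumptions; the commit does not show how Clawform performs the check.

```python
from pathlib import Path
from typing import Optional


def read_if_from_this_run(path: Path, run_started_at: float) -> Optional[str]:
    """Return the file text only if it was modified at or after run start.

    `run_started_at` is a POSIX timestamp (seconds) captured before the
    provider was launched. Missing or stale files yield None.
    """
    try:
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        return None  # the agent never wrote the file
    if mtime < run_started_at:
        return None  # leftover from an earlier apply; do not trust it
    return path.read_text()
```

A caller would treat `None` for `agent_result.json` as "no verdict from this run", which also means no sandbox retry signal.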
@@ -153,6 +155,7 @@ Compatibility behavior:

 - No read fallback is used for `agent_summary.md` or `events.ndjson`.
 - Current apply reads only the current protocol files documented in this section.
+- Sandbox auto-retry does not parse provider stdout/stderr for sandbox heuristics; it only trusts current-run `agent_result.json`.

 Current limitation:

@@ -169,6 +172,43 @@ Current apply does not persist:
 - provider stdout/stderr artifact logs
 - canonical `events.ndjson` for new sessions

+## 5.4 Agent Result Protocol Rules
+
+Protocol file: `<protocol_root>/agent_result.json`
+
+Expected shape:
+
+```json
+{
+  "status": "success|partial|failure",
+  "reason": "sandbox_blocked|program_blocked",
+  "message": "short human-readable summary"
+}
+```
+
+Rules:
+
+1. `status` is required and strict enum: `success | partial | failure`.
+2. `reason` is strict enum: `sandbox_blocked | program_blocked`.
+3. `reason` is required for `partial` and `failure`; omitted for `success`.
+4. Unknown `reason` values are rejected when Clawform parses `agent_result.json`.
+5. In sandboxed modes (`sandboxed`/`auto`), runtime prompt enforces verdict gate semantics:
+   - first restriction symptom triggers block-cause classification
+   - any sandbox evidence (including non-fatal permission/network warnings), mixed evidence, or uncertainty => `reason: sandbox_blocked`
+   - `reason: program_blocked` only when zero restriction symptoms appeared and one read-only check confirms an independent non-sandbox cause
+   - no workaround/fallback commands before writing the verdict
201+
## 5.5 Auto Sandbox Retry and Progress Output
202+
203+
Applies only when sandbox mode is `auto`:
204+
205+
1. First pass runs sandboxed.
206+
2. One unsandboxed retry is allowed only when current-run `agent_result.json` reports:
207+
- `status` in `partial|failure`
208+
- `reason: sandbox_blocked`
209+
3. No retry is triggered from command-output text heuristics.
210+
4. When retry is triggered, Clawform emits a retry-decision progress line and then launches one unsandboxed attempt.
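
Reviewer note: the retry gate in section 5.5 reduces to a pure decision function. The Python sketch below is an illustration under stated assumptions, not the code in this commit: the name `should_retry_unsandboxed`, the `retries_done` counter, and passing `None` when no current-run `agent_result.json` exists are all hypothetical.

```python
from typing import Optional


def should_retry_unsandboxed(
    sandbox_mode: str,
    result: Optional[dict],
    retries_done: int,
) -> bool:
    """Decide whether one unsandboxed retry is allowed.

    `result` is the validated content of the current-run agent_result.json,
    or None when no fresh result file exists. Provider stdout/stderr is
    deliberately not an input: it is never consulted.
    """
    if sandbox_mode != "auto":
        return False  # retry applies only in auto mode
    if retries_done >= 1:
        return False  # at most one unsandboxed retry
    if result is None:
        return False  # no trusted signal from this run
    return (
        result.get("status") in ("partial", "failure")
        and result.get("reason") == "sandbox_blocked"
    )
```

Keeping the decision free of I/O makes the "no heuristic fallback" rule easy to check: the only way to obtain `True` is through the structured verdict.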
+
 ## 6) Known Bugs

 ### 6.1 Interrupted Runs Recorded as Failure
