dreadnode
diff --git a/‎CHANGELOG.md‎
Lines changed: 1 addition & 1 deletion b/‎CHANGELOG.md‎
Lines changed: 1 addition & 1 deletion
diff --git a/‎README.md‎
Lines changed: 7 additions & 14 deletions b/‎README.md‎
Lines changed: 7 additions & 14 deletions
diff --git a/‎docs/cli.md‎
Lines changed: 10 additions & 14 deletions b/‎docs/cli.md‎
Lines changed: 10 additions & 14 deletions
diff --git a/‎docs/glossary.md‎
Lines changed: 1 addition & 1 deletion b/‎docs/glossary.md‎
Lines changed: 1 addition & 1 deletion
diff --git a/‎docs/guide/output.md‎
Lines changed: 0 additions & 2 deletions b/‎docs/guide/output.md‎
Lines changed: 0 additions & 2 deletions
diff --git a/‎docs/guide/resampling.md‎
Lines changed: 19 additions & 46 deletions b/‎docs/guide/resampling.md‎
Lines changed: 19 additions & 46 deletions
diff --git a/‎docs/guide/web-ui.md‎
Lines changed: 2 additions & 2 deletions b/‎docs/guide/web-ui.md‎
Lines changed: 2 additions & 2 deletions
@@ -20,7 +20,7 @@ Initial release.
 - **Subagent capture** — separate ATIF trajectories for each subagent invocation, linked to parent via `SubagentTrajectoryRef`
 - **API request capture** — local reverse proxy captures raw request/response bodies, system prompts, tool definitions, token usage, and compaction events
 - **Turn-level resampling** — replay a specific API request N times to study response variance (stateless, no tool execution)
-- **Intervention testing** — edit captured API requests (assistant text, tool results, system prompt) and resample with modified inputs; available from both CLI (`harness resample-edit`) and web UI
+- **Intervention testing** — edit captured API requests (thinking, text, tool results, system prompt) and resample with modified inputs; available from both CLI (`harness resample-edit`) and web UI
 - **Session-level resampling** — re-run a forked session N times with full tool execution (`harness resample-session`)
 - **Turn-level replay** — branch execution from any API turn with exact-match context, filesystem reset via git worktrees, and full tool execution; replicates run in parallel (`harness replay`)
 - **Transcript capture** — Claude Code transcript JSONL copied into session output for replay support
 
@@ -1,6 +1,6 @@
 # AgentLens
 
-Developed at [MATS Exploration Phase](https://www.matsprogram.org/) under [Neel Nanda](https://github.com/neelnanda-io), for a research project with [Greg Kocher](https://github.com/gregkocher).
+> **This repository has moved to [dreadnode/agent-lens](https://github.com/dreadnode/agent-lens).** This copy is no longer maintained — please use the new location for the latest code, issues, and contributions.
 
 A harness for running multi-session agent trajectories using the Claude Agent SDK, capturing them in [ATIF](https://harborframework.com/docs/agents/trajectory-format) (Agent Trajectory Interchange Format), and tracking file state changes across sessions.
 
@@ -13,7 +13,7 @@ The harness takes a YAML config describing a sequence of sessions (prompts to an
 - **ATIF trajectories** — standardized JSON capturing every agent step, tool call, observation, and thinking block
 - **Shadow git change tracking** — automatic tracking of all file changes via an invisible git repo, with per-step write attribution and full unified diffs
 - **Session chaining** — three modes for controlling how sessions relate to each other (isolated, chained, forked)
-- **Resampling & replay** — study behavioral variance at multiple levels: stateless API resampling, intervention testing (edit assistant text, tool results, or system prompts and resample), session-level resampling, and turn-level replay with full tool execution from any branch point
+- **Resampling & replay** — study behavioral variance at multiple levels: stateless API resampling, intervention testing (edit inputs and resample), session-level resampling, and turn-level replay with full tool execution from any branch point
 - **Subagent capture** — separate ATIF trajectories for each subagent invocation, linked to the parent via `SubagentTrajectoryRef`
 
 ## Install
@@ -325,7 +325,7 @@ Edit a captured API request and resample with the modified version — the CLI e
 # Step 1: Dump the request for editing
 harness resample-edit runs/my-run --session 1 --request 5 --dump > edit.json
 
-# Step 2: Edit the JSON (assistant text, tool results, system prompt...)
+# Step 2: Edit the JSON (thinking, text, tool results, system prompt...)
 # Step 3: Resample with the modified request
 harness resample-edit runs/my-run --session 1 --request 5 \
   --input edit.json --label "removed hedging" --count 5
@@ -335,13 +335,11 @@ Pipe through `jq` for programmatic edits:
 
 ```bash
 harness resample-edit runs/my-run --session 1 --request 5 --dump \
-  | jq '.system = "You are a cautious engineer. Double-check everything."' \
+  | jq '.messages[-1].content[0].thinking = "Be more direct."' \
   | harness resample-edit runs/my-run --session 1 --request 5 \
-      --input - --label "cautious prompt" --count 10
+      --input - --label "direct thinking" --count 10
 ```
 
-> **Note:** Thinking blocks cannot be edited — they carry cryptographic signatures validated by the API. See [Thinking blocks](docs/guide/resampling.md#thinking-blocks-not-editable) for details.
-
 Variants are saved alongside vanilla resamples and appear in the web UI.
 
 ### `harness resample-session`
@@ -362,18 +360,13 @@ Replay a session from any API turn with full tool execution. Each replicate runs
 # List available turns
 harness replay runs/my-run --session 1 --list-turns
 
-# Replay from turn 5, three times (only session 1 runs)
+# Replay from turn 5, three times (runs in parallel)
 harness replay runs/my-run --session 1 --turn 5 --count 3
 
-# Replay session 1 turn 5, then continue with sessions 2, 3, etc.
-harness replay runs/my-run --session 1 --turn 5 --continue-sessions
-
 # Replay with an additional prompt after tool results
 harness replay runs/my-run --session 1 --turn 5 --prompt "Try a different approach"
 ```
 
-By default, replay only runs the targeted session. Use `--continue-sessions` to also run subsequent sessions from the original config.
-
 Replay creates new run directories (e.g. `replay_my-run_s1_t5_r01_<timestamp>/`) with full artifacts. Each includes a `replay_meta.json` with provenance linking back to the source run, session, and turn. The source working directory is never modified.
 
 ## Web UI
@@ -395,7 +388,7 @@ Open `http://localhost:5173`. The UI reads from the `runs/` directory and provid
 - **API captures** — request/response viewer with token usage, system prompts, tool definitions, compaction events
 - **Subagent viewer** — separate trajectory view for each subagent, with task prompt and return value
 - **Resamples** — compare N resample outputs for a given API turn
-- **Edit & Resample** — interactive message editor for intervention testing: edit assistant text, tool results, or system prompts in the conversation, then resample with the modified input to study how changes affect behavior (thinking blocks are shown read-only — see [why](docs/guide/resampling.md#thinking-blocks-not-editable))
+- **Edit & Resample** — interactive message editor for intervention testing: edit thinking, text, tool results, or system prompts in the conversation, then resample with the modified input to study how changes affect behavior
 - **Changelog** — per-step file write log across all sessions with expandable diffs
 - **Config viewer** — frozen YAML config from the run
 - **Analysis** — rendered markdown from `analysis.md`
 
@@ -126,7 +126,7 @@ Results are saved to `session_NN/resamples/request_NNN/` (and `request_NNN_vNN/`
 
 Edit a captured API request and resample with the modified version.
 
-For intervention strategy and output details, see [Resampling & Replay](guide/resampling.md#intervention-testing).
+For intervention strategy and output details, see [Resampling & Replay](guide/resampling.md#intervention-testing-edit-resample).
 
 ```bash
 harness resample-edit <run_dir> [OPTIONS]
@@ -151,9 +151,7 @@ harness resample-edit <run_dir> [OPTIONS]
 harness resample-edit runs/my-run --session 1 --request 5 --dump > edit.json
 ```
 
-**Step 2** — Edit the JSON file (change assistant text, tool results, system prompt, etc.), then resample.
-
-> **Do not edit thinking blocks.** They carry cryptographic signatures validated by the API — any modification will cause a 400 error. See [Thinking blocks](guide/resampling.md#thinking-blocks-not-editable) for details.
+**Step 2** — Edit the JSON file (change thinking, text, tool results, system prompt, etc.), then resample:
 
 ```bash
 harness resample-edit runs/my-run --session 1 --request 5 \
@@ -164,19 +162,19 @@ harness resample-edit runs/my-run --session 1 --request 5 \
 
 ```bash
 harness resample-edit runs/my-run --session 1 --request 5 --dump \
-  | jq '.system = "You are a cautious engineer. Always check for edge cases."' \
+  | jq '.messages[-1].content[0].thinking = "I should be more direct."' \
   | harness resample-edit runs/my-run --session 1 --request 5 \
-      --input - --label "cautious prompt" --count 10
+      --input - --label "direct thinking" --count 10
 ```
 
 ### Batch interventions
 
 ```bash
 for req in 3 5 7 9; do
   harness resample-edit runs/my-run --session 1 --request $req --dump \
-    | jq '(.messages[] | select(.role == "user") | .content[] | select(.type == "tool_result")).content = "Error: file not found"' \
+    | jq '.messages[-1].content[0].thinking = "Skip exploration, go straight to implementation."' \
     | harness resample-edit runs/my-run --session 1 --request $req \
-        --input - --label "tool-error" --count 5
+        --input - --label "skip-exploration" --count 5
 done
 ```
 
@@ -239,20 +237,18 @@ Turns in session 1 (12 total):
 
 ### Replaying
 
-By default, only the targeted session is replayed. Use `--continue-sessions` to also run sessions after it.
-
 ```bash
-# Replay from turn 5, three times (only session 1 runs)
+# Replay from turn 5, three times (runs in parallel)
 harness replay runs/my-run --session 1 --turn 5 --count 3
 
-# Replay session 1 turn 5, then continue with sessions 2, 3, etc.
-harness replay runs/my-run --session 1 --turn 5 --continue-sessions
-
 # Replay with an additional prompt
 harness replay runs/my-run --session 1 --turn 5 --prompt "Try a different approach"
 
 # Replay from turn 1 (re-run from scratch)
 harness replay runs/my-run --session 1 --turn 1 --count 2
+
+# Replay session 1 turn 5, then continue sessions 2..end
+harness replay runs/my-run --session 1 --turn 5 --continue-sessions
 ```
 
 Each replay creates a new run directory (e.g. `replay_my-run_s1_t5_r01_2026-03-16T00-00-00/`) with full artifacts including `replay_meta.json` for provenance tracking. The source working directory is never modified — each replicate operates in its own git worktree.
@@ -73,7 +73,7 @@ A full-fidelity re-execution from a specific turn. Each replicate runs in an iso
 
 ### Intervention (variant)
 
-A modified resample — the API request is edited before being sent (e.g. changing assistant text, tool results, or system prompt) to test counterfactuals. Thinking blocks cannot be edited due to cryptographic signature requirements. Variants are saved alongside vanilla resamples with a `_vNN` suffix and include the edited request for reproducibility.
+A modified resample — the API request is edited before being sent (e.g. changing a thinking block or system prompt) to test counterfactuals. Variants are saved alongside vanilla resamples with a `_vNN` suffix and include the edited request for reproducibility.
 
 ### Shadow git
 
 
@@ -13,8 +13,6 @@ runs/<run_name>/
 │
 ├── session_01/
 │   ├── trajectory.json         # ATIF v1.6 trajectory (parent)
-│   ├── transcript.jsonl        # Claude Code transcript (for replay)
-│   ├── uuid_map.json           # turn correlation map (transcript ↔ ATIF ↔ raw dumps)
 │   ├── session_diff.patch      # unified diff of this session's changes
 │   ├── subagent_<name>_<id>.json  # subagent ATIF trajectory (if any)
 │   ├── api_captures.jsonl      # API request/response metadata
 
@@ -25,7 +25,7 @@ Cheapest / fastest                                    Most thorough
 | I want to... | Method | Command |
 |--------------|--------|---------|
 | Check if the model would say the same thing again | [Turn resample](#turn-level-resampling) | `harness resample` |
-| See what happens if the model had seen different text or tool results | [Intervention](#intervention-testing) | `harness resample-edit` |
+| See what happens if the model had different thinking | [Intervention](#intervention-testing) | `harness resample-edit` |
 | See what happens if a tool returned something different | [Intervention](#intervention-testing) | `harness resample-edit` |
 | Compare N complete trajectories for the same task | [Session resample](#session-level-resampling) | `harness resample-session` |
 | Branch from a specific point and let the agent continue | [Turn replay](#turn-level-replay) | `harness replay` |
@@ -87,18 +87,17 @@ session_01/resamples/request_005/
 
 ## Intervention testing
 
-Edit the conversation inputs — text, tool results, or system prompt — then resample. This lets you test counterfactuals: "What would the model do differently if it had seen X instead of Y?"
+Edit the conversation inputs — thinking blocks, text, tool results, or system prompt — then resample. This lets you test counterfactuals: "What would the model do differently if it had seen X instead of Y?"
 
 Like turn-level resampling, this is **stateless** — no tools execute. But the input is modified before sending, so you can study causal effects.
 
 **What you can edit:**
 
-- **Assistant text** — alter what the model said in prior turns (e.g., remove hedging, change a decision)
-- **Tool results** — change what a tool returned (e.g., different file contents, simulated errors)
+- **Thinking blocks** — change the model's internal reasoning
+- **Text responses** — alter what the model said in prior turns
+- **Tool results** — change what a tool returned (e.g., different file contents)
 - **System prompt** — modify instructions
 
-> **Note:** Thinking blocks are visible in the dump and UI but are **not editable** — the API requires cryptographic signatures on thinking blocks that can't survive modification. They are preserved as-is so the model retains its original reasoning context. See [Thinking blocks](#thinking-blocks) for details.
-
 ### From the CLI
 
 Two-step workflow: dump the request, edit it, resample.
@@ -107,7 +106,7 @@ Two-step workflow: dump the request, edit it, resample.
 # 1. Dump the request to a file
 harness resample-edit runs/my-run --session 1 --request 5 --dump > edit.json
 
-# 2. Edit edit.json (change assistant text, tool results, system prompt...)
+# 2. Edit edit.json (change thinking, text, tool results, system prompt...)
 
 # 3. Resample with the modified request
 harness resample-edit runs/my-run --session 1 --request 5 \
@@ -117,30 +116,28 @@ harness resample-edit runs/my-run --session 1 --request 5 \
 For scriptable interventions, pipe through `jq`:
 
 ```bash
-# Change the system prompt
 harness resample-edit runs/my-run --session 1 --request 5 --dump \
-  | jq '.system = "You are a cautious engineer. Always check for edge cases."' \
+  | jq '.messages[-1].content[0].thinking = "Be more direct."' \
   | harness resample-edit runs/my-run --session 1 --request 5 \
-      --input - --label "cautious prompt" --count 10
+      --input - --label "direct thinking" --count 10
 ```
 
 Batch across multiple requests:
 
 ```bash
-# Change a tool result across several turns
 for req in 3 5 7 9; do
   harness resample-edit runs/my-run --session 1 --request $req --dump \
-    | jq '(.messages[] | select(.role == "user") | .content[] | select(.type == "tool_result")).content = "Error: file not found"' \
+    | jq '.messages[-1].content[0].thinking = "Skip exploration."' \
     | harness resample-edit runs/my-run --session 1 --request $req \
-        --input - --label "tool-error" --count 5
+        --input - --label "skip-exploration" --count 5
 done
 ```
 
 ### From the web UI
 
 1. Open a session's API captures
 2. Click "Edit & Resample" on any request
-3. Modify text, tool results, or system prompts (thinking blocks are shown read-only)
+3. Modify thinking blocks, text, tool results, or system prompts
 4. Resample with the modified input
 
 ### Output
@@ -214,24 +211,22 @@ Bracketed tags (e.g. `[_step_1_3]`) indicate shadow git snapshots — turns wher
 
 ### Running
 
-By default, replay **only runs the targeted session** — it branches from the specified turn and lets the agent continue until that session ends. Subsequent sessions from the original config are not run.
-
-To replay the full remaining experiment (the targeted session *and* all sessions after it), use `--continue-sessions`.
-
 ```bash
-# Replay from turn 5, three times (only session 1 runs)
+# Replay from turn 5, three times (runs in parallel)
 harness replay runs/my-run --session 1 --turn 5 --count 3
 
-# Replay session 1 turn 5, then continue with sessions 2, 3, etc.
-harness replay runs/my-run --session 1 --turn 5 --continue-sessions
-
 # Replay with an additional prompt after tool results
 harness replay runs/my-run --session 1 --turn 5 --prompt "Try a different approach"
 
 # Replay from turn 1 (re-run from scratch with same config)
 harness replay runs/my-run --session 1 --turn 1 --count 2
+
+# Replay session 1 turn 5, then continue with sessions 2..end
+harness replay runs/my-run --session 1 --turn 5 --continue-sessions
 ```
 
+When `--continue-sessions` is enabled, each replicate runs the replayed session first, then continues with sessions `N+1..end` from the original config.
+
 ### Output
 
 Each replay creates a new independent run directory:
@@ -258,28 +253,6 @@ runs/replay_my-run_s1_t5_r01_2026-03-16T00-00-00/
 
 Each session generates a `uuid_map.json` that correlates entries across the three data formats (transcript, ATIF trajectory, raw API dumps). The primary join key is `tool_call_id`. The replay system uses this to find shadow git tags for filesystem reset.
 
-### Thinking blocks (not editable)
-
-> **Warning:** Thinking blocks cannot be edited in interventions. Any attempt to modify thinking content in a dumped request JSON will cause the API to reject the request with a 400 error. The UI editor shows thinking blocks as read-only.
-
-#### Why: cryptographic signatures
-
-When the Anthropic API returns a response with extended thinking enabled, each `thinking` block includes a cryptographic `signature` field. On subsequent requests, the API validates this signature to confirm the thinking content has not been tampered with. This is a server-side integrity check — there is no way to regenerate or forge a valid signature outside of Anthropic's infrastructure.
-
-This means:
-- **Unmodified thinking blocks** have valid signatures and are accepted by the API
-- **Edited thinking blocks** have invalidated signatures and are rejected (HTTP 400)
-- **Stripped signatures** (keeping the text but removing the `signature` field) are also rejected
-
-`redacted_thinking` blocks are similarly protected — they contain opaque encrypted content that cannot be inspected or modified.
-
-#### What this means for interventions
-
-All resampling methods preserve thinking blocks with their original signatures intact, so the model always sees its full original reasoning context. This is faithful — the model receives the same thinking it originally produced.
-
-To test counterfactuals about model behavior, edit the fields that *are* modifiable:
-- **Assistant text** — change what the model said (its visible output)
-- **Tool results** — change what a tool returned (e.g., different file contents, simulated errors)
-- **System prompt** — change the instructions
+### Thinking signatures
 
-These fields have no signature requirements and can be freely modified.
+When resampling, the harness automatically strips thinking block signatures from the request. Signatures are response-specific and would cause errors if replayed verbatim.
@@ -22,7 +22,7 @@ Configure the UI via `ui/.env` or shell environment:
 | `ANTHROPIC_API_KEY` | — | Required for resampling via Anthropic API |
 | `ANTHROPIC_BASE_URL` | `https://api.anthropic.com` | Override the API base URL for resampling |
 
-The resampling API keys are needed for any resampling in the UI (both vanilla resamples and "Edit & Resample"). The UI auto-detects whether to use OpenRouter or Anthropic based on the original run's API target.
+The resampling API keys are only needed if you use the "Edit & Resample" feature in the UI. The UI auto-detects whether to use OpenRouter or Anthropic based on the original run's API target.
 
 ## Features
 
@@ -62,7 +62,7 @@ Compare N resample outputs for a given API turn side-by-side.
 ### Edit & Resample
 Interactive message editor for intervention testing:
 
-1. Edit assistant text, tool results, or system prompts (thinking blocks are shown read-only)
+1. Edit thinking blocks, text, tool results, or system prompts
 2. Resample with the modified input
 3. Compare original vs. variant responses