README.md
Lines changed: 16 additions & 1 deletion
@@ -4,7 +4,9 @@ Developed at [MATS Exploration Phase](https://www.matsprogram.org/) under [Neel
 
 A harness for running multi-session agent trajectories using the Claude Agent SDK, capturing them in [ATIF](https://harborframework.com/docs/agents/trajectory-format) (Agent Trajectory Interchange Format), and tracking file state changes across sessions.
 
-Built for agent interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
+Built for AI alignment and interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
+
+> **Note:** AgentLens currently supports Claude Code via the Claude Agent SDK. Support for additional agents and frameworks is planned — see [Roadmap](#roadmap). Some features (especially turn-level replay) are experimental. We welcome PRs and contributions — [open an issue](https://github.com/dreadnode/agent-lens/issues) if you run into bugs.
 
 ## What it does
 
@@ -356,6 +358,8 @@ This finds session 2's `fork_from` target, resolves the session ID to fork from,
 
 ### `harness replay`
 
+> **Experimental.** Turn-level replay with git worktree filesystem reset is new and likely has bugs. If you run into issues, please [open an issue](https://github.com/dreadnode/agent-lens/issues).
+
 Replay a session from any API turn with full tool execution. Each replicate runs in an isolated git worktree, so multiple replicates execute in parallel. Each replay becomes a new independent run with full provenance back to the source.
 
 ```bash
@@ -506,6 +510,17 @@ src/harness/
 
 The core complexity lives in `atif_adapter.py`: the Claude Agent SDK streams messages (AssistantMessage, UserMessage, SystemMessage, ResultMessage) and the adapter maps them into ATIF steps with correct tool call / observation pairing, thinking block capture, and sequential step IDs.
 
+## Roadmap
+
+- **Multi-agent support** — extend beyond Claude Code to support other agent frameworks and LLM providers (Codex, Devin, custom agents, etc.)
+- **Comparative analysis** — side-by-side trajectory comparison across agents, models, and prompt variants
+We welcome PRs and contributions! Whether it's bug fixes, new features, documentation improvements, or support for additional agent frameworks — all contributions are appreciated.
+
 ## Dependencies
 
 - [claude-agent-sdk](https://pypi.org/project/claude-agent-sdk/) — runs Claude Code sessions programmatically
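
The README hunk above describes the adapter as pairing tool calls with their observations, capturing thinking blocks, and assigning sequential step IDs. A minimal sketch of that kind of mapping follows. Every name in it (`ToolUse`, `ToolResult`, `Step`, `build_steps`, and the event dictionaries) is a hypothetical stand-in for illustration; none of them are the Claude Agent SDK message types or the ATIF schema, and the real `atif_adapter.py` may work differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Hypothetical stand-ins; NOT the Claude Agent SDK types or the ATIF schema.
@dataclass
class ToolUse:
    call_id: str
    name: str
    arguments: Dict[str, Any]


@dataclass
class ToolResult:
    call_id: str
    content: str


@dataclass
class Step:
    step_id: int
    text: str
    thinking: Optional[str] = None
    tool_calls: List[ToolUse] = field(default_factory=list)
    # Maps a tool call ID to the observation returned for that call.
    observations: Dict[str, str] = field(default_factory=dict)


def build_steps(events: List[Dict[str, Any]]) -> List[Step]:
    """Turn a stream of assistant / tool_result events into ordered steps."""
    steps: List[Step] = []
    pending: Dict[str, Step] = {}  # call_id -> step still waiting for its result

    for event in events:
        if event["type"] == "assistant":
            step = Step(
                step_id=len(steps) + 1,  # sequential IDs starting at 1
                text=event.get("text", ""),
                thinking=event.get("thinking"),
                tool_calls=list(event.get("tool_calls", [])),
            )
            steps.append(step)
            for call in step.tool_calls:
                pending[call.call_id] = step
        elif event["type"] == "tool_result":
            result = event["result"]
            step = pending.pop(result.call_id, None)
            if step is None:
                raise ValueError("result for unknown tool call: " + result.call_id)
            # Attach the observation to the step that issued the call.
            step.observations[result.call_id] = result.content

    return steps
```

Keying pending calls by ID, rather than assuming each result immediately follows its call, keeps the pairing correct when one assistant message issues several tool calls.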
docs/guide/resampling.md
Lines changed: 3 additions & 0 deletions
@@ -177,6 +177,9 @@ Each replicate runs in its own git worktree, so all 5 execute in parallel. New d
 
 ## Turn-level replay
 
+!!! warning "Experimental"
+    Turn-level replay with git worktree filesystem reset is new and likely has bugs. If you run into issues, please [open an issue](https://github.com/dreadnode/agent-lens/issues).
+
 Branch execution from any API turn with **full tool execution** and filesystem reset. This is the highest-fidelity method — the agent sees the exact same conversation context and filesystem state up to the branch point, then generates a fresh response that may diverge.
 
 **What you get:** A new independent run where the agent resumed from a specific point. The agent can take completely different actions from that point forward, using real tools on a real filesystem.
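
The guide states that each replicate runs in its own git worktree so that replicates can execute in parallel. One way such isolation could be set up is sketched below, assuming a repository path, a commit to branch from, and a caller-supplied `run_one` function. The helper names (`worktree_add_cmd`, `run_replicates`) and the `replicate-N` directory layout are hypothetical; the only external command used is `git worktree add --detach`, and the harness's actual implementation is not shown in this diff.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, TypeVar

T = TypeVar("T")


def worktree_add_cmd(repo: Path, path: Path, commit: str) -> List[str]:
    """Build the git command that checks out `commit` into a new worktree."""
    return [
        "git", "-C", str(repo),
        "worktree", "add", "--detach", str(path), commit,
    ]


def run_replicates(
    repo: Path,
    commit: str,
    base_dir: Path,
    n: int,
    run_one: Callable[[Path], T],
) -> List[T]:
    """Create n worktrees at `commit`, then run `run_one` in each in parallel."""
    if n < 1:
        raise ValueError("n must be at least 1")
    paths = [base_dir / ("replicate-%d" % i) for i in range(n)]

    # Create worktrees one at a time so the git invocations do not contend
    # with each other inside the same repository.
    for path in paths:
        subprocess.run(worktree_add_cmd(repo, path, commit), check=True)

    # Each replicate only touches its own directory, so they can run together.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(run_one, paths))
```

Because every replicate starts from the same commit in a separate directory, file writes made by one replicate cannot leak into another.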
docs/index.md
Lines changed: 4 additions & 1 deletion
@@ -2,7 +2,10 @@
 
 A harness for running multi-session agent trajectories using the Claude Agent SDK, capturing them in [ATIF](https://harborframework.com/docs/agents/trajectory-format) (Agent Trajectory Interchange Format), and tracking file state changes across sessions.
 
-Built for agent interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
+Built for AI alignment and interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
+
+!!! note
+    AgentLens currently supports Claude Code via the Claude Agent SDK. Support for additional agents and frameworks is planned — see [Roadmap](guide/roadmap.md). Some features (especially turn-level replay) are experimental. We welcome PRs and contributions — [open an issue](https://github.com/dreadnode/agent-lens/issues) if you run into bugs.