Commit 559c8b3

vabruzzo and claude committed
Add experimental warnings for replay, update repo URL
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 421ab84 commit 559c8b3

4 files changed, +26 -4 lines changed

README.md

Lines changed: 16 additions & 1 deletion
@@ -4,7 +4,9 @@ Developed at [MATS Exploration Phase](https://www.matsprogram.org/) under [Neel

 A harness for running multi-session agent trajectories using the Claude Agent SDK, capturing them in [ATIF](https://harborframework.com/docs/agents/trajectory-format) (Agent Trajectory Interchange Format), and tracking file state changes across sessions.

-Built for agent interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
+Built for AI alignment and interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
+
+> **Note:** AgentLens currently supports Claude Code via the Claude Agent SDK. Support for additional agents and frameworks is planned — see [Roadmap](#roadmap). Some features (especially turn-level replay) are experimental. We welcome PRs and contributions — [open an issue](https://github.com/dreadnode/agent-lens/issues) if you run into bugs.

 ## What it does

@@ -356,6 +358,8 @@ This finds session 2's `fork_from` target, resolves the session ID to fork from,

 ### `harness replay`

+> **Experimental.** Turn-level replay with git worktree filesystem reset is new and likely has bugs. If you run into issues, please [open an issue](https://github.com/dreadnode/agent-lens/issues).
+
 Replay a session from any API turn with full tool execution. Each replicate runs in an isolated git worktree, so multiple replicates execute in parallel. Each replay becomes a new independent run with full provenance back to the source.

 ```bash
@@ -506,6 +510,17 @@ src/harness/

 The core complexity lives in `atif_adapter.py`: the Claude Agent SDK streams messages (AssistantMessage, UserMessage, SystemMessage, ResultMessage) and the adapter maps them into ATIF steps with correct tool call / observation pairing, thinking block capture, and sequential step IDs.

+## Roadmap
+
+- **Multi-agent support** — extend beyond Claude Code to support other agent frameworks and LLM providers (Codex, Devin, custom agents, etc.)
+- **Comparative analysis** — side-by-side trajectory comparison across agents, models, and prompt variants
+- **Richer intervention toolkit** — programmatic intervention pipelines for systematic counterfactual testing
+- **Scoring & evaluation** — built-in trajectory scoring and automated evaluation metrics
+
+## Contributing
+
+We welcome PRs and contributions! Whether it's bug fixes, new features, documentation improvements, or support for additional agent frameworks — all contributions are appreciated.
+
 ## Dependencies

 - [claude-agent-sdk](https://pypi.org/project/claude-agent-sdk/) — runs Claude Code sessions programmatically
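
The README context line in this hunk credits `atif_adapter.py` with pairing tool calls and observations, capturing thinking blocks, and assigning sequential step IDs. The sketch below illustrates that kind of mapping under loud assumptions: the message dicts, the block `type` values, and the `Step` fields are hypothetical stand-ins, not the Claude Agent SDK's message classes or the ATIF schema, and this is not the project's adapter.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Step:
    """Hypothetical step record; field names are illustrative, not the ATIF schema."""
    step_id: int
    source: str
    text: str = ""
    thinking: str = ""
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    observations: list[dict[str, Any]] = field(default_factory=list)


def to_steps(messages: list[dict[str, Any]]) -> list[Step]:
    """Map a list of simplified (assumed) message dicts to sequentially numbered steps."""
    steps: list[Step] = []
    pending: dict[str, Step] = {}  # tool call id -> step that issued the call

    for msg in messages:
        if msg["role"] == "assistant":
            step = Step(step_id=len(steps) + 1, source="agent")
            for block in msg["blocks"]:
                if block["type"] == "text":
                    step.text += block["text"]
                elif block["type"] == "thinking":
                    step.thinking += block["text"]
                elif block["type"] == "tool_use":
                    step.tool_calls.append(
                        {"id": block["id"], "name": block["name"], "input": block["input"]}
                    )
                    pending[block["id"]] = step
            steps.append(step)
        elif msg["role"] == "user":
            text_parts: list[str] = []
            for block in msg["blocks"]:
                if block["type"] == "tool_result":
                    # Attach the result to the step that made the call.
                    # Results with no matching call are dropped in this sketch.
                    owner = pending.pop(block["tool_use_id"], None)
                    if owner is not None:
                        owner.observations.append(
                            {"tool_call_id": block["tool_use_id"], "content": block["content"]}
                        )
                elif block["type"] == "text":
                    text_parts.append(block["text"])
            # A user message that only carries tool results does not open a new step.
            if text_parts:
                steps.append(
                    Step(step_id=len(steps) + 1, source="user", text="".join(text_parts))
                )
    return steps
```

Keying pending calls by ID means a tool result is attached to the step that issued the call even if other messages arrive in between, and step IDs stay contiguous because result-only messages never create a step.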

docs/guide/resampling.md

Lines changed: 3 additions & 0 deletions
@@ -177,6 +177,9 @@ Each replicate runs in its own git worktree, so all 5 execute in parallel. New d

 ## Turn-level replay

+!!! warning "Experimental"
+    Turn-level replay with git worktree filesystem reset is new and likely has bugs. If you run into issues, please [open an issue](https://github.com/dreadnode/agent-lens/issues).
+
 Branch execution from any API turn with **full tool execution** and filesystem reset. This is the highest-fidelity method — the agent sees the exact same conversation context and filesystem state up to the branch point, then generates a fresh response that may diverge.

 **What you get:** A new independent run where the agent resumed from a specific point. The agent can take completely different actions from that point forward, using real tools on a real filesystem.
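
The guide text in this hunk says each replicate runs in its own git worktree so that replicates can execute in parallel. A minimal sketch of that isolation pattern follows, assuming only the standard `git worktree add --detach <path> <commit-ish>` command. The function names, the `replicate-N` directory layout, and the `work` callback are hypothetical and are not taken from the harness.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import subprocess
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
Runner = Callable[[Sequence[str]], None]


def worktree_add_cmd(repo: Path, dest: Path, commit: str) -> list[str]:
    """Build the git command that checks out `commit` into a detached worktree at `dest`."""
    return ["git", "-C", str(repo), "worktree", "add", "--detach", str(dest), commit]


def run_git(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if git fails."""
    subprocess.run(list(cmd), check=True)


def prepare_replicates(
    repo: Path, base: Path, commit: str, n: int, runner: Runner = run_git
) -> list[Path]:
    """Create n worktrees (hypothetical layout: base/replicate-1 ... base/replicate-n)."""
    dests = [base / f"replicate-{i}" for i in range(1, n + 1)]
    for dest in dests:
        # Worktrees are created one at a time to keep the sketch simple.
        runner(worktree_add_cmd(repo, dest, commit))
    return dests


def run_in_parallel(dests: Sequence[Path], work: Callable[[Path], T]) -> list[T]:
    """Run `work` once per worktree on a thread pool; results keep the order of `dests`."""
    with ThreadPoolExecutor(max_workers=max(1, len(dests))) as pool:
        return list(pool.map(work, dests))
```

Because every replicate gets its own checkout of the same commit, file writes made by one replicate cannot affect another. The injectable `runner` lets the command construction be exercised without touching a real repository.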

docs/index.md

Lines changed: 4 additions & 1 deletion
@@ -2,7 +2,10 @@

 A harness for running multi-session agent trajectories using the Claude Agent SDK, capturing them in [ATIF](https://harborframework.com/docs/agents/trajectory-format) (Agent Trajectory Interchange Format), and tracking file state changes across sessions.

-Built for agent interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
+Built for AI alignment and interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
+
+!!! note
+    AgentLens currently supports Claude Code via the Claude Agent SDK. Support for additional agents and frameworks is planned — see [Roadmap](guide/roadmap.md). Some features (especially turn-level replay) are experimental. We welcome PRs and contributions — [open an issue](https://github.com/dreadnode/agent-lens/issues) if you run into bugs.

 ## What it does

mkdocs.yml

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
 site_name: AgentLens
-site_description: Multi-session agent interpretability harness
-repo_url: https://github.com/vabruzzo/agentlens
+site_description: Multi-session agent alignment and interpretability harness
+repo_url: https://github.com/dreadnode/agent-lens

 theme:
   name: material
@@ -64,6 +64,7 @@ nav:
 - Output Structure: guide/output.md
 - Subagents: guide/subagents.md
 - Web UI: guide/web-ui.md
+- Roadmap: guide/roadmap.md
 - CLI Reference: cli.md
 - Glossary: glossary.md
 - API Reference:
