Add experimental warnings for replay, update repo URL

vabruzzo · claude · vabruzzo · commit 559c8b3975e4 · 2026-03-18T16:01:46.000-04:00
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/README.md b/README.md
@@ -4,7 +4,9 @@ Developed at [MATS Exploration Phase](https://www.matsprogram.org/) under [Neel
 
 A harness for running multi-session agent trajectories using the Claude Agent SDK, capturing them in [ATIF](https://harborframework.com/docs/agents/trajectory-format) (Agent Trajectory Interchange Format), and tracking file state changes across sessions.
 
-Built for agent interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
+Built for AI alignment and interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
+
+> **Note:** AgentLens currently supports Claude Code via the Claude Agent SDK. Support for additional agents and frameworks is planned — see [Roadmap](#roadmap). Some features (especially turn-level replay) are experimental. We welcome PRs and contributions — [open an issue](https://github.com/dreadnode/agent-lens/issues) if you run into bugs.
 
 ## What it does
 
@@ -356,6 +358,8 @@ This finds session 2's `fork_from` target, resolves the session ID to fork from,
 
 ### `harness replay`
 
+> **Experimental.** Turn-level replay with git worktree filesystem reset is new and likely has bugs. If you run into issues, please [open an issue](https://github.com/dreadnode/agent-lens/issues).
+
 Replay a session from any API turn with full tool execution. Each replicate runs in an isolated git worktree, so multiple replicates execute in parallel. Each replay becomes a new independent run with full provenance back to the source.
 
 ```bash
@@ -506,6 +510,17 @@ src/harness/
 
 The core complexity lives in `atif_adapter.py`: the Claude Agent SDK streams messages (AssistantMessage, UserMessage, SystemMessage, ResultMessage) and the adapter maps them into ATIF steps with correct tool call / observation pairing, thinking block capture, and sequential step IDs.
 
+## Roadmap
+
+- **Multi-agent support** — extend beyond Claude Code to support other agent frameworks and LLM providers (Codex, Devin, custom agents, etc.)
+- **Comparative analysis** — side-by-side trajectory comparison across agents, models, and prompt variants
+- **Richer intervention toolkit** — programmatic intervention pipelines for systematic counterfactual testing
+- **Scoring & evaluation** — built-in trajectory scoring and automated evaluation metrics
+
+## Contributing
+
+We welcome PRs and contributions! Whether it's bug fixes, new features, documentation improvements, or support for additional agent frameworks — all contributions are appreciated.
+
 ## Dependencies
 
 - [claude-agent-sdk](https://pypi.org/project/claude-agent-sdk/) — runs Claude Code sessions programmatically
diff --git a/docs/guide/resampling.md b/docs/guide/resampling.md
@@ -177,6 +177,9 @@ Each replicate runs in its own git worktree, so all 5 execute in parallel. New d
 
 ## Turn-level replay
 
+!!! warning "Experimental"
+    Turn-level replay with git worktree filesystem reset is new and likely has bugs. If you run into issues, please [open an issue](https://github.com/dreadnode/agent-lens/issues).
+
 Branch execution from any API turn with **full tool execution** and filesystem reset. This is the highest-fidelity method — the agent sees the exact same conversation context and filesystem state up to the branch point, then generates a fresh response that may diverge.
 
 **What you get:** A new independent run where the agent resumed from a specific point. The agent can take completely different actions from that point forward, using real tools on a real filesystem.
diff --git a/docs/index.md b/docs/index.md
@@ -2,7 +2,10 @@
 
 A harness for running multi-session agent trajectories using the Claude Agent SDK, capturing them in [ATIF](https://harborframework.com/docs/agents/trajectory-format) (Agent Trajectory Interchange Format), and tracking file state changes across sessions.
 
-Built for agent interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
+Built for AI alignment and interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
+
+!!! note
+    AgentLens currently supports Claude Code via the Claude Agent SDK. Support for additional agents and frameworks is planned — see [Roadmap](guide/roadmap.md). Some features (especially turn-level replay) are experimental. We welcome PRs and contributions — [open an issue](https://github.com/dreadnode/agent-lens/issues) if you run into bugs.
 
 ## What it does
 
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -1,6 +1,6 @@
 site_name: AgentLens
-site_description: Multi-session agent interpretability harness
-repo_url: https://github.com/vabruzzo/agentlens
+site_description: Multi-session agent alignment and interpretability harness
+repo_url: https://github.com/dreadnode/agent-lens
 
 theme:
   name: material
@@ -64,6 +64,7 @@ nav:
       - Output Structure: guide/output.md
       - Subagents: guide/subagents.md
       - Web UI: guide/web-ui.md
+      - Roadmap: guide/roadmap.md
   - CLI Reference: cli.md
   - Glossary: glossary.md
   - API Reference: