feat: observational mode for live-system debugging campaigns #220
Add target_system.observational flag so campaigns whose target is a live
system (cluster, service, dataset) can use repo_path purely to grant the
agent shell access — without per-iteration git worktree isolation.
When observational=true:
- run_iteration skips create_experiment_worktree and runs the executor
directly in repo_path. Prevents the FileNotFoundError "Not a git
repository" failure mode and avoids polluting a non-code target with
per-iteration orphan branches and .nous-experiments/ subdirs.
- The design and execute_analyze prompts swap their worktree paragraphs
for observational equivalents via {{execution_environment}} and
{{worktree_constraint}} placeholders, so the agent is told it is
probing a live target rather than mutating an isolated worktree.
Default behavior is unchanged — the flag is opt-in and the worktree
path remains the default for code-evolution campaigns.
Tested: 10 new tests + 337 existing tests pass.
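
The gate described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `run_iteration` code: `resolve_experiment_dir`, its signature, and the injected `create_worktree` callable are hypothetical names. Only `repo_path`, `target_system.observational`, and the decision to skip worktree creation come from the commit message.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_experiment_dir(
    campaign: dict,
    create_worktree: Callable[[Path], Path],
) -> Optional[Path]:
    """Pick the directory the executor runs in (hypothetical helper)."""
    target = campaign.get("target_system") or {}
    repo_path = target.get("repo_path")
    if not repo_path:
        # No repo_path means the agent gets no shell directory at all.
        return None
    if target.get("observational", False):
        # Live target: run directly in repo_path, no git worktree.
        return Path(repo_path)
    # Default (code-evolution) path: isolate the iteration in a worktree.
    return create_worktree(Path(repo_path))
```

Passing the worktree factory in as a parameter is only a convenience for the sketch; it makes both branches easy to exercise without a real git repository.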
The observational flag was wired into validation, prompts, and the iteration loop but the JSON schema still rejected it as an unknown property, so campaigns failed at load time.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Fix prompt body / lead-paragraph contradiction in execute_analyze.md.
The lead said "no per-iteration git isolation" in observational mode,
but Phase 2 still hardcoded `git checkout -- .` between conditions
(which would fail with no .git) and framed result-path warnings as
"the worktree is temporary." Replace the reset step with a new
{{condition_reset}} placeholder and rephrase the persistence note
to be accurate in both modes.
- Fix validation bypass: extract _validate_campaign to a module-level
validate_campaign() and call it at the top of run_iteration. The
staticmethod was only invoked from LLMDispatcher.__init__, so inline-
agent mode (which never builds an LLMDispatcher) silently coerced
non-bool observational values via bool() further down.
- Add regression test that create_experiment_worktree IS called when
observational=False (existing tests would all pass if the gate were
inverted).
- Loosen brittle prompt-text assertions: import the fragment constants
and assert constant identity / containment instead of substrings,
so copy-edits to the prompt text don't churn six tests.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
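
The validation fix above can be sketched as a module-level check that refuses anything other than a real boolean. This is an assumption-laden sketch: the `ValueError` type, the message text, and the exact signature are illustrative, not taken from the diff. The point it shows is why coercion via `bool()` is unsafe: `bool("false")` is `True`.

```python
def validate_campaign(campaign: dict) -> None:
    """Reject campaigns whose observational flag is not a real bool.

    Sketch only: the real validate_campaign may check more fields and
    raise a different exception type.
    """
    target = campaign.get("target_system") or {}
    flag = target.get("observational", False)
    # isinstance(..., bool) rejects "false", 0, 1, None and similar values
    # that bool() would silently coerce.
    if not isinstance(flag, bool):
        raise ValueError(
            "target_system.observational must be a boolean, "
            f"got {type(flag).__name__}"
        )
```

Calling a check like this at the top of the iteration entry point, rather than only in a dispatcher constructor, means every execution mode hits it.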
Hey Nick, nice work catching this: the crash when

One thing to flag though: Nous already has an "observe vs evolve" distinction built in. It's implicit and lives at the bundle level. When the planner designs arms without

So the real issue isn't that Nous needs a new "observational mode"; it's that Nous currently requires

I'd suggest simplifying this to something like

TL;DR: the use case is valid, but the fix should be ~20 lines of infrastructure gating (skip worktree when no
Reviewer flagged that "observational" collides with the existing observe-mode in execute_analyze.md, which means "the bundle has no code_changes arms": a bundle-level property, not the infra-level concern of whether to skip worktree creation. The new flag controls executor environment (live system vs. isolated worktree), so `live_target` is a more accurate name. Mechanical rename across iteration.py, llm_dispatch.py, campaign.schema.yaml, and the test module.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thanks for the close read. You're right that there's a naming collision, and I've pushed

On dropping the prompt fragments and relying on bundle-level observe mode for the safety behavior: I don't think that holds up, for two reasons.

1. The signal arrives too late. The execute_analyze "skip if no

2. "No

So I'd push back on collapsing this into pure infra gating. The 20-line worktree skip is necessary but not sufficient: the prompts encode the behavioral contract that "this target is shared, real, and not yours to break," which is information the framework genuinely doesn't have today.

Happy to keep iterating on the wording of the fragments themselves if any of them read as redundant once you're looking at them next to the observe-mode text. And if you'd prefer I split the rename and the prompt argument into separate commits before merge, say the word.
Thanks for the rename and the thoughtful response, the

Did a closer review of the final diff. A few things worth addressing:

1. Missing directory existence check in live-target branch (the main one)

In

    experiment_dir = Path(repo_path)
    print(f" Live target: executor runs in {experiment_dir}")

No check that the directory actually exists. If someone has a typo in

2. Two unconditional prompt changes affect all campaigns

These lines in

The new text is arguably better and semantically equivalent, but it's a prompt change that hits existing campaigns. Worth calling out explicitly, or making these placeholder-controlled too.

3. PR description still references

The title and body still say

4. Minor test gaps

None of these are blockers except #1 (the existence check). The rest are small. Nice work overall: the feature is clean and well-tested.
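
For point 1, the requested existence check could look something like the sketch below. `live_target_dir` is a hypothetical helper name, and the choice of `FileNotFoundError` and the message wording are assumptions. The review only asks that a missing directory be caught before the executor starts.

```python
from pathlib import Path


def live_target_dir(repo_path: str) -> Path:
    """Return repo_path as a Path, failing fast if it is not a directory.

    Hypothetical helper: the real branch lives inline in the iteration code.
    """
    experiment_dir = Path(repo_path)
    if not experiment_dir.is_dir():
        # Covers both a typo'd path and a path that points at a file.
        raise FileNotFoundError(
            f"live_target repo_path is not an existing directory: {experiment_dir}"
        )
    return experiment_dir
```

Using `is_dir()` rather than `exists()` also rejects a `repo_path` that resolves to a regular file, which would otherwise fail later when the executor tries to use it as a working directory.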
One more thing — could you add a short section to the README or quickstart showing how and when to use
Right now the only documentation for this feature is the PR description and the schema's inline description field; it would be nice to have something a user can find without reading git history.
Reviewer asked for user-facing docs on when and how to use live_target: true so the feature is discoverable without reading the PR description or schema. Adds a quickstart section with an example campaign and contrasts live_target (campaign-level, no worktree, all arms must be probes) with observe-mode arms (bundle-level, worktree still created). README points to the new section.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes #219.
Summary

Adds an opt-in `target_system.observational: bool` flag that lets a campaign target a live system (running cluster, deployed service, dataset on disk) rather than a git-tracked codebase that mutates across iterations. In observational mode, the executor runs directly in `repo_path` and no per-iteration `git worktree` is created.

This decouples two concerns that the current code conflates: give the agent CLI access in a directory vs. create a per-iteration git worktree to isolate code mutations. Today setting `repo_path` triggers both; for live-system debugging the second is incoherent (the cluster isn't a thing you can `git worktree add`), and `create_experiment_worktree` raises `FileNotFoundError: Not a git repository: <repo_path>` every iteration.

Default is `false`; existing campaigns unaffected.

What changed

- `orchestrator/schemas/campaign.schema.yaml`: accept `observational: boolean` under `target_system`.
- `orchestrator/llm_dispatch.py`: validate the flag is a bool; expose `execution_environment` and `worktree_constraint` context keys with worktree vs. observational variants.
- `orchestrator/iteration.py`: gate `create_experiment_worktree` behind `if repo_path and not observational`; in observational mode, point `experiment_dir` at `repo_path` directly (no `.experiment_id` written, no `.nous-experiments/` created).
- `prompts/methodology/design.md`, `prompts/methodology/execute_analyze.md`: replace the hardcoded worktree paragraphs with the two new placeholders, so the agent gets clear instructions matching the actual environment (no `git checkout -- .` in observational mode; treat target as live; bundles must contain no `code_changes` arms).
- `tests/test_observational.py`: 10 new tests: schema validation, prompt fragment selection, end-to-end template rendering with observational substitutions, and an iteration-flow test that asserts `create_experiment_worktree` is never called and no `.experiment_id` / `.nous-experiments/` artifacts are produced.

Test plan

- `pytest tests/test_observational.py`: 10 new tests pass.
- `pytest`: full suite passes (348 tests, including `test_integration_real_execution.py`).
- `repo_path: /scratch/dir` (no `.git`) and `observational: true` validates against the schema and runs past the previously-failing worktree gate.

Out of scope / follow-ups

- `nous replay` (`orchestrator/cli.py:269`) still calls `create_experiment_worktree` unconditionally. Replay support for observational campaigns can land in a follow-up; it isn't a typical use case for live-system runs.
- `code_changes` arms in `_validate_bundle` (currently only the prompt discourages them).
- `repo_path` reads as "git repo" but in observational mode it's a working directory. A future rename to `working_dir` (with `repo_path` kept as a deprecated alias) would clarify intent.

🤖 Generated with Claude Code
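
The placeholder substitution listed under "What changed" can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the fragment constant names, their wording, and the `prompt_context` and `render` helpers are hypothetical. Only the `{{execution_environment}}` and `{{worktree_constraint}}` placeholder names and the worktree vs. observational split come from the PR.

```python
# Hypothetical fragment texts; the real prompt wording lives in the repo.
ENV_WORKTREE = "You are working in an isolated git worktree."
ENV_OBSERVATIONAL = "You are probing a live target; there is no git isolation."
CONSTRAINT_WORKTREE = "Reset the worktree between conditions."
CONSTRAINT_OBSERVATIONAL = "Do not mutate the target; bundles must contain no code_changes arms."


def prompt_context(observational: bool) -> dict:
    """Select the fragment pair that matches the execution environment."""
    if observational:
        return {
            "execution_environment": ENV_OBSERVATIONAL,
            "worktree_constraint": CONSTRAINT_OBSERVATIONAL,
        }
    return {
        "execution_environment": ENV_WORKTREE,
        "worktree_constraint": CONSTRAINT_WORKTREE,
    }


def render(template: str, context: dict) -> str:
    """Replace each {{key}} placeholder with its context value."""
    for key, value in context.items():
        template = template.replace("{{" + key + "}}", value)
    return template
```

Keeping the fragments as named constants is what lets tests assert on containment of the constant rather than on literal substrings, so copy-edits to the wording do not break them.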