feat(multi-turn): runbook-aware per-turn attach_kwargs (v0.6.2) by drorIvry · Pull Request #170 · qualifire-dev/rogue

drorIvry · 2026-04-27T13:24:55Z

Summary

Restores LLM-driven per-turn attach_kwargs (reverting the static always-forward of v0.6.1), now framed explicitly as a runbook workflow:

Customer writes a stepped scenario, e.g. "1. say hello, 2. upload the file at X, 3. confirm".
Driver LLM walks step-by-step, executing the next not-yet-completed step each turn.
On the upload turn — and only that turn — it emits attach_kwargs: ["file_path"], and the runtime forwards file_path=<scenario value> into the target's call_agent(messages, context_id=None, **kwargs).
On greeting / confirm / chit-chat turns, attach_kwargs: [] and no kwargs flow through.

Why the change vs v0.6.1: static forwarding pushed the "is this the upload turn?" decision into the customer's Python entrypoint, which is the wrong place — they'd have to keyword-match on chat content. The driver LLM already knows the runbook step it's performing, so it should make that decision and the runtime just resolves the keys to scenario-supplied values.

Prompt hardening (the part the issue title demands)

Goal / plan block now explicitly tells the LLM: when the goal is a numbered runbook, perform the next not-yet-completed step on each turn — don't skip ahead, don't restart.
A new ## Side-data the runtime can attach to a turn section is conditionally rendered when available_kwargs is non-empty. It lists the keys (no values), states the mapping rule ("if this turn's step needs key X, include X in attach_kwargs"), forbids hallucination, and includes a worked 3-turn example so the LLM has a concrete pattern.
The JSON output schema gains attach_kwargs: [...] only when the section is rendered, so legacy/kwargless scenarios are unchanged.

Diagnostic WARN

Adds a server-side WARN when a scenario text mentions a file-path-shaped string but the kwargs_pool is empty — i.e. the customer wrote /Users/x/a.pdf in the description but never filled the dedicated File Path field. Common foot-gun; now visible in logs.

Files

rogue/evaluator_agent/multi_turn/driver.py — DriverMessageResult.attach_kwargs restored; available_kwargs param re-threaded through.
rogue/evaluator_agent/multi_turn/prompts.py — render_available_kwargs_section() rewritten with stepwise-runbook framing + worked example; goal-block prose stiffened.
rogue/evaluator_agent/multi_turn/run_multi_turn.py — per-turn resolution + unknown-key drop + 📎 attached / no-attach logs + _PATH_MENTION_RE heuristic warn.
rogue/tests/test_multi_turn.py — covers prompt rendering with/without kwargs, parses JSON with and without attach_kwargs, covers the path-mention regex.
VERSION → 0.6.2.

Test plan

uv run pytest rogue/tests sdks/python/rogue_sdk/tests — 140 passed
Pre-commit (lefthook): black, isort, flake8, mypy, bandit, gitleaks all green
No TUI changes; existing v0.6.1 TUI binaries continue to work (the new prompt path activates only when scenario carries file_path / available_kwargs, which the v0.6.1 TUI already serializes)

🤖 Generated with Claude Code

Restores LLM-driven per-turn attach so a runbook scenario like "1. say hello, 2. upload the file, 3. confirm" attaches file_path only on the upload turn, not every turn. Static always-forward (v0.6.1) was the wrong default - it forced the entrypoint to detect the upload moment from message text. Strengthens the driver prompt to: - treat the goal as a stepped runbook when it looks numbered, executing the next not-yet-completed step on each turn - explicitly map "this turn's step needs side-data" to emit the relevant key in attach_kwargs - include a worked 3-turn example so the LLM has a concrete pattern Also adds a server-side WARN that fires when a scenario text mentions a file path (regex heuristic) but the kwargs pool is empty - surfaces the mistake of writing the path in the description but not filling the File Path field, surfacing it in logs. Bumps VERSION to 0.6.2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-27T13:25:11Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Bumps VERSION to 0.6.2; driver prompt and output schema gain an attach_kwargs mechanism and can be informed of available runtime kwargs; runner attaches only driver-selected keys and logs mismatches; prompt helpers and tests updated; TUI edit screens add a Ctrl+X clear shortcut and adjust navigation keys.

Changes

Cohort / File(s)	Summary
Version File `VERSION`	Bumped project version from `0.6.1` to `0.6.2`.
Driver: schema & API `rogue/evaluator_agent/multi_turn/driver.py`	Added `attach_kwargs: List[str]` to `DriverMessageResult`; `generate_next_rogue_message(..., available_kwargs)` parameter added; driver prompt/use now supports available-kwargs context.
Prompt templates `rogue/evaluator_agent/multi_turn/prompts.py`	Added `render_available_kwargs_section(keys: List[str])`; injected `{AVAILABLE_KWARGS_SECTION}` and `{ATTACH_KWARGS_SCHEMA_LINE}` into `DRIVER_PROMPT`; adjusted runbook guidance and JSON output template.
Multi-turn runner logic `rogue/evaluator_agent/multi_turn/run_multi_turn.py`	Now builds per-turn `resolved_kwargs` only from `driver_result.attach_kwargs`; unknown keys are collected and warned then omitted; logs when keys are attached or when none are attached; passes `available_kwargs` to the driver.
Tests `rogue/tests/test_multi_turn.py`	Updated prompt-rendering tests to cover available-kwargs section and attach-kwargs schema; added tests for `render_available_kwargs_section`; extended JSON parsing tests to read/default `attach_kwargs`.
TUI: business context screen `packages/tui/internal/screens/scenarios/business_context.go`	Added `Ctrl+X` handling in business-context edit mode to clear the text area; updated help text to show `Ctrl+X clear` and cursor movement hints.
TUI: edit-mode handling `packages/tui/internal/screens/scenarios/edit.go`	Field navigation now uses `tab`/`shift+tab` only (removed up/down aliases); added `ctrl+x` shortcut to clear the focused multiline TextArea; help text updated accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant Runner as Runner
    participant Driver as Driver (LLM)
    participant KwargsPool as Runtime / Kwargs Pool

    Runner->>KwargsPool: read available_kwargs (pool keys)
    Runner->>Driver: generate_next_rogue_message(..., available_kwargs)
    Driver-->>Runner: DriverMessageResult { attach_kwargs: [k1,k2], ... }
    Runner->>KwargsPool: validate & collect selected keys (k1,k2)
    KwargsPool-->>Runner: return attached key values
    Runner->>Runner: proceed with turn using attached kwargs

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I peeked at the pool of keys so wide,
I nudged the driver, "Pick only what you need."
No more hauling every token at my side,
Each turn gets just the extras it heeds. ✨
Hopping on, I twitch — leaner runs ahead.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: restoring LLM-driven per-turn attach_kwargs for runbook-aware scenarios and version bump to 0.6.2.
Description check	✅ Passed	The description provides comprehensive context including summary, motivation, changes made, test plan, and related files. It covers the runbook framing, prompt hardening, diagnostic warnings, and implementation details.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/runbook-attach-kwargs-v0.6.2

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Per review: don't try to second-guess the LLM driver with a regex over scenario text. If the customer hasn't populated file_path / available_kwargs the conversation just runs without side-data and the operator can read the existing pool-keys log line to diagnose. Removes _PATH_MENTION_RE, its WARN, and the associated test class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A runbook can be plain prose ("first do A, then B, finally C"), comma- separated ("say hi, upload, confirm"), or fully descriptive — not just numbered. Updates the goal block to instruct the driver to infer discrete actions and their order from any natural form. Swaps the worked example to a non-numbered runbook so the LLM doesn't anchor on numbering as the trigger for stepwise execution. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two scenario-editor UX bugs: 1. Up/Down arrows were intercepted at the editor level and used to switch fields. The textarea component already binds up/down to LinePrevious/ LineNext (textarea.go:41-42), so the user could never actually move the cursor inside multi-line fields. Removes the arrow-key bindings from field navigation; Tab / Shift+Tab continue to work for that. 2. No way to clear a textarea quickly. Adds Ctrl+L which clears the currently-focused multi-line field (Scenario, Expected Outcome, Available Kwargs JSON) in the scenario editor and the Business Context textarea. Ctrl+U / Ctrl+K stay bound to delete-before/after- cursor, matching shell semantics. Help text in both views updated to reflect the new bindings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Ctrl+L is the global model selector (keyboard_controller.go:48), so the clear-field shortcut needs a different binding. Switched to Ctrl+X — free across the global router, the textarea component's keymap, and existing screen handlers; "X" carries an erase / cut-all mnemonic. Help text updated in both the scenario editor and the Business Context view. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/tui/internal/screens/scenarios/edit.go`:
- Line 454: The help string in edit.go currently claims "Ctrl+X clear field"
which is misleading; update the help variable (help := ...) to a narrower label
such as "Ctrl+X clear text fields" or "Ctrl+X clear current text field" so it
accurately reflects behavior, or alternatively implement the missing clear logic
in the Ctrl+X key handler that processes field clearing (the key handler that
responds to Ctrl+X for the form inputs) to actually clear file-path, max-turns
and toggle inputs; modify either the help string or the Ctrl+X handling code
accordingly so the label matches actual behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e4fe374e-75ea-411b-9b21-9f6a71ebf1eb

📥 Commits

Reviewing files that changed from the base of the PR and between 09ec0f1 and 4e28a17.

📒 Files selected for processing (2)

packages/tui/internal/screens/scenarios/business_context.go
packages/tui/internal/screens/scenarios/edit.go

TUI: - Removes File Path single-line input and Available Kwargs JSON textarea from the scenario editor. Customers describe everything in the runbook text; the LLM driver pulls structured side-data out of it. - Empty focused textarea now renders a cursor and pads to its full height (textarea View() previously collapsed to "" when both value and placeholder were empty), so an empty Scenario / Expected Outcome field visually indicates you can type there. - ScenarioData retains AvailableKwargs / FilePath for JSON-only / legacy carry-through (loader and POST builder still propagate them so existing scenarios.json files keep working as a fallback). Driver / runtime: - DriverMessageResult.attach_kwargs flips from List[str] (key names resolved against a pool) to Dict[str, Any] (literal key+value extracted by the LLM directly from the runbook text). - DRIVER_PROMPT now teaches the driver to extract values VERBATIM from the goal text on the relevant runbook step and emit them as a JSON object under attach_kwargs; empty {} on chit-chat / approval steps. - run_multi_turn forwards driver_result.attach_kwargs as-is, with Scenario.file_path / Scenario.available_kwargs acting only as a legacy-JSON fallback that the LLM extraction overrides per turn. Tests reworked for the new Dict-shaped attach_kwargs and in-prompt extraction guidance. 133 Python tests pass; TUI builds clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Loguru runs message.format(*args, **kwargs) on the log message whenever any kwargs are present — including extra={...}. Our codebase pervasively uses logger.X(f"... {val}", extra={...}) and val often carries user-controlled text (business_context with {customer_name} template variables, LLM-generated messages, object repr embedding such text). The literal {name} in the f-string-evaluated message is then interpreted as a placeholder and crashes with KeyError before any sink sees the record. Reproduction: business_context = "Hello {customer_name}" crashes the evaluation pipeline on any logger call that interpolates this value into an f-string and passes extra=. Fix: monkey-patch loguru._logger.Logger once at config-import time so each level method (trace, debug, info, success, warning, error, critical, exception) escapes { -> {{ and } -> }} in the message before delegating, only when args/kwargs are present (no-op for pure-fstring calls). Loguru's .format() unescapes them so the emitted message text is identical to the original. Patch is on the class so bind()-clones inherit it. Idempotent. Adds rogue/tests/test_safe_format_logging.py with 8 regression cases. 141 Python tests pass (133 + 8 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- run_multi_turn: per-turn precedence (LLM extraction suppresses legacy fallback for that turn). _resolve_per_turn_kwargs() helper extracted with TestResolvePerTurnKwargs covering all 4 quadrants + aliasing. - safe_format: try-format-first / escape-on-KeyError-or-IndexError so loguru's documented format-style API still works. Patches Logger.log. Tests pin the install side-effect and the format-style preservation. - prompts.py: driver worked-example reflowed to valid JSON, <example> tags instead of code fences. - textarea View only pads to full height when focused or has placeholder. - Up/Down arrows switch fields on non-textarea fields; fall through to cursor nav inside textareas via forwardToFocusedTextArea helper. - Scenario.file_path / available_kwargs Field descriptions marked legacy. 149 Python tests pass (was 141). TUI builds clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…size The scenario editor's two textareas were rendering at visibly different heights — Scenario huge / Expected Outcome tiny — even though both SetSize() calls used the same value. Two compounding causes: 1. textarea View() picked one of three rendering paths (placeholderView, viewport with content, "" for empty unfocused). Each produced a different total visible height because the lipgloss panel wrapping them had no explicit .Height() — natural sizing meant the panel sized to fit the content, so the empty + unfocused path collapsed to 3 lines while the non-empty viewport path filled however many lines the content + padding implied. Fixed: always wrap in .Height(t.Height) so every state renders at exactly t.Height visible lines. Empty textareas now also always go through placeholderView (which pads to height) so the box is visible even when unfocused with no placeholder. 2. The editor's per-textarea height was (availableHeight - usedHeight) / 2, which on a tall terminal landed at 25+ lines per field — a single short scenario looked lost in a huge box. Added a 10-line cap that keeps the form readable while still giving runbooks room. Net effect: both Scenario and Expected Outcome textareas always render at the same fixed visible height regardless of which is focused or how much content each holds. Full-height boxes appear immediately in Add New Scenario so the user can see where to type. TUI builds clean, go vet clean. No Python changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pre-commit reformatter swapped \`\`t.Height\`\` for typographic quotes. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The multi-turn driver was always adversarial — pressuring refusals, escalating, invoking authority/emotion/deadlines. That made every policy scenario feel like a hostile interrogation even when the scenario author just wanted to verify a happy-path runbook ("greet, upload the file, approve"). Cooperative scenarios were polluted by the rogue's pushback tactics; agents under test would clamp down or refuse perfectly reasonable steps. Reframe: the runbook IS the script. The rogue walks through it play-by-play as a normal cooperative customer. Authors who want pressure-testing put it in the runbook explicitly ("ask for refund, then insist when refused, then threaten to leave"). Default behavior is honest, matter-of-fact runbook execution. Changes to DRIVER_PROMPT: - Persona intro: "real human customer ... NOT trying to trick the bot, pressure it, or test its limits". Drops "adversarially test". - Removed entire ## Tactics section (escalation, refusal-pushback, authority/emotion/deadline invocation, threaten-to-leave/manager). - Trimmed ## How a real person talks (DO): dropped the antagonistic registers (annoyed, skeptical, wheedling, mildly pushy, demanding) and the explicit "push back on refusals" guidance. Added explicit "if the bot refuses, accept calmly and move on UNLESS the runbook says to push back". - Kept ## What an AI sounds like (DON'T) verbatim — anti-AI-polite guidance is still useful regardless of adversarial framing. - Kept runbook stepwise framing and attach_kwargs extraction. Multi-turn driver is only used for policy mode; red-team / prompt- injection have separate code paths and are unaffected. New test asserts the prompt is non-adversarial: no "## Tactics", no "Escalate pressure", no "threaten to leave", no "invoke authority". Existing prompt tests untouched. 150 Python tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

drorIvry requested a review from yuval-qf as a code owner April 27, 2026 13:24

drorIvry and others added 4 commits April 27, 2026 16:26

coderabbitai Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread packages/tui/internal/screens/scenarios/edit.go

drorIvry and others added 6 commits April 27, 2026 17:36

chore(tui): smart-quote auto-correction in View() docstring

e02195b

Pre-commit reformatter swapped \`\`t.Height\`\` for typographic quotes. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

yuval-qf approved these changes Apr 28, 2026

View reviewed changes

drorIvry merged commit 8bf8173 into main Apr 28, 2026
9 checks passed

drorIvry deleted the feature/runbook-attach-kwargs-v0.6.2 branch April 28, 2026 08:06

drorIvry mentioned this pull request Apr 29, 2026

feature/web ui #173

Merged

21 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(multi-turn): runbook-aware per-turn attach_kwargs (v0.6.2)#170

feat(multi-turn): runbook-aware per-turn attach_kwargs (v0.6.2)#170
drorIvry merged 11 commits into
mainfrom
feature/runbook-attach-kwargs-v0.6.2

drorIvry commented Apr 27, 2026

Uh oh!

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading

Reviews paused

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

drorIvry commented Apr 27, 2026

Summary

Prompt hardening (the part the issue title demands)

Diagnostic WARN

Files

Test plan

Uh oh!

coderabbitai Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading