Commit 8bf8173
feat(multi-turn): runbook-aware per-turn attach_kwargs (v0.6.2) (#170)
* feat(multi-turn): runbook-aware per-turn attach_kwargs (v0.6.2)
Restores LLM-driven per-turn attach so a runbook scenario like
"1. say hello, 2. upload the file, 3. confirm" attaches file_path only
on the upload turn, not every turn. Static always-forward (v0.6.1) was the
wrong default - it forced the entrypoint to detect the upload moment from
message text.
Strengthens the driver prompt to:
- treat the goal as a stepped runbook when it looks numbered, executing
the next not-yet-completed step on each turn
- explicitly map "this turn's step needs side-data" to emit the relevant
key in attach_kwargs
- include a worked 3-turn example so the LLM has a concrete pattern
Also adds a server-side WARN that fires when a scenario text mentions a
file path (regex heuristic) but the kwargs pool is empty. The log line
surfaces the mistake of writing the path in the description without
filling the File Path field.
Bumps VERSION to 0.6.2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: drop path-mention regex heuristic; LLM is the only mechanism
Per review: don't try to second-guess the LLM driver with a regex over
scenario text. If the customer hasn't populated file_path / available_kwargs
the conversation just runs without side-data and the operator can read
the existing pool-keys log line to diagnose. Removes _PATH_MENTION_RE,
its WARN, and the associated test class.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(prompt): generalize runbook detection beyond numbered steps
A runbook can be plain prose ("first do A, then B, finally C"), comma-
separated ("say hi, upload, confirm"), or fully descriptive — not just
numbered. Updates the goal block to instruct the driver to infer discrete
actions and their order from any natural form. Swaps the worked example
to a non-numbered runbook so the LLM doesn't anchor on numbering as the
trigger for stepwise execution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(tui): arrow keys move cursor in textareas; Ctrl+L clears the field
Two scenario-editor UX bugs:
1. Up/Down arrows were intercepted at the editor level and used to switch
fields. The textarea component already binds up/down to LinePrevious/
LineNext (textarea.go:41-42), so the user could never actually move
the cursor inside multi-line fields. Removes the arrow-key bindings
from field navigation; Tab / Shift+Tab continue to work for that.
2. No way to clear a textarea quickly. Adds Ctrl+L which clears the
currently-focused multi-line field (Scenario, Expected Outcome,
Available Kwargs JSON) in the scenario editor and the Business
Context textarea. Ctrl+U / Ctrl+K stay bound to delete-before/after-
cursor, matching shell semantics.
Help text in both views updated to reflect the new bindings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(tui): use Ctrl+X (not Ctrl+L) to clear textarea fields
Ctrl+L is the global model selector (keyboard_controller.go:48), so the
clear-field shortcut needs a different binding. Switched to Ctrl+X — free
across the global router, the textarea component's keymap, and existing
screen handlers; "X" carries an erase / cut-all mnemonic.
Help text updated in both the scenario editor and the Business Context
view.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(tui+driver): drop kwargs/file_path UI; LLM extracts both key+value
TUI:
- Removes File Path single-line input and Available Kwargs JSON textarea
from the scenario editor. Customers describe everything in the runbook
text; the LLM driver pulls structured side-data out of it.
- Empty focused textarea now renders a cursor and pads to its full height
(textarea View() previously collapsed to "" when both value and
placeholder were empty), so an empty Scenario / Expected Outcome field
visually indicates you can type there.
- ScenarioData retains AvailableKwargs / FilePath for JSON-only / legacy
carry-through (loader and POST builder still propagate them so existing
scenarios.json files keep working as a fallback).
Driver / runtime:
- DriverMessageResult.attach_kwargs flips from List[str] (key names
resolved against a pool) to Dict[str, Any] (literal key+value
extracted by the LLM directly from the runbook text).
- DRIVER_PROMPT now teaches the driver to extract values VERBATIM from
the goal text on the relevant runbook step and emit them as a JSON
object under attach_kwargs; empty {} on chit-chat / approval steps.
- run_multi_turn forwards driver_result.attach_kwargs as-is, with
Scenario.file_path / Scenario.available_kwargs acting only as a
legacy-JSON fallback that the LLM extraction overrides per turn.
Tests reworked for the new Dict-shaped attach_kwargs and in-prompt
extraction guidance. 133 Python tests pass; TUI builds clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
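A minimal sketch of what Dict-shaped attach_kwargs could look like across a three-step runbook. `DriverTurn`, its `message` field, and the path `/tmp/report.csv` are hypothetical stand-ins for illustration; the real DriverMessageResult definition is not shown in this commit message.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DriverTurn:
    """Hypothetical stand-in for one driver result (not the real DriverMessageResult)."""

    message: str
    # Literal key+value pairs extracted from the runbook text;
    # an empty dict on chit-chat / approval steps.
    attach_kwargs: Dict[str, Any] = field(default_factory=dict)


# Assumed runbook text: "say hi, upload /tmp/report.csv, confirm"
turns: List[DriverTurn] = [
    DriverTurn("hi there"),
    DriverTurn("here's the file", {"file_path": "/tmp/report.csv"}),
    DriverTurn("looks good, go ahead"),
]

# Only the upload turn carries side-data.
turns_with_side_data = [i for i, t in enumerate(turns) if t.attach_kwargs]
```

Under these assumptions only the second turn forwards file_path; the greeting and confirmation turns send an empty dict.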
* fix(logging): escape braces so business_context with {var} doesn't crash
Loguru runs message.format(*args, **kwargs) on the log message whenever
any kwargs are present — including extra={...}. Our codebase pervasively
uses logger.X(f"... {val}", extra={...}) and val often carries
user-controlled text (business_context with {customer_name} template
variables, LLM-generated messages, object repr embedding such text).
The literal {name} in the f-string-evaluated message is then interpreted
as a placeholder and crashes with KeyError before any sink sees the
record.
Reproduction: business_context = "Hello {customer_name}" crashes the
evaluation pipeline on any logger call that interpolates this value
into an f-string and passes extra=.
Fix: monkey-patch loguru._logger.Logger once at config-import time so
each level method (trace, debug, info, success, warning, error,
critical, exception) escapes { -> {{ and } -> }} in the message before
delegating, only when args/kwargs are present (no-op for pure-fstring
calls). Loguru's .format() unescapes them so the emitted message text
is identical to the original. Patch is on the class so bind()-clones
inherit it. Idempotent.
Adds rogue/tests/test_safe_format_logging.py with 8 regression cases.
141 Python tests pass (133 + 8 new).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
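The failure and the escape can be reproduced with plain str.format, without loguru. This is a sketch of the mechanism only, not the patch itself; `escape_braces` and the sample values are hypothetical names chosen for illustration.

```python
def escape_braces(message: str) -> str:
    """Double every brace so str.format treats it as a literal."""
    return message.replace("{", "{{").replace("}", "}}")


# The message as it looks after the f-string has already been evaluated.
message = "context: Hello {customer_name}"
format_kwargs = {"extra": {"scenario": "demo"}}

# Unescaped: format() sees {customer_name} as a placeholder and raises.
try:
    message.format(**format_kwargs)
    crashed = False
except KeyError:
    crashed = True

# Escaped: format() unescapes the doubled braces back to the original text.
rendered = escape_braces(message).format(**format_kwargs)
```

Because format() collapses `{{` and `}}` back to single braces, the rendered text matches the original message.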
* review: address branch-review feedback for v0.6.2
- run_multi_turn: per-turn precedence (LLM extraction suppresses legacy
fallback for that turn). _resolve_per_turn_kwargs() helper extracted
with TestResolvePerTurnKwargs covering all 4 quadrants + aliasing.
- safe_format: try-format-first / escape-on-KeyError-or-IndexError so
loguru's documented format-style API still works. Patches Logger.log.
Tests pin the install side-effect and the format-style preservation.
- prompts.py: driver worked-example reflowed to valid JSON, <example>
tags instead of code fences.
- textarea View only pads to full height when focused or has placeholder.
- Up/Down arrows switch fields on non-textarea fields; fall through to
cursor nav inside textareas via forwardToFocusedTextArea helper.
- Scenario.file_path / available_kwargs Field descriptions marked legacy.
149 Python tests pass (was 141). TUI builds clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
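The per-turn precedence could be sketched as below. The function name, signature, and copy-on-return behavior are assumptions; the actual _resolve_per_turn_kwargs() helper is not shown in this commit message, and "aliasing" is interpreted here as not handing the caller a reference to the input dicts.

```python
from typing import Any, Dict, Optional


def resolve_per_turn_kwargs(
    llm_kwargs: Optional[Dict[str, Any]],
    legacy_kwargs: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical sketch: LLM extraction wins for the turn, legacy is the fallback."""
    if llm_kwargs:
        # Non-empty LLM extraction suppresses the legacy fallback entirely.
        return dict(llm_kwargs)
    if legacy_kwargs:
        # Nothing extracted this turn: fall back to the legacy scenario fields.
        return dict(legacy_kwargs)
    return {}


legacy = {"file_path": "/legacy/path.csv"}
both = resolve_per_turn_kwargs({"file_path": "/tmp/a.csv"}, legacy)
llm_only = resolve_per_turn_kwargs({"order_id": 7}, None)
legacy_only = resolve_per_turn_kwargs({}, legacy)
neither = resolve_per_turn_kwargs(None, None)
```

Returning a copy keeps a later mutation of the result from leaking back into the scenario's legacy fields.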
* fix(tui): pin textarea View output to t.Height + cap editor textarea size
The scenario editor's two textareas were rendering at visibly different
heights — Scenario huge / Expected Outcome tiny — even though both
SetSize() calls used the same value. Two compounding causes:
1. textarea View() picked one of three rendering paths (placeholderView,
viewport with content, "" for empty unfocused). Each produced a
different total visible height because the lipgloss panel wrapping
them had no explicit .Height() — natural sizing meant the panel
sized to fit the content, so the empty + unfocused path collapsed
to 3 lines while the non-empty viewport path filled however many
lines the content + padding implied. Fixed: always wrap in
.Height(t.Height) so every state renders at exactly t.Height visible
lines. Empty textareas now also always go through placeholderView
(which pads to height) so the box is visible even when unfocused
with no placeholder.
2. The editor's per-textarea height was (availableHeight - usedHeight)
/ 2, which on a tall terminal landed at 25+ lines per field — a
single short scenario looked lost in a huge box. Added a 10-line cap
that keeps the form readable while still giving runbooks room.
Net effect: both Scenario and Expected Outcome textareas always render
at the same fixed visible height regardless of which is focused or how
much content each holds. Full-height boxes appear immediately in Add
New Scenario so the user can see where to type.
TUI builds clean, go vet clean. No Python changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
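The height arithmetic from cause 2 can be sketched in a few lines. The TUI itself is written in Go; this Python version only illustrates the calculation, and the function name, the constant name, and the lower bound of 1 are assumptions.

```python
MAX_TEXTAREA_HEIGHT = 10  # the 10-line cap described above


def textarea_height(available_height: int, used_height: int) -> int:
    """Split the remaining rows between two textareas, capped at 10 lines each."""
    per_field = (available_height - used_height) // 2
    # The floor of 1 is an assumption so a tiny terminal still shows a row.
    return max(1, min(per_field, MAX_TEXTAREA_HEIGHT))


tall_terminal = textarea_height(60, 10)   # uncapped value would be 25
short_terminal = textarea_height(20, 8)   # below the cap, so unchanged
```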
* chore(tui): smart-quote auto-correction in View() docstring
Pre-commit reformatter swapped ``t.Height`` for typographic quotes.
No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(driver): non-adversarial driver prompt for policy runs
The multi-turn driver was always adversarial — pressuring refusals,
escalating, invoking authority/emotion/deadlines. That made every
policy scenario feel like a hostile interrogation even when the
scenario author just wanted to verify a happy-path runbook ("greet,
upload the file, approve"). Cooperative scenarios were polluted by
the rogue's pushback tactics; agents under test would clamp down or
refuse perfectly reasonable steps.
Reframe: the runbook IS the script. The rogue walks through it
play-by-play as a normal cooperative customer. Authors who want
pressure-testing put it in the runbook explicitly ("ask for refund,
then insist when refused, then threaten to leave"). Default behavior
is honest, matter-of-fact runbook execution.
Changes to DRIVER_PROMPT:
- Persona intro: "real human customer ... NOT trying to trick the
bot, pressure it, or test its limits". Drops "adversarially test".
- Removed entire ## Tactics section (escalation, refusal-pushback,
authority/emotion/deadline invocation, threaten-to-leave/manager).
- Trimmed ## How a real person talks (DO): dropped the antagonistic
registers (annoyed, skeptical, wheedling, mildly pushy, demanding)
and the explicit "push back on refusals" guidance. Added explicit
"if the bot refuses, accept calmly and move on UNLESS the runbook
says to push back".
- Kept ## What an AI sounds like (DON'T) verbatim — anti-AI-polite
guidance is still useful regardless of adversarial framing.
- Kept runbook stepwise framing and attach_kwargs extraction.
Multi-turn driver is only used for policy mode; red-team / prompt-
injection have separate code paths and are unaffected.
New test asserts the prompt is non-adversarial: no "## Tactics", no
"Escalate pressure", no "threaten to leave", no "invoke authority".
Existing prompt tests untouched.
150 Python tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
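A check along the lines of the new test might look like this. The helper name, the case-insensitive matching, and the sample prompt text are assumptions; the real test and the full DRIVER_PROMPT are not shown here.

```python
from typing import List

# Phrases the commit message says must not appear in the prompt.
FORBIDDEN_PHRASES = (
    "## Tactics",
    "Escalate pressure",
    "threaten to leave",
    "invoke authority",
)


def adversarial_phrases_in(prompt: str) -> List[str]:
    """Return the forbidden phrases found in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [p for p in FORBIDDEN_PHRASES if p.lower() in lowered]


# Hypothetical excerpt standing in for the real DRIVER_PROMPT.
SAMPLE_PROMPT = (
    "You are a real human customer. If the bot refuses, accept calmly "
    "and move on unless the runbook says to push back."
)

found_in_sample = adversarial_phrases_in(SAMPLE_PROMPT)
found_in_old_style = adversarial_phrases_in("## Tactics\nEscalate pressure slowly")
```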
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 50fa46d commit 8bf8173
15 files changed
Lines changed: 595 additions & 235 deletions
File tree
- packages/tui/internal
- components
- screens/scenarios
- rogue
- common/logging
- evaluator_agent/multi_turn
- tests
- sdks/python/rogue_sdk