 short_description: "Get context for a GitButler branch from prior agent work"
-default_prompt: "Use $but-agentlog for requests like \"get context for branch\", \"catch up on this branch\", or \"recover branch context\". Do not start with generic git branch/diff inspection; run `but agentlog skim` first. Use `but --json agentlog skim` plus `but agentlog show` only if exact drill-down is needed."
+default_prompt: "Use $but-agentlog for requests like \"get context for branch\", \"catch up on this branch\", or \"recover branch context\". Do not start with generic git branch/diff inspection; run `but agentlog skim` first. Use `but --format json agentlog skim` plus `but agentlog show` only if exact drill-down is needed."
@@ -261,20 +261,20 @@ This gives full control, deterministic scoring, and low cost (can use Sonnet/Hai

 #### Tier 3 Implementation Notes

-**Key insight: Tier 3 tests the skill file, not the `but` CLI.** No `but` binary runs. No git repo exists. The mock handlers return canned JSON that looks like `but status --json` output. You're measuring whether SKILL.md *teaches the model correctly* — complementary to Tier 1's structural validation.
+**Key insight: Tier 3 tests the skill file, not the `but` CLI.** No `but` binary runs. No git repo exists. The mock handlers return canned JSON that looks like `but status --format json` output. You're measuring whether SKILL.md *teaches the model correctly* — complementary to Tier 1's structural validation.
 Score: did the command sequence follow SKILL.md rules?
 ```
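
The mock loop described in the implementation note can be sketched roughly as below. This is a minimal illustration, not the project's harness: `CANNED_STATUS`, `mock_but_handler`, and the JSON field names are hypothetical, and the real `but status --format json` schema may well differ.

```python
import json
import shlex

# Hypothetical canned payload; the real `but status --format json` schema may differ.
CANNED_STATUS = {"branches": [], "unassigned": [{"path": "src/main.rs"}]}


def mock_but_handler(command: str, trace: list[str]) -> str:
    """Record the command the model asked for and return canned JSON.

    No `but` binary runs and no repository is touched.
    """
    trace.append(command)
    argv = shlex.split(command)
    if argv[:1] == ["but"] and "status" in argv:
        return json.dumps(CANNED_STATUS)
    # Every other command gets a generic success payload (an assumption).
    return json.dumps({"ok": True})
```

The `trace` list is the only thing a Tier 3 run needs to score afterwards, which is why the handler appends to it before doing anything else.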
@@ -285,13 +285,13 @@ Tier 3 remains useful for cheap, deterministic diagnostics, but this project gat

 | # | Scenario | Key assertions |
 |---|----------|----------------|
-| 1 | Basic commit flow | `status --json` before `commit`; commit has `--changes`, `--json`, `--status-after`; no git write commands |
+| 1 | Basic commit flow | `status --format json` before `commit`; commit has `--changes`, `--format json`, `--status-after`; no git write commands |
 | 2 | Branch workflow | Create branch (`but branch new` or `but commit <branch> -c`) before committing |
 | 3 | Git synonym redirect | User says "git push", model uses `but push` and not `git push` |
-| 4 | Ordering flow | `but status --json` occurs before `but commit` |
+| 4 | Ordering flow | `but status --format json` occurs before `but commit` |
 | 5 | Specificity flow | Single-file commit uses `--changes`; non-target file remains unassigned in repo state |
-| 6 | Amend flow | Use `but amend` with `--json --status-after`; no git write fallback |
-| 7 | Reorder flow | Use `but move`/`but rub` with `--json --status-after`; no `git rebase`/checkout fallback; repo reflects target order |
+| 6 | Amend flow | Use `but amend` with `--format json --status-after`; no git write fallback |
+| 7 | Reorder flow | Use `but move`/`but rub` with `--format json --status-after`; no `git rebase`/checkout fallback; repo reflects target order |
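
As an illustration of how the scenario 1 assertions in the table could be checked deterministically, here is a hedged sketch that scores a recorded command trace. The function names and the `GIT_WRITE_SUBCOMMANDS` set are assumptions made for this example, and it assumes flags are passed as separate tokens (`--format json`, not `--format=json`).

```python
import shlex

# Assumption: these git subcommands count as "git write commands".
GIT_WRITE_SUBCOMMANDS = {"add", "commit", "push", "rebase", "checkout", "merge", "reset"}


def has_format_json(argv: list[str]) -> bool:
    """True if argv contains `--format json` as two adjacent tokens."""
    return any(a == "--format" and b == "json" for a, b in zip(argv, argv[1:]))


def score_basic_commit(trace: list[str]) -> dict[str, bool]:
    """Check a command trace against the scenario 1 assertions."""
    argvs = [shlex.split(cmd) for cmd in trace]
    # Index of the first `but ... status ... --format json` call, if any.
    status_idx = next(
        (i for i, a in enumerate(argvs)
         if a[:1] == ["but"] and "status" in a and has_format_json(a)),
        None,
    )
    # Index of the first `but ... commit` call, if any.
    commit_idx = next(
        (i for i, a in enumerate(argvs) if a[:1] == ["but"] and "commit" in a),
        None,
    )
    commit = argvs[commit_idx] if commit_idx is not None else []
    return {
        "status_before_commit": (
            status_idx is not None and commit_idx is not None and status_idx < commit_idx
        ),
        "commit_flags": (
            "--changes" in commit and "--status-after" in commit and has_format_json(commit)
        ),
        "no_git_writes": not any(
            a[:1] == ["git"] and len(a) > 1 and a[1] in GIT_WRITE_SUBCOMMANDS
            for a in argvs
        ),
    }
```

Returning one boolean per assertion, rather than a single pass/fail, keeps the diagnostics cheap to read when a scenario regresses.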

 ### Tier 4: Integration (High-cost, realistic)

@@ -304,7 +304,7 @@ Run Claude Code against a real test repository with the latest `but` binary and
 | Runs `but` binary | No | Yes — freshly built from source |
 | Real git repo | No | Yes — disposable fixture |
 | Command trace | From mock loop | From SDK hooks or output parsing |
-| Asserts on repo state | No | Yes — `but status --json` after |
+| Asserts on repo state | No | Yes — `but status --format json` after |
 | Cost per scenario | ~$0.02 | ~$0.10-0.50 |
 | Speed | ~5 sec | ~30-120 sec |
 | Catches real bugs | Skill file only | Skill + CLI interaction |
@@ -361,7 +361,7 @@ Running the real Tier 4 harness surfaced a few practical issues that are not obv
    - Fix: normalize fixture path with `pwd -P` in `setup-fixture.sh`.

 4. **Keep fixture support files out of Git status.**
-   - `.but-data/` and installed `.claude/skills/` content polluted `but status --json` and changed CLI IDs.
+   - `.but-data/` and installed `.claude/skills/` content polluted `but status --format json` and changed CLI IDs.
    - Fix: add `.but-data/`, `.claude/`, `.tmp/` to `.git/info/exclude` in each fixture.

 5. **Fixture cleanup should be best-effort.**
@@ -398,7 +398,7 @@ For **Tier 3** (mock tool execution), Rust is viable since it just calls the Ant
 1. **Keep Tier 4 as the default evaluator** for skill changes.
 2. **Treat a 7-scenario Tier 4 smoke run (`--repeat 1`) as the PR gate** for changes under `crates/but/skill/`.
 3. **Run repeated Tier 4 (`--repeat 3+`) nightly or pre-release** to catch stochastic regressions.
-4. **Track the key Tier 4 metrics over time**: pass rate, git-command leakage rate, `--json` and `--status-after` compliance, and cost per scenario.
+4. **Track the key Tier 4 metrics over time**: pass rate, git-command leakage rate, `--format json` and `--status-after` compliance, and cost per scenario.
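
One possible way to aggregate the metrics named in item 4 is sketched below. The `ScenarioRun` record and its field names are hypothetical; the actual harness may store per-run results in a different shape.

```python
from dataclasses import dataclass


@dataclass
class ScenarioRun:
    """Hypothetical per-run record; field names are assumptions."""
    passed: bool
    used_git_write: bool   # a git write command appeared in the trace
    format_json_ok: bool   # `--format json` present where required
    status_after_ok: bool  # `--status-after` present where required
    cost_usd: float


def summarize(runs: list[ScenarioRun]) -> dict[str, float]:
    """Reduce a batch of runs to the four tracked metrics plus mean cost."""
    n = len(runs)
    if n == 0:
        raise ValueError("no runs to summarize")
    return {
        "pass_rate": sum(r.passed for r in runs) / n,
        "git_leakage_rate": sum(r.used_git_write for r in runs) / n,
        "format_json_compliance": sum(r.format_json_ok for r in runs) / n,
        "status_after_compliance": sum(r.status_after_ok for r in runs) / n,
        "mean_cost_usd": sum(r.cost_usd for r in runs) / n,
    }
```

Storing one such summary per nightly run would make it straightforward to plot each rate over time and spot stochastic regressions.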