Skip to content

Commit 266aef4

Browse files
authored
Merge pull request #44 from Digital-Threads/feat/compaction-distiller-subagent
feat: in-session compaction distiller subagent + advisory (0.25.0)
2 parents 1917c82 + ef3af88 commit 266aef4

10 files changed

Lines changed: 181 additions & 27 deletions

File tree

CHANGELOG.md

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,31 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
## [Unreleased]
99

10+
## [0.25.0] - 2026-06-13
11+
12+
### Added
13+
- **In-session compaction distiller.** A new `task-journal-distiller` subagent
14+
(Haiku, `background: true`) reads a just-compacted conversation segment from
15+
the transcript file and backfills the decisions / rejections / findings that
16+
weren't logged yet for the active task — via the journal MCP, never closing a
17+
task. Because it runs as an in-session subagent it costs no separate `claude
18+
-p` call (~5k token overhead vs ~46k) and doesn't block the main chat. After a
19+
compaction, the `SessionStart` hook now adds a short advisory suggesting the
20+
main agent delegate the segment to it (the platform doesn't let a hook spawn a
21+
subagent, so this is advisory; the existing deterministic catch-up remains the
22+
guaranteed safety net). Disable the hint with `TJ_DISTILLER_HINT=0`.
23+
24+
### Changed
25+
- **Cheaper, honest `complete` stats.** One-shot `claude -p` calls now pass
26+
`--disallowed-tools` (we never use tools), keeping the built-in tool schemas
27+
out of the prompt and roughly halving the harness overhead. The stats line now
28+
leads with the real dollar cost for `claude -p` (whose token counts are muddy —
29+
a big prompt lands in `cache_creation`, not `input_tokens`) and shows clean
30+
token counts only for API backends; token sizes scale to `M`. When a
31+
cost-reporting backend is used, a one-line tip points at `--backend anthropic`
32+
(direct Haiku API, ~50× cheaper per task by skipping Claude Code's overhead)
33+
or `--backend ollama` (free, local).
34+
1035
## [0.24.0] - 2026-06-13
1136

1237
### Added

Cargo.lock

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ members = [
77
]
88

99
[workspace.package]
10-
version = "0.24.0"
10+
version = "0.25.0"
1111
edition = "2021"
1212
rust-version = "1.88"
1313
license = "MIT"

crates/tj-cli/Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ default = ["embed"]
2323
embed = ["tj-core/embed"]
2424

2525
[dependencies]
26-
tj-core = { package = "task-journal-core", version = "0.24.0", path = "../tj-core", default-features = false }
26+
tj-core = { package = "task-journal-core", version = "0.25.0", path = "../tj-core", default-features = false }
2727
anyhow = { workspace = true }
2828
clap = { workspace = true }
2929
tracing = { workspace = true }

crates/tj-cli/src/main.rs

Lines changed: 72 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -2140,6 +2140,25 @@ fn main() -> Result<()> {
21402140
bundle.push_str(&reminder);
21412141
bundle.push_str("\n\n");
21422142
}
2143+
// Advisory (the hook can't force it): suggest the main agent
2144+
// delegate the just-compacted segment to the in-session
2145+
// distiller subagent, which backfills missed reasoning from
2146+
// the transcript file (which survives compaction) for the
2147+
// active task(s). Background → never blocks. Gated off by
2148+
// TJ_DISTILLER_HINT=0 for users who don't want it.
2149+
if std::env::var("TJ_DISTILLER_HINT").as_deref() != Ok("0") {
2150+
let transcript_hint = payload
2151+
.get("transcript_path")
2152+
.and_then(|v| v.as_str())
2153+
.map(|p| format!(" (transcript: {p})"))
2154+
.unwrap_or_default();
2155+
bundle.push_str(&format!(
2156+
"[task-journal] A compaction just occurred. If decisions, rejections, \
2157+
or findings from before it are not yet in the journal for the active task(s) above, delegate to \
2158+
the `task-journal-distiller` subagent to capture them from the transcript{transcript_hint}. It \
2159+
runs in the background and won't block you; it only fills gaps and never closes tasks.\n\n"
2160+
));
2161+
}
21432162
}
21442163
for tc in &recent {
21452164
let pack = tj_core::pack::assemble(
@@ -4149,31 +4168,34 @@ fn compute_savings(
41494168
})
41504169
}
41514170

4152-
/// Format a token count compactly: 980 → "980", 3_240 → "3.2k", 88_000 → "88k".
4171+
/// Format a token count compactly: 980 → "980", 3_240 → "3.2k", 88_000 → "88k",
4172+
/// 2_760_000 → "2.8M".
41534173
fn fmt_tokens(n: u64) -> String {
41544174
if n < 1_000 {
41554175
n.to_string()
41564176
} else if n < 100_000 {
41574177
format!("{:.1}k", n as f64 / 1_000.0)
4158-
} else {
4178+
} else if n < 1_000_000 {
41594179
format!("{}k", n / 1_000)
4180+
} else {
4181+
format!("{:.1}M", n as f64 / 1_000_000.0)
41604182
}
41614183
}
41624184

41634185
/// Human spent/saved suffix for a finalize line, e.g.
41644186
/// " | spent 3.2k tok ($0.0012) · saved ~88k→1.5k tok (59×)".
41654187
fn stats_suffix(spent: &tj_core::llm::LlmUsage, saved: &Option<Savings>) -> String {
41664188
let mut parts = Vec::new();
4167-
if spent.total_tokens() > 0 {
4168-
let cost = match spent.cost_usd {
4169-
Some(c) if c > 0.0 => format!(" (${c:.4})"),
4170-
_ => String::new(),
4171-
};
4172-
parts.push(format!(
4173-
"spent {} tok{}",
4174-
fmt_tokens(spent.total_tokens()),
4175-
cost
4176-
));
4189+
// claude -p reports a (notional) dollar cost but muddy token counts — its
4190+
// big prompt lands in `cache_creation`, not `input_tokens` — so lead with
4191+
// the cost there. API backends report no cost but clean tokens, so show
4192+
// those instead.
4193+
match spent.cost_usd {
4194+
Some(c) if c > 0.0 => parts.push(format!("cost ${c:.4}")),
4195+
_ if spent.total_tokens() > 0 => {
4196+
parts.push(format!("spent {} tok", fmt_tokens(spent.total_tokens())))
4197+
}
4198+
_ => {}
41774199
}
41784200
if let Some(s) = saved {
41794201
if s.pack_tokens > 0 && s.raw_tokens > s.pack_tokens {
@@ -4395,6 +4417,21 @@ fn finalize_one_task(
43954417
Ok(out)
43964418
}
43974419

4420+
/// A one-line nudge shown when a cost-reporting backend (claude -p) was used:
4421+
/// the same Haiku via a direct API skips Claude Code's harness overhead. Only
4422+
/// claude -p reports a non-zero `cost_usd`, so this fires for it alone.
4423+
fn backend_cost_tip(cost: Option<f64>) -> Option<String> {
4424+
match cost {
4425+
Some(c) if c > 0.0 => Some(
4426+
"tip: that cost is claude -p's Claude Code overhead (notional under a \
4427+
subscription). For ~50× cheaper per task, use --backend anthropic (direct Haiku API, \
4428+
needs ANTHROPIC_API_KEY) — or --backend ollama for free, local."
4429+
.to_string(),
4430+
),
4431+
_ => None,
4432+
}
4433+
}
4434+
43984435
/// Human-readable one-liner for a finalize result.
43994436
fn print_finalize_outcome(task_id: &str, out: &FinalizeOutcome) {
44004437
if out.skipped_no_backend {
@@ -4458,6 +4495,9 @@ fn run_complete_single(
44584495
};
44594496
let out = finalize_one_task(&ctx, task_id, enrich, dry_run, backend)?;
44604497
print_finalize_outcome(task_id, &out);
4498+
if let Some(tip) = backend_cost_tip(out.spent.cost_usd) {
4499+
eprintln!("{tip}");
4500+
}
44614501
Ok(())
44624502
}
44634503

@@ -4604,6 +4644,9 @@ fn run_complete_batch(
46044644
totals.trim_start_matches(" | ")
46054645
);
46064646
}
4647+
if let Some(tip) = backend_cost_tip(total_spent.cost_usd) {
4648+
eprintln!("{tip}");
4649+
}
46074650

46084651
if !left_open.is_empty() {
46094652
println!("\nLeft open ({}):", left_open.len());
@@ -5682,10 +5725,26 @@ mod inline_tests {
56825725
pack_tokens: 1_500,
56835726
});
56845727
let s = stats_suffix(&spent, &saved);
5685-
assert!(s.contains("spent 1.5k tok ($0.0012)"), "{s}");
5728+
// Cost-reporting backend (claude -p) → lead with cost, not muddy tokens.
5729+
assert!(s.contains("cost $0.0012"), "{s}");
56865730
assert!(s.contains("saved ~90.0k→1.5k tok (60×)"), "{s}");
56875731
}
56885732

5733+
#[test]
5734+
fn stats_suffix_shows_tokens_for_costless_backend() {
5735+
// API backend reports clean tokens, no cost → show the token count.
5736+
let spent = tj_core::llm::LlmUsage {
5737+
input_tokens: 1800,
5738+
output_tokens: 200,
5739+
cost_usd: None,
5740+
};
5741+
assert_eq!(
5742+
stats_suffix(&spent, &None),
5743+
" | spent 2.0k tok",
5744+
"API backend should show tokens"
5745+
);
5746+
}
5747+
56895748
#[test]
56905749
fn stats_suffix_empty_when_nothing_to_report() {
56915750
let spent = tj_core::llm::LlmUsage::default();

crates/tj-cli/tests/cli.rs

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4516,6 +4516,10 @@ fn session_start_compact_prepends_active_task_reminder() {
45164516
ctx.contains("Must ship before Friday"),
45174517
"reminder must include the in-force constraint: {ctx}"
45184518
);
4519+
assert!(
4520+
ctx.contains("task-journal-distiller"),
4521+
"compact SessionStart must advise delegating to the distiller subagent: {ctx}"
4522+
);
45194523
}
45204524

45214525
#[test]
@@ -4525,6 +4529,10 @@ fn session_start_startup_has_no_reminder() {
45254529
!ctx.contains("[Active task after compaction]"),
45264530
"non-compact SessionStart must NOT inject the reminder: {ctx}"
45274531
);
4532+
assert!(
4533+
!ctx.contains("task-journal-distiller"),
4534+
"non-compact SessionStart must NOT advise the distiller: {ctx}"
4535+
);
45284536
}
45294537

45304538
/// Recursively collect file names under `dir` that match a predicate.
@@ -5621,7 +5629,7 @@ fn complete_retitles_and_closes_via_fake_backend() {
56215629
.args(["complete", &task_id])
56225630
.assert()
56235631
.success()
5624-
.stdout(contains("spent 1.5k tok ($0.0012)"))
5632+
.stdout(contains("cost $0.0012"))
56255633
.stdout(contains("retitled"))
56265634
.stdout(contains("closed"));
56275635

crates/tj-core/src/classifier/agent_sdk.rs

Lines changed: 19 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -54,10 +54,21 @@ fn base_claude_command(model: &str) -> Command {
5454
.arg("--output-format")
5555
.arg("json")
5656
.arg("--strict-mcp-config")
57+
// We never use tools in these one-shot text calls — denying the
58+
// built-in toolset keeps their schemas out of the prompt, roughly
59+
// halving the harness overhead. (The cache-creation cost floor
60+
// remains; for true pennies use a direct API backend.)
61+
.arg("--disallowed-tools")
62+
.arg(DISABLED_TOOLS)
5763
.env(IN_CLASSIFIER_ENV, "1");
5864
cmd
5965
}
6066

67+
/// Built-in tools denied in our one-shot `claude -p` calls (we only want a text
68+
/// completion, never tool use). Listed explicitly because there is no wildcard.
69+
const DISABLED_TOOLS: &str = "Bash Read Edit Write Glob Grep Task WebFetch \
70+
WebSearch NotebookEdit TodoWrite BashOutput KillBash";
71+
6172
/// Production runner: invokes the local `claude` binary in print mode, pinned
6273
/// to the given model, asking for the JSON envelope and an isolated MCP config
6374
/// (`--strict-mcp-config` keeps the project's own MCP servers — including this
@@ -259,10 +270,6 @@ struct EnvelopeUsage {
259270
input_tokens: u64,
260271
#[serde(default)]
261272
output_tokens: u64,
262-
#[serde(default)]
263-
cache_creation_input_tokens: u64,
264-
#[serde(default)]
265-
cache_read_input_tokens: u64,
266273
}
267274

268275
impl Classifier for ClaudeCliClassifier {
@@ -307,8 +314,14 @@ pub fn run_claude_json_usage(
307314
}
308315
let u = envelope.usage.unwrap_or_default();
309316
let usage = crate::llm::LlmUsage {
310-
// Count cache reads/writes as input so the total reflects real context.
311-
input_tokens: u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens,
317+
// Only our *fresh* prompt tokens — NOT the cached Claude Code system
318+
// prompt + tool schemas (cache_read/creation), which are harness
319+
// overhead, not work the user asked for. The dollar `cost` below still
320+
// reflects everything (claude computes it with the cache discount), so
321+
// a small token count next to a few-cents cost is the honest signal
322+
// that claude -p's overhead dominates — switch to a direct API backend
323+
// to avoid it.
324+
input_tokens: u.input_tokens,
312325
output_tokens: u.output_tokens,
313326
cost_usd: envelope.total_cost_usd,
314327
};

crates/tj-mcp/Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ path = "src/main.rs"
1717

1818
[dependencies]
1919
# Lean: the MCP server doesn't embed yet, so it skips the model2vec backend.
20-
tj-core = { package = "task-journal-core", version = "0.24.0", path = "../tj-core", default-features = false }
20+
tj-core = { package = "task-journal-core", version = "0.25.0", path = "../tj-core", default-features = false }
2121
anyhow = { workspace = true }
2222
tokio = { workspace = true }
2323
tracing = { workspace = true }

plugin/.claude-plugin/plugin.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
{
22
"name": "task-journal",
3-
"version": "0.24.0",
3+
"version": "0.25.0",
44
"description": "Append-only journal of AI-coding task reasoning chains: hypotheses, decisions, rejections, evidence. Renders compact resume packs so an agent can pick up a 2-week-old task with full context.",
55
"author": {
66
"name": "Mher Shahinyan"
Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,49 @@
1+
---
2+
name: task-journal-distiller
3+
description: Distills a conversation segment into task-journal memory. Use when a compaction just happened (or is about to), or when asked to "capture what we just did" — it reads the segment from the transcript, finds the decisions / rejections / findings that were NOT yet logged for the active task, and records them via the task-journal MCP. Runs in the background so it never blocks the main chat. Does NOT close tasks.
4+
model: haiku
5+
background: true
6+
tools: Read, Bash, Grep, Glob, mcp__plugin_task-journal_task-journal__task_search, mcp__plugin_task-journal_task-journal__task_pack, mcp__plugin_task-journal_task-journal__event_add
7+
---
8+
9+
You are the **task-journal distiller**. A segment of a coding conversation is
10+
about to be (or has just been) compacted away. Your one job: make sure the
11+
**reasoning** from that segment is preserved in the task journal as typed
12+
events, so nothing is lost and the task does not later look "interrupted".
13+
14+
You are dispatched with: the active **task id(s)**, the **transcript path**
15+
(a JSONL file), and optionally a **boundary timestamp** (the start of the
16+
segment — usually the task's last recorded event, or the previous compaction).
17+
18+
## Procedure
19+
20+
1. **Know what's already recorded.** For the task, call
21+
`task_pack` (or `task_search`) and read its existing events. You will NOT
22+
re-record anything already represented there.
23+
2. **Read the segment.** Read the transcript JSONL file (use `Read`; for large
24+
files read the tail or grep for the boundary timestamp and read forward).
25+
Focus on the assistant/user turns AFTER the boundary timestamp.
26+
3. **Extract only SIGNIFICANT, NOT-yet-logged reasoning** for the task:
27+
- `decision` — a committed choice. Pass `alternatives` (the options weighed).
28+
- `rejection` — an approach ruled out, and why.
29+
- `finding` — a fact verified from code/logs (cite file:line, ids, names).
30+
- `evidence` — a test/benchmark that proved something.
31+
- `constraint` — an external limit discovered.
32+
Skip chatter, restated tool output, greetings, and anything already in the
33+
existing events. When in doubt, leave it out — precision over volume.
34+
4. **Record** each via `event_add(task_id, event_type, text, ...)`. Write in the
35+
user's language, terse and specific. Append-only — never edit.
36+
37+
## Hard rules
38+
39+
- **Never close** a task and **never** mark it done — you only fill gaps.
40+
- **Never create** a new task unless the segment clearly pursued a *distinct*
41+
objective with no matching open task; prefer attaching to the given task id.
42+
- **De-dupe ruthlessly** — if the substance is already an event, skip it.
43+
- If the transcript is unreadable or the segment holds nothing new, do nothing
44+
and say so. Doing nothing is a valid, correct outcome.
45+
46+
## Output
47+
48+
One terse line: `distilled <N> event(s) into <task_id>: <comma-separated types>`
49+
(or `nothing new to record`). The main agent only needs this summary back.

0 commit comments

Comments
 (0)