feat(telemetry,ui): reasoning span attribute + collapsed thought summary (Phase 1 of #162) #165
* feat(core): preserve task plan state in compaction summaries When context compaction fires, the agent loses awareness of its task plan (completed, in-progress, pending work) and may re-plan already-done tasks. Add extractTaskPlanSummary() that queries the TaskStore and produces a structured <task-plan> XML section with status markers ([x], [~], [ ], [-], [!]), priority labels, and parent-child indentation. Extend compactMessages() to accept an optional taskStore and append the plan to the compaction summary. Wire the TaskStore into agent-core at the compaction call site. Backward compatible: existing callers without taskStore remain unaffected. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix: add error handling and recursive nesting to compaction task plan Address PR feedback from CodeRabbit: - Wrap extractTaskPlanSummary call in try/catch so TaskStore failures don't break compaction - Replace flat 2-level subtask rendering with recursive renderTask() that supports arbitrary nesting depth - Add tests for multi-level nesting and error fallback Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…houghts post-stream Phase 1 of the reasoning coordination tracked in #162. Captures delta.reasoning_content / delta.reasoning across stream chunks and surfaces it as gen_ai.response.thinking on the gen_ai chat span (gated on logPrompts, matching the completion event policy). Always emits gen_ai.usage.thinking_tokens when usage exposes it. Non-streaming responses get the same treatment by inspecting {thought:true} parts on the response — and the completion event no longer double-counts thoughts as content. Renders gemini_thought items as a compact "▸ thinking (N chars)" summary once the stream finalizes (live streaming render unchanged). Full text remains in ChatRecord, ACP agent_thought_chunk notifications, and Langfuse for downstream investigation. An in-TUI expand affordance is a follow-up. Once homelab-iac#31 (EMIT_REASONING_CONTENT) flips on, this also covers vLLM-served models that previously lost their <think> blocks at the gateway. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
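As a rough illustration of the capture described in this commit message, the sketch below accumulates reasoning text across stream deltas and builds the two span attributes. The `StreamDelta` shape and the names `collectReasoning`, `truncateForTelemetry`, and `buildThinkingAttributes` are hypothetical and are not taken from `pipeline.ts`; only the attribute keys, the `logPrompts` gate, and the 10K truncation come from the PR text.

```typescript
// Hypothetical minimal shape of an OpenAI-compatible stream delta (assumption).
interface StreamDelta {
  content?: string | null;
  reasoning_content?: string | null;
  reasoning?: string | null;
}

const TELEMETRY_TEXT_LIMIT = 10_000;
const TELEMETRY_TRUNCATION_SUFFIX = '...[truncated]';

// Cap text at the telemetry limit and mark it as truncated.
function truncateForTelemetry(text: string): string {
  return text.length > TELEMETRY_TEXT_LIMIT
    ? text.slice(0, TELEMETRY_TEXT_LIMIT) + TELEMETRY_TRUNCATION_SUFFIX
    : text;
}

// Concatenate reasoning pieces, preferring reasoning_content and
// falling back to reasoning when the former is absent or empty.
function collectReasoning(deltas: Iterable<StreamDelta>): string {
  let reasoning = '';
  for (const delta of deltas) {
    reasoning += delta.reasoning_content || delta.reasoning || '';
  }
  return reasoning;
}

// Build span attributes: the token count is always included when known,
// the reasoning text only when prompt logging is enabled.
function buildThinkingAttributes(
  reasoning: string,
  logPrompts: boolean,
  thinkingTokens?: number,
): Record<string, string | number> {
  const attrs: Record<string, string | number> = {};
  if (thinkingTokens !== undefined) {
    attrs['gen_ai.usage.thinking_tokens'] = thinkingTokens;
  }
  if (logPrompts && reasoning.length > 0) {
    attrs['gen_ai.response.thinking'] = truncateForTelemetry(reasoning);
  }
  return attrs;
}
```

Keeping the gate in one place means the streaming and non-streaming paths could share the same attribute builder, assuming both can produce a plain reasoning string.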
Walkthrough
This pull request extends streaming support with explicit reasoning/thinking telemetry tracking and introduces task-plan state preservation during context compaction. Changes span pipeline telemetry (capturing and emitting …
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/core/src/core/openaiContentGenerator/pipeline.ts (1)
334-345: ⚡ Quick win: Deduplicate telemetry truncation policy into a shared helper.
`10_000` and `...[truncated]` are repeated across streaming and non-streaming paths. Centralizing this reduces drift risk when policy changes.
♻️ Suggested refactor
+const TELEMETRY_TEXT_LIMIT = 10_000;
+const TELEMETRY_TRUNCATION_SUFFIX = '...[truncated]';
+
+function truncateTelemetryText(text: string): string {
+  return text.length > TELEMETRY_TEXT_LIMIT
+    ? text.slice(0, TELEMETRY_TEXT_LIMIT) + TELEMETRY_TRUNCATION_SUFFIX
+    : text;
+}
 ...
-    context.span.setAttribute(
-      'gen_ai.response.thinking',
-      reasoningText.length > 10_000
-        ? reasoningText.slice(0, 10_000) + '...[truncated]'
-        : reasoningText,
-    );
+    context.span.setAttribute(
+      'gen_ai.response.thinking',
+      truncateTelemetryText(reasoningText),
+    );
 ...
-      'gen_ai.completion':
-        responseText.length > 10_000
-          ? responseText.slice(0, 10_000) + '...[truncated]'
-          : responseText,
+      'gen_ai.completion': truncateTelemetryText(responseText),
 ...
-    span.setAttribute(
-      'gen_ai.response.thinking',
-      reasoningText.length > 10_000
-        ? reasoningText.slice(0, 10_000) + '...[truncated]'
-        : reasoningText,
-    );
+    span.setAttribute(
+      'gen_ai.response.thinking',
+      truncateTelemetryText(reasoningText),
+    );
Also applies to: 738-770
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/core/openaiContentGenerator/pipeline.ts` around lines 334 - 345, Centralize the telemetry truncation policy by adding a single helper (e.g., truncateForTelemetry or truncateTelemetryText) that takes a string, enforces the 10_000-char limit and appends '...[truncated]' when needed, then replace the inline logic in the non-streaming path that builds reasoningText and calls context.span.setAttribute('gen_ai.response.thinking', ...) (uses reasoningParts) and the equivalent streaming-path code (the block referenced around the other occurrence) to call this helper instead of duplicating 10_000 and '...[truncated]'; ensure the helper is exported/visible to both code paths and keep behavior identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/agents/runtime/agent-core.ts`:
- Around line 474-481: The current early-return when estimateTokens(masked) <=
targetTokens skips the `<task-plan>` enrichment that compactMessages(...)
applies; instead of returning masked directly, always invoke
compactMessages(masked, targetTokens, { taskStore:
this.runtimeContext.getTaskStore?.() }) and assign compacted to its result
(await if it returns a Promise) so the task-plan summary enrichment runs even
when masking already meets the token target; use the existing variables masked,
targetTokens, taskStore and preserve the async handling used in the else branch.
In `@packages/core/src/agents/runtime/compaction.ts`:
- Around line 58-85: The orphan pass currently emits only a flat row and loses
orphaned subtrees; update extractTaskPlanSummary to treat orphan nodes as
additional roots and recursively render their children by calling the existing
renderTask for each orphan root (use the same indent and append " (orphan)" to
the root line), and modify renderTask to accept or use a visited Set to detect
and stop cycles (self-parented/cyclic tasks) to avoid infinite recursion;
reference renderTask, childrenMap, taskMap, rootTasks, and tasks when locating
where to add the orphan-root recursion and the visited-cycle guard.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e3d1f9ba-54c8-4296-863d-e04c47a23387
📒 Files selected for processing (8)
- docs/superpowers/plans/2026-04-30-p4-compaction-todo-preservation.md
- packages/cli/src/ui/components/messages/ConversationMessages.test.tsx
- packages/cli/src/ui/components/messages/ConversationMessages.tsx
- packages/core/src/agents/runtime/agent-core.ts
- packages/core/src/agents/runtime/compaction.test.ts
- packages/core/src/agents/runtime/compaction.ts
- packages/core/src/core/openaiContentGenerator/pipeline.thinking.test.ts
- packages/core/src/core/openaiContentGenerator/pipeline.ts
    let compacted: Content[];
    if (estimateTokens(masked) <= targetTokens) {
      compacted = masked;
    } else {
      const taskStore = this.runtimeContext.getTaskStore?.();
      const result = compactMessages(masked, targetTokens, { taskStore });
      compacted = result instanceof Promise ? await result : result;
    }
Keep the task-plan summary on the masking-only path.
When observation masking already gets masked under targetTokens, this branch returns masked directly and never calls compactMessages(...), so the new <task-plan> enrichment is skipped on those compaction passes. That defeats the preservation behavior in a common case.
Possible fix
- let compacted: Content[];
- if (estimateTokens(masked) <= targetTokens) {
- compacted = masked;
- } else {
- const taskStore = this.runtimeContext.getTaskStore?.();
- const result = compactMessages(masked, targetTokens, { taskStore });
- compacted = result instanceof Promise ? await result : result;
- }
+ const taskStore = this.runtimeContext.getTaskStore?.();
+ const shouldCompact =
+ estimateTokens(masked) > targetTokens || !!taskStore;
+ const compacted = shouldCompact
+ ? await compactMessages(masked, targetTokens, { taskStore })
+ : masked;
    function renderTask(task: (typeof tasks)[0], indent: string) {
      const marker = STATUS_MARKERS[task.status] ?? '[ ]';
      const priority = task.priority ? ` (${task.priority})` : '';
      lines.push(`${indent}${marker} ${task.title}${priority}`);

      const children = childrenMap.get(task.id) ?? [];
      for (const child of children) {
        renderTask(child, indent + ' ');
      }
    }

    // Render root tasks and their recursive children
    for (const task of rootTasks) {
      renderTask(task, ' ');
    }

    // Handle orphan subtasks (parent ID references a task not in the store)
    const allIds = new Set(taskMap.keys());
    for (const task of tasks) {
      if (task.parentTaskId && !allIds.has(task.parentTaskId)) {
        const marker = STATUS_MARKERS[task.status] ?? '[ ]';
        const priority = task.priority ? ` (${task.priority})` : '';
        lines.push(` ${marker} ${task.title}${priority} (orphan)`);
      }
    }

    lines.push('</task-plan>');
    return lines.join('\n');
Don't drop orphaned task branches.
extractTaskPlanSummary() only recurses from root tasks, and the orphan pass emits a single flat row. That means self-parented/cyclic tasks or any orphan subtree with nested children disappear from the compacted summary, which undermines the state-preservation goal here.
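One way the suggestion could look, as a minimal sketch rather than the actual `compaction.ts` code: orphans are treated as extra roots, and a `visited` set stops self-parented or cyclic tasks from recursing forever. The `Task` shape, the status names in `STATUS_MARKERS`, the function name `renderTaskPlan`, and the ` (cycle)` suffix are assumptions made for illustration; only the markers, the ` (orphan)` suffix, and the `<task-plan>` wrapper come from the PR.

```typescript
// Hypothetical task shape; the real TaskStore record may differ.
interface Task {
  id: string;
  title: string;
  status: string;
  priority?: string;
  parentTaskId?: string;
}

// Status names are assumed; the markers are the ones listed in the PR.
const STATUS_MARKERS: Record<string, string> = {
  completed: '[x]',
  in_progress: '[~]',
  pending: '[ ]',
  cancelled: '[-]',
  blocked: '[!]',
};

function renderTaskPlan(tasks: Task[]): string {
  const taskMap = new Map(tasks.map((t) => [t.id, t] as const));
  const childrenMap = new Map<string, Task[]>();
  for (const task of tasks) {
    const parentId = task.parentTaskId;
    // Self-parented tasks are not registered as their own child.
    if (parentId && parentId !== task.id && taskMap.has(parentId)) {
      const siblings = childrenMap.get(parentId) ?? [];
      siblings.push(task);
      childrenMap.set(parentId, siblings);
    }
  }

  const lines: string[] = ['<task-plan>'];
  const visited = new Set<string>();

  function renderTask(task: Task, indent: string, suffix = ''): void {
    if (visited.has(task.id)) return; // cycle guard
    visited.add(task.id);
    const marker = STATUS_MARKERS[task.status] ?? '[ ]';
    const priority = task.priority ? ` (${task.priority})` : '';
    lines.push(`${indent}${marker} ${task.title}${priority}${suffix}`);
    for (const child of childrenMap.get(task.id) ?? []) {
      renderTask(child, indent + '  ');
    }
  }

  // Pass 1: true roots and their subtrees.
  for (const task of tasks) {
    if (!task.parentTaskId) renderTask(task, '  ');
  }
  // Pass 2: orphans (parent missing from the store) become extra roots.
  for (const task of tasks) {
    if (task.parentTaskId && !taskMap.has(task.parentTaskId)) {
      renderTask(task, '  ', ' (orphan)');
    }
  }
  // Pass 3: anything still unvisited sits on a cycle or is self-parented.
  for (const task of tasks) {
    if (!visited.has(task.id)) renderTask(task, '  ', ' (cycle)');
  }

  lines.push('</task-plan>');
  return lines.join('\n');
}
```

With this shape every task appears exactly once, so a malformed store degrades to a flatter summary instead of dropping branches or hanging compaction.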
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
…r nuke) — bumps to v0.28.0 (#169) * fix(core): preserve tool history when building no-tools requests /recap (and any other caller without tools, e.g. /btw) was sending an empty conversation to the model. The no-tools branch in pipeline buildRequest dropped every assistant turn with tool_calls and every tool-role message wholesale, so in tool-heavy sessions the recap saw only bare user prompts and hallucinated context. - generateRecap now passes tools: [] so the strip path doesn't fire, matching cc-2.18's awaySummary pattern. - pipeline.ts no-tools branch now flattens instead of dropping: keeps assistant prose content and removes only the tool_calls field; tool results become [tool result] assistant notes truncated at 2000 chars. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): release.yml now fires on auto-release/v* PRs (#160) auto-release.yml opens version-bump PRs from `auto-release/v*` branches into main, but release.yml's job gate only matched `head.ref == 'dev'`. Result: every auto-release PR was merging cleanly but skipping publish (v0.26.25 had to be dispatched manually). This adds the auto-release/* prefix to the gate and refreshes the stale top-of-file comment. Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: release v0.26.26 (#161) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * feat(telemetry,ui): reasoning span attribute + collapsed thought summary (Phase 1 of #162) (#165) * feat(core): preserve task plan state in compaction summaries (#163) * feat(core): preserve task plan state in compaction summaries When context compaction fires, the agent loses awareness of its task plan (completed, in-progress, pending work) and may re-plan already-done tasks. 
Add extractTaskPlanSummary() that queries the TaskStore and produces a structured <task-plan> XML section with status markers ([x], [~], [ ], [-], [!]), priority labels, and parent-child indentation. Extend compactMessages() to accept an optional taskStore and append the plan to the compaction summary. Wire the TaskStore into agent-core at the compaction call site. Backward compatible: existing callers without taskStore remain unaffected. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix: add error handling and recursive nesting to compaction task plan Address PR feedback from CodeRabbit: - Wrap extractTaskPlanSummary call in try/catch so TaskStore failures don't break compaction - Replace flat 2-level subtask rendering with recursive renderTask() that supports arbitrary nesting depth - Add tests for multi-level nesting and error fallback Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * feat(telemetry,ui): capture reasoning on Langfuse span and collapse thoughts post-stream Phase 1 of the reasoning coordination tracked in #162. Captures delta.reasoning_content / delta.reasoning across stream chunks and surfaces it as gen_ai.response.thinking on the gen_ai chat span (gated on logPrompts, matching the completion event policy). Always emits gen_ai.usage.thinking_tokens when usage exposes it. Non-streaming responses get the same treatment by inspecting {thought:true} parts on the response — and the completion event no longer double-counts thoughts as content. Renders gemini_thought items as a compact "▸ thinking (N chars)" summary once the stream finalizes (live streaming render unchanged). Full text remains in ChatRecord, ACP agent_thought_chunk notifications, and Langfuse for downstream investigation. An in-TUI expand affordance is a follow-up. 
Once homelab-iac#31 (EMIT_REASONING_CONTENT) flips on, this also covers vLLM-served models that previously lost their <think> blocks at the gateway. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: release v0.26.27 (#166) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * chore(telemetry): rebrand to proto-cli, nuke qwen-logger Alibaba RUM ping (#167) * chore(telemetry): rebrand qwen-code identifiers to proto-cli Aligns telemetry / public-facing identifiers with the actual product name. Verified against the Langfuse instance: new spans land with service.name=proto-cli on scope=proto.openai-pipeline; existing proto.* tracers (proto.llm, proto.turn, proto.tools, proto.harness, etc.) were already correct. Changes: - SERVICE_NAME: qwen-code → proto-cli (resource attribute, the marquee label in Langfuse's service column) - All EVENT_* constants: qwen-code.* → proto.* (matches the existing proto.harness.* convention already in this file) - pipeline.ts tracer: qwen-code.openai-pipeline → proto.openai-pipeline (one straggler vs. the 9 other proto.* tracers in core/) - types.ts event.name literals (PromptSuggestion, Speculation): qwen-code.* → proto.* - acpAgent.ts agentInfo.name: qwen-code → proto-cli (visible to ACP clients like Zed when listing agents) - marketplace.ts User-Agent: qwen-code → proto-cli (extension fetch identifier sent to api.github.com / raw.githubusercontent.com) Out of scope (deliberately): - packages/core/src/telemetry/qwen-logger/* — separate analytics ping to gb4w8c3ygj-default-sea.rum.aliyuncs.com (Alibaba RUM, the upstream Qwen team's endpoint). Should be disabled rather than rebranded; tracking separately. 
- DEFAULT_SERVICE_NAME='qwen-code-oauth' in mcp/token-storage — renaming would orphan existing keychain entries. - Misc qwen-code-* file paths, tmp dir names, sandbox image tag, test fixtures — not telemetry / not user-visible labels. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(telemetry): remove qwen-logger Alibaba RUM ping; keep useful events on Langfuse The qwen-logger system shipped usage telemetry to a fixed Alibaba RUM endpoint (gb4w8c3ygj-default-sea.rum.aliyuncs.com) — the upstream Qwen Code team's analytics pipeline. We don't operate that endpoint, the data isn't visible to us, and it labelled traffic as qwen-code-cli / qwen-code@${version}. Confirmed unused on our deployment; nuking. What's removed: - packages/core/src/telemetry/qwen-logger/ (entire directory: logger, event-types, tests) - packages/core/src/telemetry/integration.test.circular.ts (was a qwen-logger-specific circular-reference proxy-agent test, no longer applicable) - ~30 QwenLogger.getInstance(config)?.logXxxEvent(event) callsites in loggers.ts - QwenLogger exports from telemetry/index.ts and core/index.ts - QwenLogger spies and assertions in config.test.ts and the describe('logHookCall', ...) block in loggers.test.ts that was exclusively QwenLogger-shaped What's kept and rerouted to OTel/Langfuse: - HookCallEvent type and the logHookCall function — hook execution data is genuinely useful telemetry (which hook fired, success, duration, exit code, captured stdout/stderr, error). Now emits a proto.hook_call OTel log record via logs.getLogger(SERVICE_NAME) instead of the Alibaba ping. Existing call site in hookEventHandler.ts:619 still fires per hook execution. - LoopDetectionDisabledEvent likewise: was an empty no-op after the qwen-logger pull; rerouted to a proto.loop_detection_disabled OTel log record so the signal still reaches Langfuse. 
- New tests in loggers.test.ts assert OTel emission shape for logHookCall (success, error, sdk-not-initialized branches). Renamed (per "all not used" — no existing keychain entries to invalidate): - DEFAULT_SERVICE_NAME 'qwen-code-oauth' → 'proto-cli-oauth' - FORCE_ENCRYPTED_FILE_ENV_VAR 'QWEN_CODE_…' → 'PROTO_CLI_…' - file-token-storage encryption salt prefix and scrypt key seed switched to proto-cli; only invalidates non-existent tokens Verified live: kimi-k2.6 turn through the rebuilt CLI lands a Langfuse trace with service=proto-cli, scope=proto.openai-pipeline, gen_ai.response.thinking present. No outbound traffic to aliyuncs.com. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: release v0.26.28 (#168) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> --------- Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
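The no-tools flattening described at the start of this commit message can be pictured roughly as follows. This is a sketch under stated assumptions, not the `pipeline.ts` implementation: the `ChatMessage` and `ToolCall` shapes, the name `flattenToolHistory`, and the exact note format are hypothetical; the source only says that assistant prose is kept, `tool_calls` is removed, and tool results become `[tool result]` assistant notes truncated at 2000 chars.

```typescript
// Hypothetical OpenAI-style message shapes (assumptions for illustration).
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | null;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

const TOOL_RESULT_LIMIT = 2000;

// Rewrite a tool-heavy history so it is valid for a request without tools,
// while keeping as much of the conversation's substance as possible.
function flattenToolHistory(messages: ChatMessage[]): ChatMessage[] {
  const flattened: ChatMessage[] = [];
  for (const message of messages) {
    if (message.role === 'tool') {
      // Tool output becomes a short assistant note instead of being dropped.
      const text = (message.content ?? '').slice(0, TOOL_RESULT_LIMIT);
      flattened.push({ role: 'assistant', content: `[tool result] ${text}` });
    } else if (message.role === 'assistant' && message.tool_calls) {
      // Keep any prose; omit only the tool_calls field.
      if (message.content) {
        flattened.push({ role: 'assistant', content: message.content });
      }
    } else {
      flattened.push(message);
    }
  }
  return flattened;
}
```

Skipping assistant turns that carried only tool calls and no prose is a choice made in this sketch; the commit message does not say how such turns are handled.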
Summary
Phase 1 of the reasoning coordination tracked in #162. Surfaces the model's reasoning/thinking text on the `gen_ai chat ${model}` Langfuse span and renders a compact summary post-stream so the chat UI stops dumping long thought blocks above each answer.

- `gen_ai.response.thinking`: accumulated from `delta.reasoning_content` / `delta.reasoning` across stream chunks (and from `{thought:true}` parts on non-streaming responses), truncated to 10K chars to match the existing `gen_ai.completion` policy. Gated on `logPrompts` since reasoning may carry user data references.
- `gen_ai.usage.thinking_tokens`: numeric, always emitted when usage exposes `thoughtsTokenCount`.
- The `gen_ai.content.completion` event no longer includes thought parts (they were silently mixed in before). Reasoning lives on its own attribute now.
- `ThinkMessage` renders `▸ thinking (N chars)` in `theme.text.secondary` once `isPending=false`. Live streaming render is unchanged so users still see the model thinking. Continuation chunks (`gemini_thought_content`) render nothing post-stream; the parent ThinkMessage owns the summary.

ACP delivery is untouched. `MessageEmitter.emitAgentThought` continues to stream `agent_thought_chunk` notifications token-by-token, so Zed (and any other ACP client) keeps rendering thinking natively. 120 ACP tests pass.

Phase 0 (gateway side) dependency
homelab-iac#31 adds the `EMIT_REASONING_CONTENT` flag (default off) so the LiteLLM gateway re-emits stripped `<think>` text via `delta.reasoning_content` in-stream. Once that flag flips, vLLM-served models (QwQ etc.) that previously lost thinking entirely at the gateway start landing on this span and in this UI. Coordination thread: #162.

Test plan
- `packages/core/.../pipeline.thinking.test.ts`: 5 new tests covering streaming reasoning_content, fallback to `reasoning`, logPrompts gating, 10K truncation, and non-streaming `{thought:true}` extraction
- `packages/cli/.../ConversationMessages.test.tsx`: 5 new tests covering streaming render, post-stream summary, thousands separator, and continuation visibility under both states
- `pipeline.test.ts` (24) and ACP suite (120) still pass
- `packages/cli/src/ui` suite (2108 tests) and `packages/core/src/core/openaiContentGenerator` suite (302 tests) pass
- `npm run typecheck` clean
- `npm run lint` clean
- With `EMIT_REASONING_CONTENT=true`: verify the Langfuse trace shows `gen_ai.response.thinking` for a vLLM model, and the TUI shows `▸ thinking (N chars)` after the answer

Known caveats (deferred to follow-up)
- When a thought spans `gemini_thought` + `gemini_thought_content` items, the displayed char count counts only the first chunk. True total requires post-finalize coalescing in `useGeminiStream`. A comment in `ConversationMessages.tsx` flags this.
- In-TUI expand affordance: a `/think` slash command or keyboard binding is a sensible next step.

Closes none yet; keeps #162 open until Phase 2 (full Option B for gateway internal spans).
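The collapsed summary line and its thousands separator, as exercised by the CLI tests in the test plan, could be produced by something like the sketch below. The function name `formatThoughtSummary`, the `isPending` parameter handling, and the choice of the `en-US` locale are assumptions; the real `ThinkMessage` component is a UI component and is not reproduced here.

```typescript
// Fixed locale so the separator is deterministic (assumption: en-US style commas).
const CHAR_COUNT_FORMAT = new Intl.NumberFormat('en-US');

// Return the live text while streaming, and a compact one-line
// summary once the stream has finalized.
function formatThoughtSummary(thoughtText: string, isPending: boolean): string {
  if (isPending) {
    return thoughtText;
  }
  const count = CHAR_COUNT_FORMAT.format(thoughtText.length);
  return `▸ thinking (${count} chars)`;
}
```

Because the count is taken from whatever text the item holds, a thought split across continuation items would be undercounted in this sketch too, which matches the caveat listed above.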
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes