fix(agent-workspace): recover title hits across scopes

Jacobinwwey · Jacobinwwey · commit 3067cd81e3ba · 2026-06-06T17:15:25.000+08:00
Add document-only planner scope recovery for the case where a scoped query returns no evidence but a title-like knowledge document exists outside the active corpus. Surface the active scope and the recovered source path in the Knowledge Workspace API status strip, and cover the behavior with backend and frontend regression tests.
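
The recovery decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in `buildQueryBackendContext()`: the `Scope`, `TitleHit`, and `ScopeDecision` shapes, the `isInsideScope()` and `decideScope()` helpers, and the `'explicit'` label are all hypothetical names introduced here. Only the `planner_scope_recovery` value and the scope fields (`workspaceId`, `corpusId`, `sourcePathPrefixes`) come from this commit.

```typescript
// Hypothetical shapes for illustration only; the real planner types may differ.
interface Scope {
    workspaceId?: string;
    corpusId?: string;
    sourcePathPrefixes?: string[];
}

interface TitleHit {
    documentId: string;
    sourcePath: string;
    workspaceId: string;
    corpusId: string;
}

type ScopeDecision =
    | { scopeSource: 'explicit'; scope: Scope } // 'explicit' is an assumed label
    | {
          scopeSource: 'planner_scope_recovery';
          documentIds: string[];
          recoveredSourcePaths: string[];
      };

// A title hit is compatible only if it satisfies every constraint the scope sets.
// Prefix matching is a plain startsWith here, which is a simplification.
function isInsideScope(hit: TitleHit, scope: Scope): boolean {
    if (scope.workspaceId && hit.workspaceId !== scope.workspaceId) return false;
    if (scope.corpusId && hit.corpusId !== scope.corpusId) return false;
    const prefixes = scope.sourcePathPrefixes ?? [];
    if (prefixes.length > 0 && !prefixes.some((prefix) => hit.sourcePath.startsWith(prefix))) {
        return false;
    }
    return true;
}

// Keep the requested scope unless it has no compatible title hit while a
// title hit exists elsewhere; in that case switch to a document-only scope
// instead of intersecting the document ids with the incompatible constraints.
function decideScope(requested: Scope, titleHits: TitleHit[]): ScopeDecision {
    const inside = titleHits.filter((hit) => isInsideScope(hit, requested));
    const outside = titleHits.filter((hit) => !isInsideScope(hit, requested));
    if (inside.length === 0 && outside.length > 0) {
        return {
            scopeSource: 'planner_scope_recovery',
            documentIds: outside.map((hit) => hit.documentId),
            recoveredSourcePaths: outside.map((hit) => hit.sourcePath),
        };
    }
    return { scopeSource: 'explicit', scope: requested };
}
```

Under these assumptions, a `financial` scope combined with a single title hit under `Knowledge_Base/waterglass/` yields a `planner_scope_recovery` decision carrying that document id and source path, while a title hit inside `Knowledge_Base/financial` leaves the requested scope unchanged.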
diff --git a/docs/diataxis/en/explanation/development-progress-dashboard.md b/docs/diataxis/en/explanation/development-progress-dashboard.md
@@ -3,6 +3,30 @@
 This page is the implementation-facing dashboard for the Knowledge Mastery evolution plan.
 It tracks what is already implemented, where the hard gaps remain, and how to verify progress from code and runtime behavior.
 
+## 2026-06-06 Active-Scope Miss Recovery and Document-Augmented RAG Patch
+
+This patch resolves the live "what is water glass?" failure that reproduced while the WebView was already running on `npm run tauri:dev:mini:gpu`.
+
+Runtime probes showed that the current sidecar could answer correctly when called with an explicit `waterglass` scope: it returned one grouped knowledge point, eight citations, and `matchedSpans`.
+The WebView, however, had `folder-select=financial`, `localStorage.nc_last_target=financial`, and `window.__NC_ACTIVE_SOURCE_TARGET.scope.sourcePathPrefixes=["Knowledge_Base/financial"]`.
+The user question was therefore sent as a scoped financial query. The existing planner found the global title-like `water glass` document, but then intersected that document id with the explicit financial workspace/corpus/prefix scope, reducing the retrieval candidate set to zero indexed atoms.
+
+Code-vs-plan reconciliation for this patch:
+
+| Requirement | Current implementation evidence | Progress call |
+|---|---|---|
+| Positive answer when the selected scope misses but the query clearly names another knowledge point | `buildQueryBackendContext()` now distinguishes title hits inside the requested scope from title hits outside it. If an explicit scope has no compatible title hit but a document title/alias hit exists elsewhere, retrieval switches to a document-only `planner_scope_recovery` scope instead of intersecting incompatible corpus constraints. | Implemented |
+| Return results by knowledge point, not duplicated sections | The prior document-level conversation grouping remains intact. The recovery query still returns segment-level evidence internally, then `mergeAgentConversationKnowledgePoints()` groups hits by `documentId` and exposes `matchedSpans` inside the single knowledge-point card. | Implemented |
+| RSE + document augmentation direction | The implementation keeps Relevant Segment Extraction behavior at retrieval time while adding document augmentation at planning time: title-like queries can recover the target document, and section hits inside that document become marked evidence spans rather than duplicated cards. | Operational baseline |
+| User-visible diagnosis of scope behavior | The Knowledge Workspace API status strip now includes the active scope label and, when recovery is used, the recovered source path. This directly exposes cases such as "Scope: financial" plus "Recovered: Knowledge_Base/waterglass/water glass.md". | Implemented |
+| Backward compatibility | Public response fields remain additive. Existing `assistantMessage`, `answer`, `assistantBlocks`, citations, and legacy sync/SSE flows remain supported. `scopeSource` gains a new optional value, `planner_scope_recovery`, without removing existing values. | Preserved |
+
+Verification for this patch:
+
+- Red/green backend regression: `KnowledgeLearningPlatform.test.ts` now covers `financial` active scope plus a `water glass` title-like query recovering the `waterglass` document and returning one grouped knowledge point with multiple matched spans.
+- Red/green frontend regression: `agent_workspace.frontend.test.ts` now covers status-strip scope and recovered-source visibility.
+- Live root-cause evidence: CDP showed the running WebView was scoped to `financial`; direct sidecar probing with `waterglass` scope returned grouped evidence correctly.
+
 ## 2026-06-06 Knowledge Workspace RAG Answering and API Observability Slice
 
 This update closes a practical Knowledge Workspace gap observed while `npm run tauri:dev:mini:gpu` was already running: the live sidecar could retrieve scoped `waterglass` evidence after hydration, but the user-facing answer still used the old "strongest scoped match" template and returned repeated section-level cards from the same knowledge point.
diff --git a/docs/diataxis/zh/explanation/development-progress-dashboard.md b/docs/diataxis/zh/explanation/development-progress-dashboard.md
@@ -3,6 +3,30 @@
 本页是“知识彻底掌握演进方案”的实现侧进度看板。
 它用于回答三件事：哪些能力已落地、哪些关键缺口仍在、如何用代码与运行时证据验证推进结果。
 
+## 2026-06-06 active scope miss recovery 与 document-augmented RAG 修复
+
+本次补丁修复了 WebView 已在 `npm run tauri:dev:mini:gpu` 中运行时复现的 “what is water glass?” 失败。
+
+运行时探针显示：当前 sidecar 如果显式使用 `waterglass` scope 调用，会正确返回 1 个按知识点合并后的结果、8 条引用以及 `matchedSpans`。
+但 WebView 当前状态是 `folder-select=financial`、`localStorage.nc_last_target=financial`，并且 `window.__NC_ACTIVE_SOURCE_TARGET.scope.sourcePathPrefixes=["Knowledge_Base/financial"]`。
+因此用户问题实际上被发送成了 financial 限定范围内的 scoped query。旧 planner 虽然能在全局找到 `water glass` 的 title-like 文档命中，但随后把该 document id 与显式 financial workspace/corpus/prefix scope 做交集，最终把候选集压成 0 个 indexed atoms。
+
+本补丁的代码 / 方案对齐结果：
+
+| 要求 | 当前实现证据 | 进度判断 |
+|---|---|---|
+| 当前 scope 未命中但问题明确指向另一个知识点时仍能正面回答 | `buildQueryBackendContext()` 现在会区分 title hit 是否落在请求 scope 内。如果显式 scope 内没有兼容 title hit，但其他位置存在明确文档标题 / 别名命中，检索会切换到 document-only 的 `planner_scope_recovery` scope，而不是继续相交不兼容的 corpus 约束。 | 已实现 |
+| 按知识点返回，而不是重复返回 section | 之前的 document-level conversation grouping 继续保留。recovery query 内部仍保留 segment-level evidence，然后由 `mergeAgentConversationKnowledgePoints()` 按 `documentId` 合并，并把命中的 section 作为单一知识点卡片内的 `matchedSpans` 展示。 | 已实现 |
+| RSE + document augmentation 推进方向 | 当前实现把 Relevant Segment Extraction 留在检索阶段，同时在 planning 阶段加入 document augmentation：title-like query 可以恢复目标文档，文档内 section 命中会成为标注证据片段，而不是重复卡片。 | 可运行基线 |
+| 用户可见 scope 诊断 | Knowledge Workspace API 状态条现在会显示 active scope；如果触发 recovery，还会显示恢复到的 source path。用户可以直接看到类似 “Scope: financial” 与 “Recovered: Knowledge_Base/waterglass/water glass.md” 的状态。 | 已实现 |
+| 向后兼容 | 公共响应字段只做加法。既有 `assistantMessage`、`answer`、`assistantBlocks`、citations、legacy sync/SSE 流程都继续保留。`scopeSource` 仅新增可选值 `planner_scope_recovery`，不删除旧值。 | 已保留 |
+
+本补丁验证：
+
+- Red/green 后端回归：`KnowledgeLearningPlatform.test.ts` 现在覆盖 active scope 为 `financial`、title-like query 为 `water glass` 时恢复到 `waterglass` 文档，并返回 1 个包含多个 matched spans 的合并知识点。
+- Red/green 前端回归：`agent_workspace.frontend.test.ts` 现在覆盖状态条中的 active scope 与 recovered source 可见性。
+- 运行时根因证据：CDP 显示当前 WebView scope 是 `financial`；直接用 `waterglass` scope 探针调用 sidecar 时，后端已能正确返回分组证据。
+
 ## 2026-06-06 知识工作区 RAG 回答与 API 可观测性切片
 
 本次更新修复的是 `npm run tauri:dev:mini:gpu` 已经运行时暴露出的实际知识工作区问题：运行中的 sidecar 在完成 hydration 后已经可以召回 `waterglass` 作用域证据，但用户可见回答仍使用旧的 “strongest scoped match” 模板，并且会把同一知识点文档内的多个 section 命中渲染成重复知识点卡片。
diff --git a/src-tauri/bin/server-x86_64-pc-windows-msvc.exe b/src-tauri/bin/server-x86_64-pc-windows-msvc.exe
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:714f60fd700e5df870a16d0bb28c2e5d90a3bc429523c013fba49a69e5aadcb4
-size 77472205
+oid sha256:bd24633a79b1ae330d2e0c3384f75f622d57dcb9f761316791534523fc7d9ea6
+size 77471155
diff --git a/src/agent_workspace.frontend.test.ts b/src/agent_workspace.frontend.test.ts
@@ -3001,6 +3001,15 @@ describe('agent workspace learning-path integration', () => {
         if (!fetchMock) {
             throw new Error('expected fetch mock');
         }
+        (window as any).__NC_ACTIVE_SOURCE_TARGET = {
+            target: 'financial',
+            source: 'test',
+            scope: {
+                workspaceId: 'financial',
+                corpusId: 'financial',
+                sourcePathPrefixes: ['Knowledge_Base/financial'],
+            },
+        };
 
         fetchMock.mockImplementationOnce(async () => createSseResponse([
             {
@@ -3041,6 +3050,27 @@ describe('agent workspace learning-path integration', () => {
                             recalledMemoryCount: 0,
                             queryEvidenceCoverageRatioPct: 100,
                         },
+                        trace: {
+                            usedScope: {
+                                source: 'scoped',
+                                workspaceId: null,
+                                corpusId: null,
+                                documentIds: ['doc_status'],
+                                atomIds: [],
+                                sourcePathPrefixes: [],
+                                languages: [],
+                                matchedAtomCount: 1,
+                                scopeSource: 'planner_scope_recovery',
+                            },
+                            retrieval: {
+                                retrievalModes: ['keyword', 'planner_scope_recovery'],
+                                scopeRecovery: {
+                                    reason: 'title_like_document_hit_outside_requested_scope',
+                                    recoveredDocumentIds: ['doc_status'],
+                                    recoveredSourcePaths: ['Knowledge_Base/waterglass/water glass.md'],
+                                },
+                            },
+                        },
                     },
                 },
             },
@@ -3058,6 +3088,8 @@ describe('agent workspace learning-path integration', () => {
         expect(statusText).toContain('SSE');
         expect(statusText).toContain('1 knowledge point');
         expect(statusText).toContain('1 citation');
+        expect(statusText).toContain('Scope: financial');
+        expect(statusText).toContain('Recovered: Knowledge_Base/waterglass/water glass.md');
         expect(statusText).toMatch(/\d+ ms/);
     });
 
diff --git a/src/frontend/agent_workspace.js b/src/frontend/agent_workspace.js
@@ -634,8 +634,19 @@
             ? Math.max(0, Math.round(Number(status.latencyMs)))
             : null;
         const error = String(status && status.error || '').trim();
+        const activeTarget = String(status && status.activeTarget || '').trim();
         const result = status && typeof status.result === 'object' ? status.result : null;
         const summary = result && typeof result.summary === 'object' ? result.summary : {};
+        const trace = result && typeof result.trace === 'object' ? result.trace : {};
+        const retrievalTrace = trace && typeof trace.retrieval === 'object' ? trace.retrieval : {};
+        const scopeRecovery = retrievalTrace && typeof retrievalTrace.scopeRecovery === 'object'
+            ? retrievalTrace.scopeRecovery
+            : null;
+        const recoveredSourcePaths = Array.isArray(scopeRecovery && scopeRecovery.recoveredSourcePaths)
+            ? scopeRecovery.recoveredSourcePaths
+                .map((sourcePath) => String(sourcePath || '').trim())
+                .filter(Boolean)
+            : [];
         const knowledgePointCount = Number.isFinite(Number(summary.returnedKnowledgePoints))
             ? Number(summary.returnedKnowledgePoints)
             : (Array.isArray(result && result.knowledgePoints) ? result.knowledgePoints.length : 0);
@@ -656,9 +667,17 @@
             endpoint,
             transport,
             latencyMs !== null ? `${latencyMs} ms` : '',
+            activeTarget
+                ? translate('agentWorkspace.apiStatus.scope', 'Scope: {scope}', { scope: activeTarget })
+                : '',
             state === 'ok' ? pluralizeApiStatusCount(knowledgePointCount, 'knowledge point', 'knowledge points') : '',
             state === 'ok' ? pluralizeApiStatusCount(citationCount, 'citation', 'citations') : '',
             state === 'ok' ? pluralizeApiStatusCount(memoryCount, 'memory', 'memories') : '',
+            state === 'ok' && recoveredSourcePaths.length > 0
+                ? translate('agentWorkspace.apiStatus.recovered', 'Recovered: {sources}', {
+                    sources: recoveredSourcePaths.slice(0, 2).join(', '),
+                })
+                : '',
             error,
         ].filter(Boolean);
         node.setAttribute('data-api-state', state);
@@ -3255,9 +3274,11 @@
         input.value = '';
         appendUserMessage(message);
         const sendStartedAt = Date.now();
+        let requestActiveTarget = '';
         try {
             const userId = getUserId();
             const requestContext = resolveKnowledgeWorkspaceRequestContext();
+            requestActiveTarget = requestContext.activeTarget;
             const requestPayload = {
                 userId,
                 sessionId: getOrCreateConversationSessionId(userId),
@@ -3271,6 +3292,7 @@
                 state: 'pending',
                 endpoint: AGENT_CONVERSATION_ENDPOINT,
                 transport: 'SSE',
+                activeTarget: requestContext.activeTarget,
             });
             const conversationCall = await requestConversationWithStreamingFallback(requestPayload);
             const result = conversationCall && typeof conversationCall === 'object' && conversationCall.result
@@ -3281,6 +3303,7 @@
                 endpoint: AGENT_CONVERSATION_ENDPOINT,
                 transport: String(conversationCall && conversationCall.transport || 'SSE'),
                 latencyMs: Number(conversationCall && conversationCall.latencyMs),
+                activeTarget: requestContext.activeTarget,
                 result,
             });
             const appendedAssistant = await appendAssistantConversationResult(result);
@@ -3307,6 +3330,7 @@
                 state: 'error',
                 endpoint: AGENT_CONVERSATION_ENDPOINT,
                 latencyMs: Date.now() - sendStartedAt,
+                activeTarget: requestActiveTarget,
                 error: String(error && error.message || error || 'unknown_error'),
             });
             appendLocalizedAssistantMessage(
diff --git a/src/frontend/locales/en.json b/src/frontend/locales/en.json
@@ -407,7 +407,9 @@
       "idle": "Idle",
       "pending": "Checking",
       "ok": "Available",
-      "error": "Failed"
+      "error": "Failed",
+      "scope": "Scope: {scope}",
+      "recovered": "Recovered: {sources}"
     },
     "graphFocus": {
       "title": "Knowledge Focus",
diff --git a/src/frontend/locales/zh.json b/src/frontend/locales/zh.json
@@ -407,7 +407,9 @@
       "idle": "空闲",
       "pending": "检测中",
       "ok": "可用",
-      "error": "失败"
+      "error": "失败",
+      "scope": "范围：{scope}",
+      "recovered": "已扩展：{sources}"
     },
     "graphFocus": {
       "title": "知识聚焦",
diff --git a/src/learning/KnowledgeLearningPlatform.test.ts b/src/learning/KnowledgeLearningPlatform.test.ts
@@ -1553,6 +1553,65 @@ describe('KnowledgeLearningPlatform', () => {
         );
     });
 
+    test('agent conversation recovers a title-like knowledge point when the active scope misses another corpus', async () => {
+        await platform.ingestKnowledge({
+            incremental: true,
+            documents: [
+                {
+                    documentId: 'doc_financial_scope',
+                    sourcePath: 'Knowledge_Base/financial/liquidity.md',
+                    language: 'en',
+                    workspaceId: 'financial',
+                    corpusId: 'financial',
+                    content: '# Liquidity\nLiquidity analysis explains cash conversion and working capital timing.',
+                },
+                {
+                    documentId: 'doc_water_glass_scope_recovery',
+                    sourcePath: 'Knowledge_Base/waterglass/water glass.md',
+                    language: 'en',
+                    workspaceId: 'waterglass',
+                    corpusId: 'waterglass',
+                    content: [
+                        '# Water Glass',
+                        'A water glass is a transparent drinking vessel that contains water for use.',
+                        '',
+                        '## Material role',
+                        'The water glass body provides a boundary between the liquid and the environment.',
+                    ].join('\n'),
+                },
+            ],
+        });
+
+        const response = await platform.agentConversation({
+            userId: 'agent_scope_recovery_user',
+            sessionId: 'session_scope_recovery',
+            message: 'what is water glass?',
+            scope: {
+                workspaceId: 'financial',
+                corpusId: 'financial',
+                sourcePathPrefixes: ['Knowledge_Base/financial'],
+            },
+            topK: 8,
+            persistMemory: false,
+        });
+
+        expect(response.answer).toMatch(/^A water glass is/i);
+        expect(response.knowledgePoints).toHaveLength(1);
+        expect(response.summary.returnedKnowledgePoints).toBe(1);
+        expect(response.summary.returnedCitations).toBeGreaterThanOrEqual(2);
+        expect(response.trace.usedScope.scopeSource).toBe('planner_scope_recovery');
+        expect(response.trace.retrieval.retrievalModes).toContain('planner_scope_recovery');
+        expect(response.trace.planner?.titleHitDocumentIds).toContain('doc_water_glass_scope_recovery');
+
+        const recoveredPoint = response.knowledgePoints[0] as any;
+        expect(recoveredPoint.documentId).toBe('doc_water_glass_scope_recovery');
+        expect(recoveredPoint.sourcePath).toBe('Knowledge_Base/waterglass/water glass.md');
+        expect(recoveredPoint.matchCount).toBeGreaterThanOrEqual(2);
+        expect(recoveredPoint.matchedSpans.map((span: any) => span.title)).toEqual(
+            expect.arrayContaining(['Water Glass', 'Material role'])
+        );
+    });
+
     test('agent conversation explanation and next actions adapt to comparison-style queries', async () => {
         await platform.ingestKnowledge({
             incremental: true,
diff --git a/src/learning/KnowledgeLearningPlatform.ts b/src/learning/KnowledgeLearningPlatform.ts
diff --git a/src/learning/types.ts b/src/learning/types.ts