Skip to content

feat(drive): add +list-comments shortcut with anchor-aware filtering#1114

Draft
kyalpha313 wants to merge 3 commits into
larksuite:mainfrom
kyalpha313:feat/drive-list-comments-shortcut
Draft

feat(drive): add +list-comments shortcut with anchor-aware filtering#1114
kyalpha313 wants to merge 3 commits into
larksuite:mainfrom
kyalpha313:feat/drive-list-comments-shortcut

Conversation

@kyalpha313
Copy link
Copy Markdown
Contributor

@kyalpha313 kyalpha313 commented May 26, 2026

Summary

新增 drive +list-comments shortcut,以智能默认值返回 docx 评论,对齐用户在飞书 UI 侧栏看到的评论卡片。详见 #1111 的背景与设计讨论。

Changes

  • 新增 shortcuts/drive/drive_list_comments.go:实现 shortcut 主体(Validate / DryRun / Execute)+ 锚定状态识别 + 排序
  • 新增 shortcuts/drive/drive_list_comments_test.go:单测覆盖 parse / validate / normalize / buildCommentItem / sort / DryRun,以及两个 httpmock E2E(默认过滤 / --include-orphaned
  • 修改 shortcuts/drive/shortcuts.go:注册 DriveListComments
  • 修改 shortcuts/drive/shortcuts_test.go:在 expected commands 表里加入 +list-comments

CLI 形态

# 默认:筛选=未解决 + 锚定有效,输出=含评论的回复 + reaction,顺序=按 anchor 出现顺序
lark-cli drive +list-comments --doc <docx-url-or-token>

# Opt-in 扩展
--include-orphaned   # 加上孤立锚的评论(尾部)
--include-resolved   # 加上已解决评论
--no-reactions       # 不拉取 reaction(节省一次接口往返)
--order=created      # 按 create_time 排序

输出结构

复用 drive file.comments list 原响应,每条评论多两个派生字段:

{
  "comment_id": "...",
  "quote": "...",
  "is_solved": false,
  "is_whole": false,
  "anchor_state": "valid" | "structural" | "orphaned",
  "anchor_position": 12345,
  "reply_list": { "replies": [ { ..., "reactions": [...] } ] }
}

外层附 counts: {total, valid, structural, orphaned}file_token / wiki_token,便于上层做断言。

内部流程

  1. 解析 --doc,wiki URL 通过 wiki/v2/spaces/get_node 解包到 docx token。
  2. 分页拉 GET /open-apis/drive/v1/files/:file_token/comments,按 flag 决定是否带 is_solved=falseneed_reaction=true
  3. 一次 POST /open-apis/docs_ai/v1/documents/:token/fetch 拉取正文(format=xml,避免 fix: remove unsupported docs fetch text format #1109text 不可用)。
  4. 对每条评论计算 anchor_state
    • is_whole=truevalid
    • quote == "[Sticky note]" → 文档存在 <readonly-block> 元素 → structural,否则 orphaned
    • 其它 → quote 首行经过剥 HTML 标签 + 解 HTML 实体 + 去除所有空白 后做子串匹配。去空白是为了让被内联标签(如 <b>)拆开的锚文本仍能匹配——例如 <p>由 <b>P2 / P3 事件聚类</b>触发:…</p> 中的 quote 由 P2 / P3 事件聚类触发:…
  5. --include-* 过滤后排序输出。

已知限制(issue 中也已说明)

  1. 短锚多处出现:API 不返回 anchor.block_id,无法区分多个相同文本的锚。算法取第一个出现位置,可能:
    • 假阳性 valid:该评论锚定的具体实例已被删除,但相同文本在文档其它位置仍存在;
    • 排序偏前:锚定的实际位置在文档中部/末尾,但首个出现位置在前部。
  2. Sticky-note 顺序:使用 <readonly-block> 元素在 XML 中的位置,可能与渲染位置不完全一致。

Test Plan

  • go test -race -count=1 ./shortcuts/drive/... 通过(含本 PR 新增的所有 unit 和 httpmock E2E 测试)
  • go test -race -count=1 ./shortcuts/... 通过(无回归)
  • go vet ./shortcuts/drive/ 通过
  • gofmt -l shortcuts/drive/drive_list_comments.go shortcuts/drive/drive_list_comments_test.go 无输出
  • go mod tidy 不改动 go.mod / go.sum
  • 真实 docx 端到端验证:拉取 https://<host>/docx/<token> 上一份共 29 条未解决评论的真实文档,UI 显示 22 条,本 shortcut 默认返回 23 条(22 valid + 1 structural),与 UI 对齐到一条短锚多处出现的已知限制($ID 的"已知错误"评论:算法误判为 valid,UI 实际隐藏)。--include-orphaned 后返回全部 29 条。

Closes #1111

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a drive "+list-comments" shortcut to list DOCX comments with wiki resolution, pagination, normalized text anchoring, and classification (valid/structural/orphaned).
    • Filtering: orphaned and resolved comments are hidden by default; flags to include orphaned or resolved and toggle reactions.
    • Sorting: by anchor position (default) or by creation time.
    • Dry-run mode previews the multi-step API call plan.
  • Tests

    • Comprehensive coverage for parsing, normalization, anchoring, sorting, filtering, and dry-run behaviors.

Review Change Stack

Adds drive +list-comments, a new shortcut that lists docx comments with
smart defaults aligned to the Lark UI side panel:

  - filter: unresolved + non-orphan anchors
  - output: each comment with its reply_list (replies + reactions)
  - order: by anchor position in the document body

Opt-in flags --include-orphaned, --include-resolved, --no-reactions, and
--order=created widen the result set or change ordering.

Each item is augmented with derived fields:

  - anchor_state: "valid" | "structural" | "orphaned"
  - anchor_position: byte offset in the normalized document text

Orphan detection strategy:

  - is_whole=true -> valid (full-document comments)
  - quote == "[Sticky note]" -> "structural" if any <readonly-block>
    element still exists in the document, otherwise "orphaned"
  - Otherwise: first-line of quote (after HTML strip + entity decode +
    whitespace removal) must occur in the normalized document text

Two known limitations are documented in code and the linked issue:

  1. Short anchor text appearing in multiple places cannot be uniquely
     localized; the first occurrence wins. May falsely classify a
     deleted-anchor comment as valid when the text still appears
     elsewhere.
  2. Sticky-note ordering uses the readonly-block XML element position,
     which may differ from the rendered visual placement in some cases.

Closes larksuite#1111
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 26, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions Bot added domain/ccm PR touches the ccm domain size/L Large or sensitive change across domains or core paths labels May 26, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ec8f2aeb-e934-4281-93c1-f1f94bbac481

📥 Commits

Reviewing files that changed from the base of the PR and between 64e65b6 and b0267d3.

📒 Files selected for processing (1)
  • shortcuts/drive/drive_list_comments_test.go
💤 Files with no reviewable changes (1)
  • shortcuts/drive/drive_list_comments_test.go

📝 Walkthrough

Walkthrough

Adds a new drive +list-comments shortcut that lists DOCX comments with smart defaults: it resolves wiki nodes to DOCX tokens, paginates comments, fetches DOCX XML, normalizes text for anchor matching, classifies anchors as valid/structural/orphaned, supports filtering/sorting, and includes dry-run and tests.

Changes

Drive +list-comments Shortcut Implementation

Layer / File(s) Summary
Shortcut definition and input validation
shortcuts/drive/drive_list_comments.go
Defines the shortcut flags (--doc, --include-orphaned, --include-resolved, --no-reactions, --order) and implements parsing/validation for docx, wiki, and bare-token inputs.
API orchestration: planning, listing, and document fetch
shortcuts/drive/drive_list_comments.go
Dry-run emits multi-step plan (wiki resolve when needed), lists paginated comments to bounded limits, and fetches DOCX XML via docs AI fetch (format=xml).
Text normalization and anchor classification
shortcuts/drive/drive_list_comments.go
Normalizes XML/text (strip tags, unescape entities, collapse whitespace), projects raw positions to normalized coordinates, and classifies anchors: full-doc (valid), sticky (structural if readonly-block present, else orphaned), quote-based (valid if located, else orphaned).
Execution flow, sorting, and output formatting
shortcuts/drive/drive_list_comments.go
Main flow: resolve wiki → list comments → fetch/normalize document → build anchor-classified items → filter by orphan/resolved → sort (orphans last; anchor or created order) → output JSON with per-state counts. Includes map clone and quote truncation helpers.
Unit tests: input parsing and validation
shortcuts/drive/drive_list_comments_test.go
Tests parseListCommentsDocRef and validateListComments for URL/token parsing and validation error/success cases.
Unit tests: text normalization and anchor detection
shortcuts/drive/drive_list_comments_test.go
Tests normalizeDocContent/normalizeQuoteNeedle, buildCommentItem, sortCommentItems, and projectRawPosToNormalized (tag stripping, entity decoding, whitespace handling, first-line needle behavior, classification, orphan fallback, and sorting).
Integration tests: dry-run and end-to-end
shortcuts/drive/drive_list_comments_test.go
Tests DriveListComments.DryRun for docx/wiki (resolve step), filter-parameter behavior, and httpmock E2E tests verifying default orphan filtering and --include-orphaned behavior.
Shortcut registry integration
shortcuts/drive/shortcuts.go, shortcuts/drive/shortcuts_test.go
Registers DriveListComments in the shortcuts list and updates the test expectation to include "+list-comments".

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • larksuite/cli#947: Implements related wiki URL-to-{type,token} resolution used before listing DOCX comments.

Suggested reviewers

  • wittam-01
  • fangshuyu-768

Poem

🐰 I hopped through XML, tags set free,

I sniffed each quote where it ought to be,
Orphans tucked gently at the end,
Valid ones bloom, structural defend,
A rabbit's nod to tidy commentary.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: addition of a new +list-comments shortcut with anchor-aware filtering for the drive service.
Description check ✅ Passed The PR description is comprehensive and follows the template structure with Summary, Changes, CLI behavior, output structure, internal flow, test plan, and related issue references, though written primarily in Chinese.
Linked Issues check ✅ Passed The implementation successfully addresses all coding objectives from #1111: anchor-state detection (valid/structural/orphaned), document XML fetching, text normalization with HTML tag stripping, quote matching, sorting by anchor position, filtering flags, comprehensive test coverage, and proper output structure with derived fields.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the #1111 objectives: new shortcut implementation, comprehensive tests, and registration in the shortcuts module. No unrelated changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@shortcuts/drive/drive_list_comments_test.go`:
- Around line 458-593: Both tests
(TestDriveListComments_E2E_FiltersOrphanedByDefault and
TestDriveListComments_E2E_IncludeOrphanedKeepsAll) rely on global config and
must isolate config state; add t.Setenv("LARKSUITE_CLI_CONFIG_DIR", t.TempDir())
at the start of each test (or in a shared test helper called before
cmdutil.TestFactory/mountAndRunDrive) so the tests use a temporary
LARKSUITE_CLI_CONFIG_DIR and avoid cross-test state leakage.
- Around line 86-132: Replace uses of common.TestNewRuntimeContext(...) in these
tests with the repo-standard test factory cmdutil.TestFactory(t,
&core.CliConfig{}); specifically, when building the runtime for
validateListComments calls (tests referencing newListCommentsCmd and
validateListComments), call cmdutil.TestFactory(t, &core.CliConfig{}) to obtain
the test factory/runtime instead of common.TestNewRuntimeContext, and update any
variable names as needed so validateListComments receives the factory/runtime
produced by cmdutil.TestFactory.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 177b228c-d297-4257-8756-8d8f50e92a48

📥 Commits

Reviewing files that changed from the base of the PR and between 1135fc2 and e35ed9e.

📒 Files selected for processing (4)
  • shortcuts/drive/drive_list_comments.go
  • shortcuts/drive/drive_list_comments_test.go
  • shortcuts/drive/shortcuts.go
  • shortcuts/drive/shortcuts_test.go

Comment thread shortcuts/drive/drive_list_comments_test.go
Comment thread shortcuts/drive/drive_list_comments_test.go
Add t.Setenv("LARKSUITE_CLI_CONFIG_DIR", t.TempDir()) to the two httpmock E2E
tests for drive +list-comments, matching the pattern used by
drive_inspect_test.go and AGENTS.md guidance for config-state isolation.

Addresses CodeRabbit review on larksuite#1114.
…tests

Revert the t.Setenv("LARKSUITE_CLI_CONFIG_DIR", ...) added in 64e65b6. Per the
repo learning from PR larksuite#343, tests built via cmdutil.TestFactory(t, config) use
an in-memory config closure and never touch the filesystem, so isolating
CONFIG_DIR has no effect. Only the real NewDefault() factory path needs it.

Verified: cmdutil.TestFactory sets Config: func() (*core.CliConfig, error) {
return config, nil } — no filesystem access.
hasStructuralAnchor := strings.Contains(docXML, structuralAnchorTag)
// Approximate position of the FIRST sticky-note anchor in normalized space.
// All sticky-anchored comments share this position for sorting purposes.
structuralPos := projectRawPosToNormalized(docXML, normalized, structuralAnchorTag)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This computes a single structuralPos from the first <readonly-block> and then reuses that same position for every sticky-note comment. As a result, multiple structural comments do not actually sort by document position under the default order=anchor; they all tie on the same anchor position and then fall back to create_time. That does not match the PR description about approximating sticky-note order from XML position. If we cannot map individual sticky-note comments to distinct structural anchors, we should either avoid claiming anchor-order fidelity for them or make this limitation explicit in output and tests.

return ""
}
decoded := html.UnescapeString(firstLine)
return whitespacePattern.ReplaceAllString(decoded, "")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rich-text quotes containing inline markup will be misclassified here. The PR description says quote matching should strip HTML tags before comparison, but normalizeQuoteNeedle only unescapes entities and removes whitespace. If the comments API returns formatting tags inside quote (for example <b> or <i>), the needle will still contain markup and fail to match the tag-stripped document text, causing a still-valid comment to be classified as orphaned and filtered out by default. Please strip tags here as well, and add a test where the quote itself contains inline markup.

@wittam-01
Copy link
Copy Markdown
Collaborator

wittam-01 commented May 28, 2026

感谢这个PR,看了下实现,现在是根据评论的quota去匹配正文的,定位在xml内容的起始位置,有几个问题:

  1. 对于block块的评论(如高亮块block的评论,quota为“[高亮块]”)会因为匹配不到直接被过滤掉;
  2. 只返回position的话,如果后续文档内容有变更,位置发生偏移,这个position就定位不到了。

另外,我们下周会在评论接口返回评论所在的blockID位置,这个会更加准确,也能覆盖block评论场景,可以等下我们下周的版本。

@kyalpha313 kyalpha313 marked this pull request as draft May 30, 2026 00:51
@kyalpha313
Copy link
Copy Markdown
Contributor Author

@wittam-01 感谢 review 和详细反馈,这两点都很到位:

  1. [高亮块] 之类的 block 占位符确实会被我的文本匹配漏掉——我目前只特判了 [Sticky note](对应 <readonly-block>),其它 block 类锚没覆盖。
  2. position 不稳定也是这个方案的硬伤,只对"取完立即排序"友好,持久化使用会漂。

听到下周接口会返回 blockID,这才是真正的根治方案——可以唯一定位锚、自然覆盖 block 评论场景,连 fetch 正文都省了。我先把这个 PR 转 draft 暂停,等新接口上线后用 blockID 重写,代码也能砍掉一大半。

不用提前给我看新字段 schema,下周看到了再说。期间如果有什么我能帮忙的(比如新接口上线时优先试一下、或者拆其它周边小工具),随时 ping 我。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain/ccm PR touches the ccm domain size/L Large or sensitive change across domains or core paths

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] drive +list-comments:默认按用户实际可见口径返回评论,含孤立锚识别与按位置排序

4 participants