
feat: add compaction timeout mechanism and health check #490

Open
kingl3540 wants to merge 1 commit into jiulingyun:main from kingl3540:feature/compaction-timeout-health-check


@kingl3540

  • Add safety timeout (5 minutes) to prevent compaction from hanging indefinitely
  • Implement health check module to track compaction operations
  • Monitor for stuck compactions (>10 minutes) and consecutive failures
  • Emit diagnostic events for external monitoring and debugging
  • Improve error handling in compaction lifecycle events

Changes:

  • compact.ts: Apply safety timeout wrapper, add health tracking
  • compaction-health-check.ts: New health monitoring module
  • subscribe.handlers.lifecycle.ts: Better error handling in compaction end
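
For readers who want a concrete picture of the safety timeout described above, here is a minimal sketch, assuming the wrapper races the compaction promise against a timer. The names `withSafetyTimeout`, `CompactionTimeoutError`, and `SAFETY_TIMEOUT_MS` are hypothetical; the PR's actual timeout module is not shown in this thread and may work differently.

```typescript
// Hypothetical names throughout; the PR's real timeout module is not shown here.
const SAFETY_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes, per the PR description

class CompactionTimeoutError extends Error {
  constructor(timeoutMs: number) {
    super(`compaction timed out after ${timeoutMs} ms`);
    this.name = "CompactionTimeoutError";
  }
}

// Rejects if `work` has not settled within `timeoutMs`.
// Note: this only stops waiting; it does not cancel the underlying work.
async function withSafetyTimeout<T>(
  work: () => Promise<T>,
  timeoutMs: number = SAFETY_TIMEOUT_MS,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new CompactionTimeoutError(timeoutMs)),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([work(), timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A caller would wrap the compaction call, for example `await withSafetyTimeout(() => runCompaction())`, where `runCompaction` stands in for whatever compact.ts actually invokes. Because the race only abandons the wait, real cancellation would need something like an `AbortSignal` passed into the work itself.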

@Elegying

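The root cause has been found: it is a runtime compatibility issue in openclaw-cn 0.1.8-fix.3 on the server.

Specifically:

  • The community Feishu plugin calls the old signature: readAllowFromStore("feishu")
  • Your openclaw-cn runtime only supports the new signature: readAllowFromStore({ channel, accountId })
  • As a result, every read of the allow list comes back empty, so it keeps asking you to pair again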
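
One possible way to bridge the two call shapes is a small normalizing shim. The sketch below is illustrative only: the real `readAllowFromStore` implementation, its return type, and how it keys entries are not shown in this thread, so the in-memory `store`, the `"default"` account fallback, and the `Compat` function name are all assumptions.

```typescript
// Illustrative only: the real readAllowFromStore in openclaw-cn is not shown here.
type AllowStoreQuery = { channel: string; accountId?: string };

// Accept both the legacy string form and the newer object form.
function normalizeAllowStoreQuery(arg: string | AllowStoreQuery): AllowStoreQuery {
  return typeof arg === "string" ? { channel: arg } : arg;
}

// Hypothetical in-memory store standing in for the real allow-list storage.
// Keying by "channel:accountId" with a "default" account is an assumption.
const store = new Map<string, string[]>([["feishu:default", ["user-a"]]]);

function readAllowFromStoreCompat(arg: string | AllowStoreQuery): string[] {
  const { channel, accountId } = normalizeAllowStoreQuery(arg);
  return store.get(`${channel}:${accountId ?? "default"}`) ?? [];
}
```

With a shim like this, `readAllowFromStoreCompat("feishu")` and `readAllowFromStoreCompat({ channel: "feishu" })` resolve to the same entry instead of the legacy call silently returning an empty list.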

@jiulingyun
Owner

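Thanks for the submission! The motivation behind this PR (preventing compaction from hanging) is valuable, but several issues need to be resolved first, so it cannot be merged for now:

1. Missing key file

compact.ts imports from ./compaction-safety-timeout.js (compactWithSafetyTimeout, EMBEDDED_COMPACTION_TIMEOUT_MS), but the PR does not include that file, so the code does not compile.

2. package-lock.json contains many unrelated changes

The lockfile includes changes unrelated to this PR:

  • The version number changes from 0.1.5-fix.2 to 0.1.8-fix.2
  • The hono version is upgraded
  • Optional dependencies such as better-sqlite3 are added
  • An openclaw binary alias is added

Please commit only changes related to the compaction timeout feature, and do not mix in differences from your local lockfile state.

3. clearTimeout is used to clear a timer created by setInterval

In compaction-health-check.ts, startCompactionTracking creates its timer with setInterval, but cleanupHealthCheck clears it with clearTimeout; it should use clearInterval.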
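
As a rough illustration of the pairing, the sketch below creates the periodic check with `setInterval` and returns a cleanup function that calls `clearInterval` on the same handle. `startStuckMonitor` and its parameters are hypothetical and are not taken from the PR's compaction-health-check.ts.

```typescript
// Hypothetical monitor; names do not come from the PR's compaction-health-check.ts.
const STUCK_THRESHOLD_MS = 10 * 60 * 1000; // 10 minutes, per the PR description

function startStuckMonitor(
  onStuck: (elapsedMs: number) => void,
  checkEveryMs: number,
  thresholdMs: number = STUCK_THRESHOLD_MS,
): () => void {
  const startedAt = Date.now();
  const handle = setInterval(() => {
    const elapsed = Date.now() - startedAt;
    if (elapsed > thresholdMs) onStuck(elapsed);
  }, checkEveryMs);
  // Timers created with setInterval are released with clearInterval.
  return () => clearInterval(handle);
}
```

Returning the cleanup function keeps the handle private, so callers cannot pair it with the wrong clear function by accident.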

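4. Over-engineering

compaction-health-check.ts is 260 lines long and exports many functions (getAllCompactionHealthStates, getCompactionHealthSummary, isCompactionHealthy, and others), but only three are actually used: start/complete/failCompactionTracking. I suggest trimming it down to the functionality that is actually needed.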
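
To make the suggestion concrete, a trimmed module could look roughly like the sketch below, which keeps only the three tracking functions and a consecutive-failure counter. The function names follow the start/complete/failCompactionTracking pattern mentioned above, but the signatures, the `sessionId` key, and the state shape are assumptions rather than the PR's code.

```typescript
// Sketch only: signatures and state shape are assumptions, not the PR's code.
type HealthState = { startedAt: number | undefined; consecutiveFailures: number };

const healthStates = new Map<string, HealthState>();

function stateFor(sessionId: string): HealthState {
  let state = healthStates.get(sessionId);
  if (!state) {
    state = { startedAt: undefined, consecutiveFailures: 0 };
    healthStates.set(sessionId, state);
  }
  return state;
}

// Record that a compaction has started for this session.
function startCompactionTracking(sessionId: string, now: number = Date.now()): void {
  stateFor(sessionId).startedAt = now;
}

// A success clears the in-flight marker and resets the failure streak.
function completeCompactionTracking(sessionId: string): void {
  const state = stateFor(sessionId);
  state.startedAt = undefined;
  state.consecutiveFailures = 0;
}

// A failure clears the in-flight marker and returns the current failure streak.
function failCompactionTracking(sessionId: string): number {
  const state = stateFor(sessionId);
  state.startedAt = undefined;
  state.consecutiveFailures += 1;
  return state.consecutiveFailures;
}
```

Summary and query helpers can be added back later if a caller actually needs them.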

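5. Missing tests

Two new modules are added, but there are no corresponding test files.

6. The root cause may lie elsewhere

In the PR comments, @Elegying has pointed out that the actual problem is an API signature compatibility issue in the Feishu plugin (the old and new readAllowFromStore signatures do not match). This timeout/health-check mechanism does not address that root cause.


Suggestions:

  1. Add the missing compaction-safety-timeout.ts file
  2. Remove the unrelated package-lock.json changes
  3. Fix the clearTimeout/clearInterval bug
  4. Slim down the health check module and remove unused exports
  5. Add corresponding unit tests
  6. Confirm whether the compaction timeout is still needed as a safeguard beyond the root-cause fix (Feishu API compatibility)

Looking forward to your update!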
