
Commit c93a2bc

feat(feishu): harden ingress relay and decisions
1 parent a95e6cf commit c93a2bc

18 files changed

Lines changed: 1324 additions & 42 deletions

README.md

Lines changed: 28 additions & 0 deletions
@@ -130,12 +130,40 @@ export default defineConfig({
     messageFormat: 'text',
     appId: process.env.FEISHU_APP_ID,
     appSecret: process.env.FEISHU_APP_SECRET,
+    webhookVerificationToken: process.env.FEISHU_WEBHOOK_VERIFICATION_TOKEN,
+    webhookEncryptKey: process.env.FEISHU_WEBHOOK_ENCRYPT_KEY,
+    webhookMaxSkewSeconds: 300,
+    webhookDedupTtlSeconds: 600,
+    relaySecret: process.env.OPENCLAW_RELAY_SECRET,
+    relayMaxSkewSeconds: 300,
+    relayNonceTtlSeconds: 600,
+    deliveryMaxRetries: 2,
+    deliveryRetryBaseMs: 300,
+    deliveryRetryMaxMs: 5000,
     baseTaskUrl: process.env.OPENCLAW_FEISHU_PROGRESS_BASE_TASK_URL ?? 'http://127.0.0.1:8765',
     progressThrottlePercent: 15,
   },
 });
 ```
 
+If `feishu.relaySecret` is configured, OpenClaw must send the following headers when calling `/api/feishu/relay` or `/api/feishu/relay/event`:
+
+- `x-openclaw-timestamp`
+- `x-openclaw-nonce`
+- `x-openclaw-signature`
+
+The signature is the HMAC-SHA256 of `METHOD + path + timestamp + nonce + stable JSON body`.
+
+If `feishu.webhookEncryptKey` is configured, Feishu event-subscription calls to `/api/feishu/webhook` are verified using:
+
+- `x-lark-request-timestamp`
+- `x-lark-request-nonce`
+- `x-lark-signature`
+
+If `feishu.webhookVerificationToken` is configured as well, the token in the callback body is also verified. Event dedup state is persisted to `.opencroc/feishu-webhook-dedup.json`, so duplicate deliveries within the TTL are still blocked after the service restarts.
+
+By default, outbound message sends and card updates retry on `429/5xx/network errors` a limited number of times with exponential backoff; tune this with `deliveryMaxRetries`, `deliveryRetryBaseMs`, and `deliveryRetryMaxMs`.
+
 Start the service:
 
 Run directly in development mode:

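The README defines the relay signature only as HMAC-SHA256 over `METHOD + path + timestamp + nonce + stable JSON body`. The following is a minimal TypeScript sketch of both sides under assumptions the commit does not confirm: the parts are concatenated with no separator, the timestamp is in Unix seconds, the nonce is a random UUID, "stable JSON" means object keys sorted recursively, and the signature is hex-encoded. `stableStringify`, `signRelayRequest`, and `verifyRelaySignature` are hypothetical names, not the server's actual functions.

```typescript
import { createHmac, randomUUID, timingSafeEqual } from "node:crypto";

// Hypothetical: JSON with object keys sorted recursively, so equal bodies serialize identically.
export function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    // JSON.stringify(undefined) yields undefined; treat it as null, as arrays do.
    return JSON.stringify(value) ?? "null";
  }
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const fields = Object.keys(obj)
    .filter((key) => obj[key] !== undefined)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
  return `{${fields.join(",")}}`;
}

// Assumed payload layout: METHOD + path + timestamp + nonce + stable JSON body, no separators.
function relayPayload(method: string, path: string, timestamp: string, nonce: string, body: unknown): string {
  return method.toUpperCase() + path + timestamp + nonce + stableStringify(body);
}

function hmacHex(secret: string, payload: string): string {
  return createHmac("sha256", secret).update(payload, "utf8").digest("hex");
}

// Client side (OpenClaw): build the three x-openclaw-* headers for one request.
export function signRelayRequest(
  secret: string,
  method: string,
  path: string,
  body: unknown,
  nowMs = Date.now(),
): Record<string, string> {
  const timestamp = String(Math.floor(nowMs / 1000)); // assumption: Unix seconds
  const nonce = randomUUID();
  return {
    "x-openclaw-timestamp": timestamp,
    "x-openclaw-nonce": nonce,
    "x-openclaw-signature": hmacHex(secret, relayPayload(method, path, timestamp, nonce, body)),
  };
}

// Server side: recompute the HMAC and compare in constant time.
// Clock-skew and nonce-replay checks are separate steps, not done here.
export function verifyRelaySignature(
  secret: string,
  method: string,
  path: string,
  body: unknown,
  headers: Record<string, string | undefined>,
): boolean {
  const timestamp = headers["x-openclaw-timestamp"];
  const nonce = headers["x-openclaw-nonce"];
  const signature = headers["x-openclaw-signature"];
  if (!timestamp || !nonce || !signature) return false;
  const expected = Buffer.from(hmacHex(secret, relayPayload(method, path, timestamp, nonce, body)), "hex");
  const given = Buffer.from(signature, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Sorting keys on both sides makes the signature independent of how either runtime orders object properties. Signature verification is only one layer: the timestamp-skew and nonce checks governed by `relayMaxSkewSeconds` and `relayNonceTtlSeconds` still have to run.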
docs/roadmap.md

Lines changed: 6 additions & 6 deletions
@@ -31,9 +31,9 @@
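 - Done: Topic foundation (strict-by-thread): generate a deterministic `topicId` for each Feishu/OpenClaw conversation thread and forbid automatic association across different `topicId`s, so that two unrelated things never get mixed together for no apparent reason
 - Done: detail page enhancements: the `/tasks/:id` detail view shows the topic's bot roster and its task chain (based on `topicId` / planet relations)
 - Done: manual linking as a supplement: tasks from different topics can be linked manually through planet relations (the minimal form of a topic merge)
-- In progress: security checks and "persistent" idempotent dedup for `/api/feishu/webhook`: in-process dedup exists, but TTL-persistent dedup across restarts and full security checks are not yet complete
-- Not started: add minimal authentication (shared secret/HMAC) to `/api/feishu/relay` and `/api/feishu/relay/event`, with replay protection (timestamp/nonce)
-- Not started: stable outbound delivery: a clear, configurable retry/backoff policy for 429/5xx/network jitter that cannot cause infinite retries or message storms
+- Done: security checks and "persistent" idempotent dedup for `/api/feishu/webhook`: verification token / signature checks are supported, and dedup keeps working within the TTL after a restart
+- Done: minimal authentication (shared secret/HMAC) for `/api/feishu/relay` and `/api/feishu/relay/event`, with replay protection (timestamp/nonce)
+- Done: stable outbound delivery: a clear, configurable retry/backoff policy for 429/5xx/network jitter that cannot cause infinite retries or message storms
 
 Deliverables (Track B: OpenCroc scan/pipeline)
 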
@@ -42,7 +42,7 @@
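 
 Sprint 1 DoD (acceptance criteria)
 
-- Not started: unit-test coverage for webhook/relay authentication and dedup, running in CI
+- Done: unit-test coverage for webhook/relay authentication and dedup, running in CI
 - In progress: local smoke test: Feishu shows ACK -> several progress updates -> done/failed (the main path works; the loop that includes retry scenarios is not yet closed)
 - In progress: docs: add/update a section on "authentication configuration and deployment notes" (Troubleshooting / systemd basics exist, but there is no dedicated authentication section yet)
 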
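@@ -53,7 +53,7 @@ Sprint 1 DoD (acceptance criteria)
 Deliverables (Track A: Feishu card interaction)
 
 - In progress: the `waiting` state produces a card with buttons (e.g. continue / stop / report only / scan only) and supports callbacks; a skeleton waiting card exists, but the decision-button callbacks are missing
-- Not started: add a decision-submission endpoint (e.g. `POST /api/tasks/:id/decision`) that accepts an option id and optional free text
+- Done: add a decision-submission endpoint (e.g. `POST /api/tasks/:id/decision`) that accepts an option id and optional free text
 - In progress: in-place `card-live` card updates: basic update support exists but is not yet wired into the full decision flow
 - Not started: Topic decision gate: when a bot automatically produces content that "affects external systems / writes artifacts / modifies code", it must first enter `waiting` and be confirmed by you in the Feishu card
 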
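@@ -65,7 +65,7 @@ Sprint 1 DoD (acceptance criteria)
 Sprint 2 DoD (acceptance criteria)
 
 - Not started: one complete flow: Feishu trigger -> enter waiting -> click a button -> task continues -> completion summary + task link finally arrive in Feishu
-- Not started: unit tests for decision callbacks and task state changes (including idempotency for repeated clicks / repeated callbacks)
+- In progress: unit tests for decision callbacks and task state changes (tests for decision submission and state resumption are in; idempotency for repeated clicks / repeated callbacks is still pending)
 
 ## Sprint 3 (2026-04-20 ~ 2026-05-03): Observability and one-click production deployment (in progress)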

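The roadmap item above (and the README) says webhook dedup survives restarts by persisting state to `.opencroc/feishu-webhook-dedup.json`. A minimal sketch of that idea, assuming the file holds a flat JSON object mapping event IDs to first-seen times in milliseconds and that the TTL comes from `webhookDedupTtlSeconds`. `PersistentDedup` and its methods are hypothetical; the commit's actual store and file layout are not shown.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical TTL dedup store persisted as { [eventId]: firstSeenAtMs }.
export class PersistentDedup {
  private readonly seen = new Map<string, number>();
  private readonly file: string;
  private readonly ttlMs: number;

  constructor(file: string, ttlSeconds: number) {
    this.file = file;
    this.ttlMs = ttlSeconds * 1000;
    if (!existsSync(file)) return;
    try {
      const raw = JSON.parse(readFileSync(file, "utf8")) as Record<string, unknown>;
      for (const [eventId, seenAt] of Object.entries(raw)) {
        if (typeof seenAt === "number") this.seen.set(eventId, seenAt);
      }
    } catch {
      // A corrupt or half-written file should not take the webhook down; start empty.
    }
  }

  /** True the first time an event ID is seen within the TTL, false for a duplicate. */
  checkAndRecord(eventId: string, nowMs = Date.now()): boolean {
    this.prune(nowMs);
    if (this.seen.has(eventId)) return false;
    this.seen.set(eventId, nowMs);
    this.flush();
    return true;
  }

  // Drop entries older than the TTL so the file does not grow forever.
  private prune(nowMs: number): void {
    for (const [eventId, seenAt] of this.seen) {
      if (nowMs - seenAt >= this.ttlMs) this.seen.delete(eventId);
    }
  }

  // Rewrite the whole file synchronously after each new event.
  private flush(): void {
    mkdirSync(dirname(this.file), { recursive: true });
    writeFileSync(this.file, JSON.stringify(Object.fromEntries(this.seen)));
  }
}
```

Rewriting the whole file on every new event keeps the sketch simple. A busier service might batch writes or write to a temporary file and rename it, so that a crash mid-write cannot leave a truncated file.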
docs/troubleshooting.md

Lines changed: 9 additions & 2 deletions
@@ -16,18 +16,25 @@
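 - Check the Feishu app's permissions and scopes: does the bot have permission to send messages, and can it post in the target group chat or private chat?
 - Make sure `OPENCLAW_FEISHU_PROGRESS_BASE_TASK_URL` is reachable by Feishu users; otherwise the "task details" link in cards and text messages will not open.
 - If you use `messageFormat: 'card-live'`: the service must be able to obtain a `message_id` and be allowed to update it via PATCH; otherwise it falls back to sending multiple messages.
-- If you see 429: this is Feishu rate limiting; lower the update frequency or enable a backoff-and-retry policy.
+- If you see 429: the current version retries a limited number of times with backoff; if it still fails, lower the update frequency or increase `deliveryRetryBaseMs` / `deliveryRetryMaxMs`.
 
 ## Webhook not received or signature verification fails
 
 - Make sure the Feishu event-subscription URL points to `/api/feishu/webhook` and is reachable from the public internet.
 - The first check sends a `url_verification` request; the service must echo the challenge back correctly.
-- If signature verification/encryption is enabled: make sure the configured token/key match the Feishu console.
+- If `feishu.webhookVerificationToken` is set: make sure it matches the verification token in the Feishu event-subscription console.
+- If `feishu.webhookEncryptKey` is set: confirm that Feishu requests carry `x-lark-request-timestamp`, `x-lark-request-nonce`, and `x-lark-signature`, and that the encrypt key matches the console.
+- If requests are rejected because the timestamp has expired: check the server's clock skew, or raise `webhookMaxSkewSeconds` as needed.
+- If duplicate events seem to slip through: check that `.opencroc/feishu-webhook-dedup.json` is writable; dedup state is restored from this file after a service restart.
 
 ## Relay progress arrives but Feishu does not update
 
+- If `feishu.relaySecret` is set: confirm that OpenClaw sends all three request headers: `x-openclaw-timestamp`, `x-openclaw-nonce`, and `x-openclaw-signature`.
+- Make sure the clock skew between the OpenClaw host and this service's host does not exceed `relayMaxSkewSeconds`; otherwise requests are rejected as expired.
+- If you see 409: usually the same `timestamp + nonce` pair was sent twice, which trips replay protection.
 - Check whether `/api/feishu/relay/event` returns successfully (was it blocked by authentication? does it return 4xx/5xx?).
 - Check whether the task is bound to Feishu (is the `chatId` correct? was a `messageId` recorded for live updates?).
+- If the Feishu API returns occasional 429/5xx or network errors: check that `deliveryMaxRetries`, `deliveryRetryBaseMs`, and `deliveryRetryMaxMs` are set sensibly.
 - If OpenClaw is responsible for the final answer: set `finalAnswerSource=openclaw` when starting the relay to avoid sending the final summary twice.
 
 ## Studio page shows no data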

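The webhook checklist above relies on the three `x-lark-*` headers and `webhookMaxSkewSeconds`. Below is a sketch of such a check. It assumes the signature scheme that Feishu documents for event subscriptions with an Encrypt Key: the hex SHA-256 of `timestamp + nonce + encryptKey + raw request body`, with the timestamp in Unix seconds. Verify this against Feishu's current documentation. The commit does not show its verifier, and `verifyLarkWebhook` is a hypothetical name. Decrypting the `encrypt` field and comparing the verification token are separate steps and are not shown.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

export interface LarkSignatureHeaders {
  "x-lark-request-timestamp"?: string;
  "x-lark-request-nonce"?: string;
  "x-lark-signature"?: string;
}

export type WebhookCheck =
  | { ok: true }
  | { ok: false; reason: "missing-headers" | "stale-timestamp" | "bad-signature" };

// Assumed scheme: hex(SHA-256(timestamp + nonce + encryptKey + rawBody)).
// rawBody must be the exact request body as received, before JSON parsing.
export function verifyLarkWebhook(
  rawBody: string,
  headers: LarkSignatureHeaders,
  encryptKey: string,
  webhookMaxSkewSeconds = 300,
  nowMs = Date.now(),
): WebhookCheck {
  const timestamp = headers["x-lark-request-timestamp"];
  const nonce = headers["x-lark-request-nonce"];
  const signature = headers["x-lark-signature"];
  if (!timestamp || !nonce || !signature) return { ok: false, reason: "missing-headers" };

  // Reject requests whose timestamp is too far from our clock (replay window).
  const tsSeconds = Number(timestamp);
  if (!Number.isFinite(tsSeconds) || Math.abs(nowMs / 1000 - tsSeconds) > webhookMaxSkewSeconds) {
    return { ok: false, reason: "stale-timestamp" };
  }

  // Constant-time comparison of the recomputed digest with the header value.
  const expected = createHash("sha256").update(timestamp + nonce + encryptKey + rawBody, "utf8").digest();
  const given = Buffer.from(signature, "hex");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    return { ok: false, reason: "bad-signature" };
  }
  return { ok: true };
}
```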
src/server/croc-office.ts

Lines changed: 7 additions & 1 deletion
@@ -3,7 +3,7 @@ import type { OpenCrocConfig, PipelineRunResult, GeneratedTestFile, ExecutionMet
 import type { BackendStatus, ExecutionQualityGateResult, ExecutionRunMode, AuthStatus } from '../execution/types.js';
 import type { ScanResult } from '../graph/types.js';
 import type { SummonPlan } from '../agents/task-router.js';
-import { TaskStore, type TaskDecisionPrompt, type TaskRecord } from './task-store.js';
+import { TaskStore, type TaskDecisionPrompt, type TaskDecisionSubmission, type TaskRecord } from './task-store.js';
 import type { FeishuProgressBridge, FeishuTaskTarget } from './feishu-bridge.js';
 import { buildProjectChatAnswer, collectProjectChatSnapshot } from './chat-analysis.js';
 
@@ -269,6 +269,12 @@ export class CrocOffice {
     return task;
   }
 
+  async submitTaskDecision(taskId: string, submission: TaskDecisionSubmission, waitForDelivery = false): Promise<TaskRecord | undefined> {
+    const task = this.taskStore.resolveWaiting(taskId, submission);
+    await this.emitTaskUpdate(task, waitForDelivery);
+    return task;
+  }
+
   getAgents(): CrocAgent[] {
     return this.agents;
   }

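`submitTaskDecision` resolves a waiting task through `TaskStore.resolveWaiting` and then emits the update. The roadmap names the HTTP entry point `POST /api/tasks/:id/decision`, which takes an option id plus optional free text. The client-side sketch below assumes a JSON body of the form `{ optionId, freeText }`. The commit does not show the fields of `TaskDecisionSubmission`, so these field names and both helper functions are hypothetical.

```typescript
// Hypothetical request shape; the real TaskDecisionSubmission fields are not shown in the commit.
export interface DecisionSubmission {
  optionId: string;
  freeText?: string;
}

// Build the request without sending it, so the URL and body are easy to test.
export function buildDecisionRequest(
  baseUrl: string,
  taskId: string,
  submission: DecisionSubmission,
): { url: string; init: RequestInit } {
  const optionId = submission.optionId.trim();
  if (!optionId) throw new Error("optionId is required");
  const body: DecisionSubmission = { optionId };
  const freeText = submission.freeText?.trim();
  if (freeText) body.freeText = freeText; // omit empty free text entirely
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/tasks/${encodeURIComponent(taskId)}/decision`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json; charset=utf-8" },
      body: JSON.stringify(body),
    },
  };
}

// Send the decision and surface non-2xx responses as errors.
export async function submitDecision(baseUrl: string, taskId: string, submission: DecisionSubmission): Promise<unknown> {
  const { url, init } = buildDecisionRequest(baseUrl, taskId, submission);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Decision submit failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}
```

The roadmap still lists idempotency for repeated clicks and repeated callbacks as pending, so a client like this should not assume that resubmitting the same decision is harmless.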
src/server/feishu-bridge.ts

Lines changed: 10 additions & 0 deletions
@@ -4,6 +4,16 @@ export interface FeishuBridgeConfig {
   enabled?: boolean;
   baseTaskUrl?: string;
   progressThrottlePercent?: number;
+  relaySecret?: string;
+  relayMaxSkewSeconds?: number;
+  relayNonceTtlSeconds?: number;
+  webhookVerificationToken?: string;
+  webhookEncryptKey?: string;
+  webhookMaxSkewSeconds?: number;
+  webhookDedupTtlSeconds?: number;
+  deliveryMaxRetries?: number;
+  deliveryRetryBaseMs?: number;
+  deliveryRetryMaxMs?: number;
   appId?: string;
   appSecret?: string;
   tenantAccessToken?: string;

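The new `relayMaxSkewSeconds` and `relayNonceTtlSeconds` fields drive relay replay protection. According to the troubleshooting notes, requests with expired timestamps are rejected and a repeated `timestamp + nonce` yields 409. The sketch below shows such a guard under two assumptions: nonces live in an in-memory map, and a stale or malformed timestamp is answered with 401 (the commit does not say which status it uses). `RelayReplayGuard` is a hypothetical name.

```typescript
// 409 for a replay matches the troubleshooting notes; 401 for stale/malformed input is an assumption.
export type ReplayVerdict =
  | { status: 200 }
  | { status: 401 | 409; reason: string };

export class RelayReplayGuard {
  private readonly expiresAt = new Map<string, number>(); // "timestamp:nonce" -> expiry (ms)
  private readonly maxSkewMs: number;
  private readonly nonceTtlMs: number;

  constructor(relayMaxSkewSeconds = 300, relayNonceTtlSeconds = 600) {
    this.maxSkewMs = relayMaxSkewSeconds * 1000;
    this.nonceTtlMs = relayNonceTtlSeconds * 1000;
  }

  check(timestamp: string, nonce: string, nowMs = Date.now()): ReplayVerdict {
    const tsMs = Number(timestamp) * 1000; // assumes Unix seconds
    if (!timestamp || !nonce || !Number.isFinite(tsMs)) {
      return { status: 401, reason: "missing or malformed timestamp/nonce" };
    }
    if (Math.abs(nowMs - tsMs) > this.maxSkewMs) {
      return { status: 401, reason: "timestamp outside relayMaxSkewSeconds" };
    }
    // Forget nonces whose TTL has passed so the map does not grow without bound.
    for (const [key, expiry] of this.expiresAt) {
      if (expiry <= nowMs) this.expiresAt.delete(key);
    }
    const key = `${timestamp}:${nonce}`;
    if (this.expiresAt.has(key)) return { status: 409, reason: "timestamp + nonce already used" };
    this.expiresAt.set(key, nowMs + this.nonceTtlMs);
    return { status: 200 };
  }
}
```

Keep `relayNonceTtlSeconds` at least twice `relayMaxSkewSeconds`, as the README's 300/600 pair does. A request can be accepted as early as `relayMaxSkewSeconds` before its timestamp and stays replayable until the same interval after it. The nonce therefore has to be remembered for twice the skew window.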
src/server/feishu-delivery.test.ts

Lines changed: 108 additions & 0 deletions
@@ -157,4 +157,112 @@ describe('FeishuApiDelivery', () => {
     expect(String(secondInit.body)).toContain('"root_id":"om_root_1"');
     expect(String(secondInit.body)).toContain('"reply_to_message_id":"om_ack_1"');
   });
+
+  it('retries message delivery on 429 and succeeds without infinite retry', async () => {
+    const fetchMock = vi.fn()
+      .mockResolvedValueOnce({
+        ok: false,
+        status: 429,
+        statusText: 'Too Many Requests',
+        headers: new Headers({ 'retry-after': '0' }),
+        text: async () => 'rate limited',
+      })
+      .mockResolvedValueOnce({
+        ok: true,
+        json: async () => ({ code: 0, msg: 'ok', data: { message_id: 'om_retry_1' } }),
+      });
+    global.fetch = fetchMock as typeof fetch;
+    const delivery = new FeishuApiDelivery({
+      enabled: true,
+      mode: 'live',
+      tenantAccessToken: 'tenant_token_xxx',
+      deliveryMaxRetries: 2,
+      deliveryRetryBaseMs: 0,
+      deliveryRetryMaxMs: 0,
+    });
+
+    const receipt = await delivery.send(sampleMessage);
+
+    expect(receipt).toEqual({ messageId: 'om_retry_1', rootId: undefined, threadId: undefined });
+    expect(fetchMock).toHaveBeenCalledTimes(2);
+  });
+
+  it('retries message delivery on network failure and succeeds', async () => {
+    const fetchMock = vi.fn()
+      .mockRejectedValueOnce(new Error('network down'))
+      .mockResolvedValueOnce({
+        ok: true,
+        json: async () => ({ code: 0, msg: 'ok', data: { message_id: 'om_retry_network' } }),
+      });
+    global.fetch = fetchMock as typeof fetch;
+    const delivery = new FeishuApiDelivery({
+      enabled: true,
+      mode: 'live',
+      tenantAccessToken: 'tenant_token_xxx',
+      deliveryMaxRetries: 2,
+      deliveryRetryBaseMs: 0,
+      deliveryRetryMaxMs: 0,
+    });
+
+    const receipt = await delivery.send(sampleMessage);
+
+    expect(receipt).toEqual({ messageId: 'om_retry_network', rootId: undefined, threadId: undefined });
+    expect(fetchMock).toHaveBeenCalledTimes(2);
+  });
+
+  it('does not retry non-retryable 4xx delivery errors', async () => {
+    const fetchMock = vi.fn().mockResolvedValue({
+      ok: false,
+      status: 400,
+      statusText: 'Bad Request',
+      headers: new Headers(),
+      text: async () => 'invalid payload',
+    });
+    global.fetch = fetchMock as typeof fetch;
+    const delivery = new FeishuApiDelivery({
+      enabled: true,
+      mode: 'live',
+      tenantAccessToken: 'tenant_token_xxx',
+      deliveryMaxRetries: 3,
+      deliveryRetryBaseMs: 0,
+      deliveryRetryMaxMs: 0,
+    });
+
+    await expect(delivery.send(sampleMessage)).rejects.toThrow('Failed to send Feishu message: 400 Bad Request - invalid payload');
+    expect(fetchMock).toHaveBeenCalledTimes(1);
+  });
+
+  it('retries tenant token fetch on 5xx before sending', async () => {
+    const fetchMock = vi.fn()
+      .mockResolvedValueOnce({
+        ok: false,
+        status: 502,
+        statusText: 'Bad Gateway',
+        headers: new Headers(),
+        text: async () => 'upstream error',
+      })
+      .mockResolvedValueOnce({
+        ok: true,
+        json: async () => ({ code: 0, msg: 'ok', tenant_access_token: 'tenant_token_retry', expire: 7200 }),
+      })
+      .mockResolvedValueOnce({
+        ok: true,
+        json: async () => ({ code: 0, msg: 'ok', data: { message_id: 'om_retry_token' } }),
+      });
+    global.fetch = fetchMock as typeof fetch;
+    const delivery = new FeishuApiDelivery({
+      enabled: true,
+      mode: 'live',
+      appId: 'cli_xxx',
+      appSecret: 'sec_xxx',
+      deliveryMaxRetries: 2,
+      deliveryRetryBaseMs: 0,
+      deliveryRetryMaxMs: 0,
+    });
+
+    const receipt = await delivery.send(sampleMessage);
+
+    expect(receipt).toEqual({ messageId: 'om_retry_token', rootId: undefined, threadId: undefined });
+    expect(fetchMock).toHaveBeenCalledTimes(3);
+  });
 });

src/server/feishu-delivery.ts

Lines changed: 60 additions & 3 deletions
@@ -80,6 +80,21 @@ function resolveMessagePayload(message: FeishuOutboundMessage): { msgType: 'text
   };
 }
 
+function isRetryableStatus(status: number): boolean {
+  return status === 429 || status >= 500;
+}
+
+function parseRetryAfter(value: string | null): number | null {
+  if (!value) return null;
+  const seconds = Number.parseInt(value, 10);
+  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
+  const until = Date.parse(value);
+  if (Number.isFinite(until)) {
+    return Math.max(0, until - Date.now());
+  }
+  return null;
+}
+
 export class FeishuApiDelivery implements FeishuBridgeDelivery {
   private readonly config: FeishuBridgeConfig;
   private tokenCache: TokenCache | null = null;
@@ -92,6 +107,48 @@ export class FeishuApiDelivery implements FeishuBridgeDelivery {
     return this.config.enabled !== false && this.config.mode === 'live';
   }
 
+  private retryLimit(): number {
+    return Math.max(0, this.config.deliveryMaxRetries ?? 2);
+  }
+
+  private retryBaseMs(): number {
+    return Math.max(0, this.config.deliveryRetryBaseMs ?? 300);
+  }
+
+  private retryMaxMs(): number {
+    return Math.max(this.retryBaseMs(), this.config.deliveryRetryMaxMs ?? 5_000);
+  }
+
+  private async sleep(ms: number): Promise<void> {
+    if (ms <= 0) return;
+    await new Promise((resolve) => setTimeout(resolve, ms));
+  }
+
+  private computeRetryDelay(attempt: number, retryAfterHeader: string | null): number {
+    const hinted = parseRetryAfter(retryAfterHeader);
+    if (hinted !== null) return Math.min(hinted, this.retryMaxMs());
+    const delay = this.retryBaseMs() * (2 ** attempt);
+    return Math.min(delay, this.retryMaxMs());
+  }
+
+  private async fetchWithRetry(input: string, init: RequestInit): Promise<Response> {
+    const maxRetries = this.retryLimit();
+
+    for (let attempt = 0; ; attempt += 1) {
+      try {
+        const response = await fetch(input, init);
+        if (response.ok || !isRetryableStatus(response.status) || attempt >= maxRetries) {
+          return response;
+        }
+
+        await this.sleep(this.computeRetryDelay(attempt, response.headers.get('retry-after')));
+      } catch (error) {
+        if (attempt >= maxRetries) throw error;
+        await this.sleep(this.computeRetryDelay(attempt, null));
+      }
+    }
+  }
+
   private async getTenantAccessToken(): Promise<string> {
     if (this.config.tenantAccessToken) return this.config.tenantAccessToken;
     if (this.tokenCache && this.tokenCache.expiresAt > Date.now() + 30_000) {
@@ -101,7 +158,7 @@ export class FeishuApiDelivery implements FeishuBridgeDelivery {
       throw new Error('Feishu live delivery requires tenantAccessToken or appId/appSecret');
     }
 
-    const response = await fetch(`${resolveApiBaseUrl(this.config)}/auth/v3/tenant_access_token/internal`, {
+    const response = await this.fetchWithRetry(`${resolveApiBaseUrl(this.config)}/auth/v3/tenant_access_token/internal`, {
       method: 'POST',
       headers: {
         'Content-Type': 'application/json; charset=utf-8',
@@ -141,7 +198,7 @@ export class FeishuApiDelivery implements FeishuBridgeDelivery {
     const token = await this.getTenantAccessToken();
     const payload = resolveMessagePayload(message);
     const target = resolveReceiveTarget(message.target.chatId);
-    const response = await fetch(`${resolveApiBaseUrl(this.config)}/im/v1/messages?receive_id_type=${target.receiveIdType}`, {
+    const response = await this.fetchWithRetry(`${resolveApiBaseUrl(this.config)}/im/v1/messages?receive_id_type=${target.receiveIdType}`, {
       method: 'POST',
       headers: {
         'Content-Type': 'application/json; charset=utf-8',
@@ -185,7 +242,7 @@ export class FeishuApiDelivery implements FeishuBridgeDelivery {
 
     const token = await this.getTenantAccessToken();
     const payload = resolveMessagePayload(message);
-    const response = await fetch(`${resolveApiBaseUrl(this.config)}/im/v1/messages/${encodeURIComponent(messageId)}`, {
+    const response = await this.fetchWithRetry(`${resolveApiBaseUrl(this.config)}/im/v1/messages/${encodeURIComponent(messageId)}`, {
       method: 'PATCH',
       headers: {
         'Content-Type': 'application/json; charset=utf-8',

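To show what the defaults in `fetchWithRetry` mean in practice, the delay arithmetic can be lifted out of the class. The following is a standalone restatement of `parseRetryAfter` and `computeRetryDelay` from the diff above, with the same clamping as `retryBaseMs()` / `retryMaxMs()`. The free-function names `retryDelayMs` and `backoffSchedule` are ours, not the commit's. Note that `deliveryMaxRetries` counts retries, not attempts: with the default of 2, a single call makes at most three requests.

```typescript
// Standalone restatement of the commit's backoff arithmetic, for illustration only.
function parseRetryAfter(value: string | null): number | null {
  if (!value) return null;
  const seconds = Number.parseInt(value, 10);
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000; // delta-seconds form
  const until = Date.parse(value); // HTTP-date form
  return Number.isFinite(until) ? Math.max(0, until - Date.now()) : null;
}

// Delay before retry number `attempt` (0-based): a Retry-After hint wins but is capped,
// otherwise base * 2^attempt, also capped.
export function retryDelayMs(attempt: number, retryAfter: string | null, baseMs = 300, maxMs = 5_000): number {
  const base = Math.max(0, baseMs);
  const cap = Math.max(base, maxMs);
  const hinted = parseRetryAfter(retryAfter);
  if (hinted !== null) return Math.min(hinted, cap);
  return Math.min(base * 2 ** attempt, cap);
}

// Sleeps between attempts when every attempt fails with a retryable error and no Retry-After header.
export function backoffSchedule(maxRetries = 2, baseMs = 300, maxMs = 5_000): number[] {
  return Array.from({ length: Math.max(0, maxRetries) }, (_, attempt) => retryDelayMs(attempt, null, baseMs, maxMs));
}
```

With the defaults, the waits are 300 ms and then 600 ms. A server that asks for `Retry-After: 120` gets only 5 s, because hints are clamped to `deliveryRetryMaxMs`. The token request and the message request each go through `fetchWithRetry` separately, so a send that must also fetch a token can make up to `2 × (deliveryMaxRetries + 1)` requests in the worst case. The 5xx token test exercises a small version of this: one failed token call, one successful token call, then the send.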