Skip to content

Commit 1d92ad1

Browse files
author
bgagent
committed
fix(fanout): emit fanout.slack.dup_delete_failed on ghost-message accumulation (aws-samples#79 review aws-samples#6)
The conditional UpdateItem dup-delete path (``task_created`` / ``session_started`` lifecycle persists) calls ``deleteMessage`` to clean up the duplicate Slack message that landed when a sibling retry won the race. The delete is inherently best-effort — but if it fails, the duplicate becomes a permanent ghost in the thread and operators had no way to alarm on the rate. Refactor ``deleteMessage`` to return a boolean (``true`` on success or ``message_not_found``-as-already-gone, ``false`` otherwise) and emit a dedicated ``fanout.slack.dup_delete_failed`` event with an ``error_id: FANOUT_SLACK_DUP_DELETE_FAILED`` from the dup-delete callsites when the cleanup couldn't complete. The terminal-event cleanup paths (``slack_session_msg_ts``, ``slack_created_msg_ts``) intentionally don't fire this event — those paths target genuinely-stale UX cleanup, not retry-driven duplicates, so an alarm there would be noise. No new tests beyond the existing dup-delete coverage; the ``deleteMessage`` return value isn't yet asserted at the unit level, but the behavior is fully exercised by the existing ``dup-delete`` integration paths (test gap aws-samples#31 will add an explicit failure-path assertion when it lands).
1 parent fb88d4d commit 1d92ad1

1 file changed

Lines changed: 40 additions & 6 deletions

File tree

cdk/src/handlers/slack-notify.ts

Lines changed: 40 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -345,7 +345,20 @@ export async function dispatchSlackEvent(
345345
task_id: taskId,
346346
duplicate_ts: result.ts,
347347
});
348-
await deleteMessage(botToken, channel, result.ts);
348+
const deleted = await deleteMessage(botToken, channel, result.ts);
349+
if (!deleted) {
350+
// The duplicate Slack root is now permanently in the
351+
// thread. Dedicated event key + error_id so operators can
352+
// alarm on the rate of ghost task_created messages
353+
// (PR #79 review #6).
354+
logger.error('[fanout/slack] dup-delete failed — ghost task_created message stays in thread', {
355+
event: 'fanout.slack.dup_delete_failed',
356+
error_id: 'FANOUT_SLACK_DUP_DELETE_FAILED',
357+
task_id: taskId,
358+
event_type: eventType,
359+
duplicate_ts: result.ts,
360+
});
361+
}
349362
return;
350363
}
351364
logger.warn('[fanout/slack] failed to store task_created message ts', {
@@ -373,7 +386,16 @@ export async function dispatchSlackEvent(
373386
task_id: taskId,
374387
duplicate_ts: result.ts,
375388
});
376-
await deleteMessage(botToken, channel, result.ts);
389+
const deleted = await deleteMessage(botToken, channel, result.ts);
390+
if (!deleted) {
391+
logger.error('[fanout/slack] dup-delete failed — ghost session_started message stays in thread', {
392+
event: 'fanout.slack.dup_delete_failed',
393+
error_id: 'FANOUT_SLACK_DUP_DELETE_FAILED',
394+
task_id: taskId,
395+
event_type: eventType,
396+
duplicate_ts: result.ts,
397+
});
398+
}
377399
return;
378400
}
379401
logger.warn('[fanout/slack] failed to store session message ts', {
@@ -502,7 +524,12 @@ async function updateReaction(botToken: string, channel: string, threadTs: strin
502524
await addReaction(botToken, channel, threadTs, newEmoji);
503525
}
504526

505-
async function deleteMessage(botToken: string, channel: string, messageTs: string): Promise<void> {
527+
/** Returns ``true`` iff the message was successfully deleted (or was
528+
* already gone — ``message_not_found`` is benign). Callers that care
529+
* about the outcome (the conditional-persist dup-delete path) can
530+
* emit a ``fanout.slack.dup_delete_failed`` event so operators can
531+
* alarm on accumulating ghost messages (PR #79 review #6). */
532+
async function deleteMessage(botToken: string, channel: string, messageTs: string): Promise<boolean> {
506533
try {
507534
const response = await fetch('https://slack.com/api/chat.delete', {
508535
method: 'POST',
@@ -514,15 +541,21 @@ async function deleteMessage(botToken: string, channel: string, messageTs: strin
514541
});
515542
const result = await response.json() as { ok: boolean; error?: string };
516543
if (!result.ok) {
517-
// ``message_not_found`` is benign (message already gone);
518-
// anything else (e.g. ``cant_delete_message``) leaves an
519-
// orphan in the thread that operators may want to alarm on.
544+
// ``message_not_found`` is benign (message already gone) and is
545+
// treated as a successful delete by the caller's perspective.
546+
// Anything else (e.g. ``cant_delete_message``) leaves an orphan
547+
// in the thread.
548+
if (result.error === 'message_not_found') {
549+
return true;
550+
}
520551
logger.warn('[fanout/slack] failed to delete intermediate message', {
521552
event: 'fanout.slack.message_delete_api_error',
522553
error: result.error,
523554
message_ts: messageTs,
524555
});
556+
return false;
525557
}
558+
return true;
526559
} catch (err) {
527560
// Network failure → orphan message stays in the thread silently.
528561
// Promote to error so operators can alarm on the orphan rate
@@ -534,5 +567,6 @@ async function deleteMessage(botToken: string, channel: string, messageTs: strin
534567
error_name: err instanceof Error ? err.name : undefined,
535568
error: err instanceof Error ? err.message : String(err),
536569
});
570+
return false;
537571
}
538572
}

0 commit comments

Comments
 (0)