Skip to content

Commit fb88d4d

Browse files
author
bgagent
committed
fix(fanout): promote Slack reaction/delete network errors to error logs (aws-samples#79 review aws-samples#5)
The reaction / delete helpers (``addReaction``, ``removeReaction``, ``deleteMessage``) used to log every catch at warn with a single generic event key, lumping API-level rejections (e.g. ``no_reaction``) together with infrastructure failures (DNS lookup, TLS handshake, fetch timeout, JSON parse error from a hostile gateway). Operators who alarmed on the warn rate saw a flat signal that masked genuine infra problems. Split the boundary: - API-level (``!result.ok`` after a successful HTTP call) stays at warn with channel-specific event keys (``fanout.slack.reaction_add_api_error``, ``fanout.slack.reaction_remove_api_error``, ``fanout.slack.message_delete_api_error``). These are per-message UX problems; operators don't page. - Network errors (the outer ``catch (err)`` after ``fetch``) promote to ``logger.error`` with dedicated event keys (``fanout.slack.reaction_add_network_error``, ``fanout.slack.reaction_remove_network_error``, ``fanout.slack.message_delete_network_error``) and ``error_id``s (``FANOUT_SLACK_REACTION_NETWORK``, ``FANOUT_SLACK_DELETE_NETWORK``) so each has its own alarmable signal. User-visible symptoms when these fire silently: stale emoji reactions (hourglass never swaps to ✅) and orphaned intermediate messages. Behaviour unchanged: errors are still swallowed (per-message reactions and intermediate cleanup are best-effort by design; they must not fail the batch), but operators now get distinct metrics for each failure class.
1 parent a29c157 commit fb88d4d

1 file changed

Lines changed: 44 additions & 6 deletions

File tree

cdk/src/handlers/slack-notify.ts

Lines changed: 44 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -435,11 +435,25 @@ async function addReaction(botToken: string, channel: string, timestamp: string,
435435
});
436436
const result = await response.json() as { ok: boolean; error?: string };
437437
if (!result.ok && result.error !== 'already_reacted') {
438-
logger.warn('[fanout/slack] failed to add reaction', { emoji, error: result.error });
438+
// API-level rejection: per-message UX problem (channel locked,
439+
// emoji unknown). Stays at warn — operators don't page.
440+
logger.warn('[fanout/slack] failed to add reaction', {
441+
event: 'fanout.slack.reaction_add_api_error',
442+
emoji,
443+
error: result.error,
444+
});
439445
}
440446
} catch (err) {
441-
logger.warn('[fanout/slack] error adding reaction', {
447+
// Network / DNS / TLS / timeout / SyntaxError — infra class.
448+
// Promote to error with a dedicated event key so the rate of
449+
// network failures has its own alarmable signal, distinct from
450+
// API-level rejections (PR #79 review #5). User-visible symptom
451+
// when this fires unnoticed: stale ⏳ emoji never swaps to ✅.
452+
logger.error('[fanout/slack] network error adding reaction', {
453+
event: 'fanout.slack.reaction_add_network_error',
454+
error_id: 'FANOUT_SLACK_REACTION_NETWORK',
442455
emoji,
456+
error_name: err instanceof Error ? err.name : undefined,
443457
error: err instanceof Error ? err.message : String(err),
444458
});
445459
}
@@ -457,11 +471,21 @@ async function removeReaction(botToken: string, channel: string, timestamp: stri
457471
});
458472
const result = await response.json() as { ok: boolean; error?: string };
459473
if (!result.ok && result.error !== 'no_reaction') {
460-
logger.warn('[fanout/slack] failed to remove reaction', { emoji, error: result.error });
474+
logger.warn('[fanout/slack] failed to remove reaction', {
475+
event: 'fanout.slack.reaction_remove_api_error',
476+
emoji,
477+
error: result.error,
478+
});
461479
}
462480
} catch (err) {
463-
logger.warn('[fanout/slack] error removing reaction', {
481+
// See addReaction — network failures get their own ``error_id``
482+
// so operators can alarm on stale-emoji rate distinctly from
483+
// Slack API rejections.
484+
logger.error('[fanout/slack] network error removing reaction', {
485+
event: 'fanout.slack.reaction_remove_network_error',
486+
error_id: 'FANOUT_SLACK_REACTION_NETWORK',
464487
emoji,
488+
error_name: err instanceof Error ? err.name : undefined,
465489
error: err instanceof Error ? err.message : String(err),
466490
});
467491
}
@@ -490,10 +514,24 @@ async function deleteMessage(botToken: string, channel: string, messageTs: strin
490514
});
491515
const result = await response.json() as { ok: boolean; error?: string };
492516
if (!result.ok) {
493-
logger.warn('[fanout/slack] failed to delete intermediate message', { error: result.error });
517+
// ``message_not_found`` is benign (message already gone);
518+
// anything else (e.g. ``cant_delete_message``) leaves an
519+
// orphan in the thread that operators may want to alarm on.
520+
logger.warn('[fanout/slack] failed to delete intermediate message', {
521+
event: 'fanout.slack.message_delete_api_error',
522+
error: result.error,
523+
message_ts: messageTs,
524+
});
494525
}
495526
} catch (err) {
496-
logger.warn('[fanout/slack] error deleting intermediate message', {
527+
// Network failure → orphan message stays in the thread silently.
528+
// Promote to error so operators can alarm on the orphan rate
529+
// (PR #79 review #5).
530+
logger.error('[fanout/slack] network error deleting intermediate message', {
531+
event: 'fanout.slack.message_delete_network_error',
532+
error_id: 'FANOUT_SLACK_DELETE_NETWORK',
533+
message_ts: messageTs,
534+
error_name: err instanceof Error ? err.name : undefined,
497535
error: err instanceof Error ? err.message : String(err),
498536
});
499537
}

0 commit comments

Comments
 (0)