Commit 3633fee

Improve sustained alert aggregation

- add rate-aware early handoff from ramp to sustained mode
- add periodCount metadata and clearer sustained formatter output
- update defaults, docs, and release metadata for quieter long-running incidents
1 parent 18c9bae commit 3633fee

19 files changed

Lines changed: 176 additions & 39 deletions
Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
+---
+'@iqai/alert-logger': minor
+---
+
+Make sustained alerting quieter and more informative by:
+
+- adding rate-aware early handoff from ramp to sustained mode
+- changing the default sustained update interval from 5 minutes to 15 minutes
+- adding `aggregation.periodCount` for per-update deltas while keeping `suppressedSince` for compatibility
+- exposing `aggregation.rampExitRatePerSecond` and `aggregation.rampExitRateWindowMs` configuration knobs
+- updating sustained formatter output to show both per-period and total counts
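
The per-update delta described in the changeset can be pictured as a second counter that resets whenever an update is emitted, while the total keeps growing. The sketch below is illustrative only: `GroupCounter`, `record`, and `flush` are hypothetical names that do not appear in this commit, and whether the library counts the triggering event as part of the period is an assumption.

```typescript
// Hypothetical sketch, not the library's internals.
class GroupCounter {
  private count = 0 // total occurrences for this fingerprint
  private periodCount = 0 // occurrences since the last emitted update

  // Call once per incoming occurrence of the error.
  record(): void {
    this.count += 1
    this.periodCount += 1
  }

  // Snapshot both counters for an outgoing update, then start a new period.
  flush(): { count: number; periodCount: number } {
    const snapshot = { count: this.count, periodCount: this.periodCount }
    this.periodCount = 0
    return snapshot
  }
}

// Usage: 163 events before the first update, 37 more before the second.
const counter = new GroupCounter()
for (let i = 0; i < 163; i++) counter.record()
const firstUpdate = counter.flush()
for (let i = 0; i < 37; i++) counter.record()
const secondUpdate = counter.flush()
console.log(firstUpdate, secondUpdate)
```

The second snapshot carries a total of 200 and a period delta of 37, which matches the numbers used in the sustained-phase test fixtures in this commit.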

README.md

Lines changed: 15 additions & 7 deletions
@@ -7,7 +7,7 @@ Stop drowning in alert storms. `@iqai/alert-logger` groups repeated errors using
 ## ✨ Features
 
 - **Unified API** — `logger.error('msg', error, { fields })` routes to every configured adapter
-- **Exponential suppression** — alerts fire at 1, 2, 4, 8, 16, 32, 64... then switch to periodic digests
+- **Rate-aware suppression** — alerts ramp quickly, then switch to quieter periodic updates when an incident is clearly ongoing
 - **Resolution detection** — get a "resolved" message when an error stops occurring
 - **Error fingerprinting** — same bug from different requests groups automatically (strips IDs, timestamps, UUIDs)
 - **Multi-channel routing** — route by severity level or custom tags to different channels
@@ -152,12 +152,14 @@ When the same error fires repeatedly, the library doesn't spam your channel:
 | Phase | Trigger | What gets sent |
 |-------|---------|----------------|
 | **Onset** | 1st occurrence | Full alert with stack trace, fields, tags |
-| **Ramp** | 2nd, 4th, 8th, 16th, 32nd, 64th | Compact: `"Payment failed (x8 — 4 suppressed)"` |
-| **Sustained** | >64 in window | Digest every 5min: `"x4,812 in last 5m"` |
+| **Ramp** | 2nd, 4th, 8th, 16th, 32nd, 64th until rate/count handoff | Compact: `"Payment failed (x8 — 4 suppressed)"` |
+| **Sustained** | >64 total, or current rate crosses threshold after at least one ramp alert | Digest every 15min: `"x37 since last update · x412 total"` |
 | **Resolution** | 0 hits for 2min | `"Resolved: Payment failed — 12,847 total over 23m"` |
 
 Errors are grouped by **fingerprint** — the library strips variable parts (IDs, timestamps, UUIDs, hex addresses) from the error message and hashes it with the top stack frames. Same bug, different request = same group.
 
+By default, the rate check uses a 1-minute sliding window and exits ramp early at `0.5` events/sec after the first ramp checkpoint has been sent.
+
 ## 🌍 Per-Environment Config
 
 Same codebase, different behavior per environment. Dev won't bug you as much as prod:
@@ -169,15 +171,19 @@ AlertLogger.init({
   environments: {
     production: {
       levels: ['warning', 'critical'],
-      aggregation: { digestIntervalMs: 5 * 60_000 },
+      aggregation: { digestIntervalMs: 15 * 60_000 },
     },
     staging: {
       levels: ['critical'], // only errors, no warnings
       aggregation: { digestIntervalMs: 15 * 60_000 },
     },
     development: {
       levels: ['critical'],
-      aggregation: { rampThreshold: 8, digestIntervalMs: 30 * 60_000 },
+      aggregation: {
+        rampThreshold: 8,
+        rampExitRatePerSecond: 0.25,
+        digestIntervalMs: 30 * 60_000,
+      },
     },
   },
 })
@@ -279,8 +285,10 @@ AlertLogger.init({
 
   // Aggregation tuning
   aggregation: {
-    rampThreshold: 64, // switch from ramp to digest phase
-    digestIntervalMs: 5 * 60_000, // how often to send digests
+    rampThreshold: 64, // count-based handoff into sustained mode
+    rampExitRatePerSecond: 0.5, // early sustained handoff after a ramp alert
+    rampExitRateWindowMs: 60_000, // sliding window used for current-rate calculation
+    digestIntervalMs: 15 * 60_000, // how often to send sustained updates
     resolutionCooldownMs: 2 * 60_000, // silence before "resolved"
   },
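
The handoff rule documented in the README changes above (more than 64 total, or the current rate crossing the threshold once at least one ramp alert has gone out) could be sketched roughly as follows. This is a reading of the README text, not the library's implementation: `HandoffConfig`, `currentRate`, and `shouldEnterSustained` are hypothetical names, only the three config keys and their defaults come from the diff, and treating "crosses" as "greater than or equal" is an assumption.

```typescript
// Hypothetical sketch of the ramp-to-sustained decision.
interface HandoffConfig {
  rampThreshold: number // count-based handoff (README default: 64)
  rampExitRatePerSecond: number // early handoff rate (README default: 0.5)
  rampExitRateWindowMs: number // sliding window length (README default: 60_000)
}

const defaults: HandoffConfig = {
  rampThreshold: 64,
  rampExitRatePerSecond: 0.5,
  rampExitRateWindowMs: 60_000,
}

// Events per second over the window that ends at `now`.
function currentRate(timestamps: number[], now: number, windowMs: number): number {
  const cutoff = now - windowMs
  const inWindow = timestamps.filter((t) => t > cutoff && t <= now).length
  return inWindow / (windowMs / 1000)
}

function shouldEnterSustained(
  totalCount: number,
  rampAlertsSent: number,
  timestamps: number[],
  now: number,
  config: HandoffConfig = defaults,
): boolean {
  // Count-based handoff: more than `rampThreshold` occurrences in total.
  if (totalCount > config.rampThreshold) return true
  // Rate-based handoff only applies after at least one ramp alert was sent.
  if (rampAlertsSent < 1) return false
  const rate = currentRate(timestamps, now, config.rampExitRateWindowMs)
  return rate >= config.rampExitRatePerSecond // assumption: inclusive comparison
}

// Usage: 30 events in the last 30 seconds is 0.5 events/sec over a 60 s window.
const now = Date.now()
const burst = Array.from({ length: 30 }, (_, i) => now - i * 1000)
console.log(shouldEnterSustained(30, 1, burst, now))
```

Gating the rate check on a prior ramp alert means a group always produces its onset and at least one compact ramp message before going quiet.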

src/adapters/console/console-adapter.ts

Lines changed: 4 additions & 1 deletion
@@ -56,7 +56,9 @@ export class ConsoleAdapter implements AlertAdapter {
       lines.push(` fields: ${pairs}`)
     }
 
-    lines.push(` count: ${aggregation.count} | phase: ${aggregation.phase}`)
+    lines.push(
+      ` count: ${aggregation.count} | periodCount: ${aggregation.periodCount} | phase: ${aggregation.phase}`,
+    )
 
     return lines.join('\n')
   }
@@ -72,6 +74,7 @@ export class ConsoleAdapter implements AlertAdapter {
       aggregation: {
         phase: alert.aggregation.phase,
         count: alert.aggregation.count,
+        periodCount: alert.aggregation.periodCount,
       },
     })
   }

src/adapters/discord/discord-adapter.test.ts

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ function makeAlert(overrides: Partial<FormattedAlert> = {}): FormattedAlert {
 phase: 'onset',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),

src/adapters/discord/formatter.test.ts

Lines changed: 7 additions & 1 deletion
@@ -14,6 +14,7 @@ function makeAlert(overrides: Partial<FormattedAlert> = {}): FormattedAlert {
 phase: 'onset',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -86,6 +87,7 @@ describe('formatDiscordEmbed', () => {
 phase: 'ramp',
 fingerprint: 'abc123',
 count: 10,
+periodCount: 5,
 suppressedSince: 5,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -106,6 +108,7 @@ describe('formatDiscordEmbed', () => {
 phase: 'sustained',
 fingerprint: 'abc123',
 count: 200,
+periodCount: 37,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -114,7 +117,8 @@ describe('formatDiscordEmbed', () => {
 })
 const embed = formatDiscordEmbed(alert)
 
-expect(embed.title).toContain('x200')
+expect(embed.title).toContain('x37 since last update')
+expect(embed.title).toContain('x200 total')
 expect(embed.title).toContain('peak rate: 3.7/s')
 })
 })
@@ -127,6 +131,7 @@ describe('formatDiscordEmbed', () => {
 phase: 'resolution',
 fingerprint: 'abc123',
 count: 50,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: now - 3_600_000,
 lastSeen: now,
@@ -146,6 +151,7 @@ describe('formatDiscordEmbed', () => {
 phase: 'resolution',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),

src/adapters/discord/formatter.ts

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ export function formatDiscordEmbed(alert: FormattedAlert): DiscordEmbed {
 
   case 'sustained': {
     const title = truncate(
-      `${badge} [${alert.level.toUpperCase()}] ${safeTitle} (x${aggregation.count} in last digest period \u00B7 peak rate: ${aggregation.peakRate.toFixed(1)}/s)`,
+      `${badge} [${alert.level.toUpperCase()}] ${safeTitle} (x${aggregation.periodCount} since last update \u00B7 x${aggregation.count} total \u00B7 peak rate: ${aggregation.peakRate.toFixed(1)}/s)`,
       256,
     )

src/adapters/slack/formatter.test.ts

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@ function makeAlert(overrides: Partial<FormattedAlert> = {}): FormattedAlert {
1414
phase: 'onset',
1515
fingerprint: 'abc123',
1616
count: 1,
17+
periodCount: 0,
1718
suppressedSince: 0,
1819
firstSeen: Date.now(),
1920
lastSeen: Date.now(),
@@ -97,6 +98,7 @@ describe('formatSlackPayload', () => {
9798
phase: 'ramp',
9899
fingerprint: 'abc123',
99100
count: 10,
101+
periodCount: 5,
100102
suppressedSince: 5,
101103
firstSeen: Date.now(),
102104
lastSeen: Date.now(),
@@ -116,6 +118,7 @@ describe('formatSlackPayload', () => {
116118
phase: 'resolution',
117119
fingerprint: 'abc123',
118120
count: 1,
121+
periodCount: 0,
119122
suppressedSince: 0,
120123
firstSeen: Date.now(),
121124
lastSeen: Date.now(),
@@ -137,6 +140,7 @@ describe('formatSlackPayload', () => {
137140
phase: 'ramp',
138141
fingerprint: 'abc123',
139142
count: 10,
143+
periodCount: 5,
140144
suppressedSince: 5,
141145
firstSeen: Date.now(),
142146
lastSeen: Date.now(),
@@ -158,6 +162,7 @@ describe('formatSlackPayload', () => {
158162
phase: 'sustained',
159163
fingerprint: 'abc123',
160164
count: 200,
165+
periodCount: 37,
161166
suppressedSince: 0,
162167
firstSeen: Date.now(),
163168
lastSeen: Date.now(),
@@ -167,7 +172,8 @@ describe('formatSlackPayload', () => {
167172
const payload = formatSlackPayload(alert)
168173

169174
const header = payload.attachments[0].blocks[0]
170-
expect(header.text?.text).toContain('x200')
175+
expect(header.text?.text).toContain('x37 since last update')
176+
expect(header.text?.text).toContain('x200 total')
171177
expect(header.text?.text).toContain('peak: 3.7/s')
172178
})
173179
})
@@ -180,6 +186,7 @@ describe('formatSlackPayload', () => {
180186
phase: 'resolution',
181187
fingerprint: 'abc123',
182188
count: 50,
189+
periodCount: 0,
183190
suppressedSince: 0,
184191
firstSeen: now - 3_600_000,
185192
lastSeen: now,
@@ -200,6 +207,7 @@ describe('formatSlackPayload', () => {
200207
phase: 'resolution',
201208
fingerprint: 'abc123',
202209
count: 1,
210+
periodCount: 0,
203211
suppressedSince: 0,
204212
firstSeen: Date.now(),
205213
lastSeen: Date.now(),

src/adapters/slack/formatter.ts

Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ export function formatSlackPayload(alert: FormattedAlert): SlackPayload {
 
   case 'sustained': {
     const title = truncate(
-      `${badge} [${alert.level.toUpperCase()}] ${alert.title} (x${aggregation.count} \u00B7 peak: ${aggregation.peakRate.toFixed(1)}/s)`,
+      `${badge} [${alert.level.toUpperCase()}] ${alert.title} (x${aggregation.periodCount} since last update \u00B7 x${aggregation.count} total \u00B7 peak: ${aggregation.peakRate.toFixed(1)}/s)`,
       150,
     )
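
The Discord and Slack formatter changes share the same sustained-phase suffix and differ only in the peak label (`peak rate` versus `peak`). A standalone sketch of that suffix is shown here for reference; `SustainedCounts` and `sustainedSummary` are hypothetical names introduced for illustration, and the real formatters build the string inline with a badge, level, title, and truncation that this sketch leaves out.

```typescript
// Hypothetical helper mirroring the suffix built inline by the formatters.
interface SustainedCounts {
  periodCount: number // occurrences since the last sustained update
  count: number // total occurrences for the incident
  peakRate: number // highest observed events per second
}

function sustainedSummary(a: SustainedCounts, peakLabel: string = 'peak rate'): string {
  return (
    `x${a.periodCount} since last update \u00B7 ` +
    `x${a.count} total \u00B7 ` +
    `${peakLabel}: ${a.peakRate.toFixed(1)}/s`
  )
}

// Usage with the values from the sustained-phase test fixtures.
const fixture: SustainedCounts = { periodCount: 37, count: 200, peakRate: 3.7 }
console.log(sustainedSummary(fixture)) // Discord-style label
console.log(sustainedSummary(fixture, 'peak')) // Slack-style label
```

Showing the per-period delta first puts the "is it getting worse" number where a reader sees it before the running total.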

src/adapters/slack/slack-adapter.test.ts

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@ function makeAlert(overrides: Partial<FormattedAlert> = {}): FormattedAlert {
1515
phase: 'onset',
1616
fingerprint: 'abc123',
1717
count: 1,
18+
periodCount: 0,
1819
suppressedSince: 0,
1920
firstSeen: Date.now(),
2021
lastSeen: Date.now(),

src/adapters/telegram/formatter.test.ts

Lines changed: 7 additions & 1 deletion
@@ -15,6 +15,7 @@ function makeAlert(overrides: Partial<FormattedAlert> = {}): FormattedAlert {
 phase: 'onset',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -81,6 +82,7 @@ describe('formatTelegramMessage', () => {
 phase: 'ramp',
 fingerprint: 'abc123',
 count: 10,
+periodCount: 5,
 suppressedSince: 5,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -101,6 +103,7 @@ describe('formatTelegramMessage', () => {
 phase: 'sustained',
 fingerprint: 'abc123',
 count: 200,
+periodCount: 37,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -109,7 +112,8 @@ describe('formatTelegramMessage', () => {
 })
 const msg = formatTelegramMessage(alert)
 
-expect(msg).toContain('x200')
+expect(msg).toContain('x37 since last update')
+expect(msg).toContain('x200 total')
 expect(msg).toContain('peak: 3.7/s')
 })
 })
@@ -122,6 +126,7 @@ describe('formatTelegramMessage', () => {
 phase: 'resolution',
 fingerprint: 'abc123',
 count: 50,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: now - 3_600_000,
 lastSeen: now,
@@ -142,6 +147,7 @@ describe('formatTelegramMessage', () => {
 phase: 'resolution',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
