sentry-scanner.yml
name: Sentry Proactive Scanner
on:
schedule:
- cron: '0 * * * *'
workflow_dispatch:
permissions:
contents: read
issues: write
id-token: write
jobs:
scan:
runs-on: ubuntu-latest
timeout-minutes: 25
steps:
- name: Checkout
uses: actions/checkout@v4
with:
sparse-checkout: |
CLAUDE.md
run/
.github/
- name: Setup SSH
uses: webfactory/ssh-agent@v0.9.0
with:
ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
- name: Add known hosts
run: |
mkdir -p ~/.ssh
ssh-keyscan -H 157.90.154.200 >> ~/.ssh/known_hosts
- name: Claude Code - Scan
uses: anthropics/claude-code-action@v1
with:
claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
claude_args: "--model claude-sonnet-4-20250514 --max-turns 60 --allowedTools Bash,Read,Glob,Grep"
show_full_output: true
prompt: |
You are a proactive Sentry scanner for the Ethernal project. Your job is to scan ALL unresolved Sentry issues (errors AND performance) and decide which ones need GitHub issues created.
Sentry base URL: https://sentry.tryethernal.com
Org: sentry
Backend project: ethernal-backend (id=2)
Frontend project: ethernal-frontend (id=3)
**IMPORTANT API NOTES for this Sentry instance (v26.2.1)**:
- Only `statsPeriod` values of `24h` and `14d` work. Do NOT use `1h`.
- Always save curl output to a temp file first, then process with jq. Do NOT pipe curl directly to jq — it fails intermittently.
- Pattern: `curl -s -o /tmp/resp.json "URL" -H "Authorization: Bearer $SENTRY_API_TOKEN" && jq '...' /tmp/resp.json`
- **CRITICAL — `count` is LIFETIME, not 24h.** The `count` field on each issue in the list response is the lifetime `times_seen` total, NOT the 24h rolling volume. Do NOT use it for thresholds or report it as "Events (24h)" — that caused a false-incident storm (issues #1218–#1232 were all closed as self-resolved because the scanner reported 6,000+ "24h events" when actual 24h volume was 2–16). The TRUE 24h volume is `([.stats."24h"[]?[1]] | add)` — sum of the hourly buckets in the `stats."24h"` array. Always compute and use this value.
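A minimal sketch of the lifetime-versus-24h distinction, assuming `jq` is on the PATH. The sample JSON and the `/tmp/sample_issues.json` path are made up for illustration; only the field names come from the notes above.

```bash
# Hypothetical sample shaped like the issue list response (values are invented).
printf '%s\n' '[{"id":"101","count":"6953","stats":{"24h":[[1700000000,0],[1700003600,2],[1700007200,0]]}}]' > /tmp/sample_issues.json

# Sum the second element of every hourly bucket; fall back to 0 when no buckets exist.
summary=$(jq -r '.[]
  | {id, lifetime: (.count | tonumber), events_24h: ([.stats."24h"[]?[1]] | add // 0)}
  | "\(.id) lifetime=\(.lifetime) events_24h=\(.events_24h)"' /tmp/sample_issues.json)
echo "$summary"
```

For this sample the line printed is `101 lifetime=6953 events_24h=2`: the issue looks loud by `count` but is nearly silent in the last 24 hours.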
## Step 1: Query Sentry for all unresolved issues
Run ALL 6 queries in a single bash command to save turns. Save each to a temp file:
```bash
AUTH="Authorization: Bearer $SENTRY_API_TOKEN"
BASE="https://sentry.tryethernal.com/api/0/projects/sentry"
curl -s -o /tmp/be_err.json "$BASE/ethernal-backend/issues/?query=is:unresolved+issue.category:error&statsPeriod=24h&limit=100" -H "$AUTH"
curl -s -o /tmp/be_perf.json "$BASE/ethernal-backend/issues/?query=is:unresolved+issue.category:performance&statsPeriod=24h&limit=100" -H "$AUTH"
curl -s -o /tmp/be_reg.json "$BASE/ethernal-backend/issues/?query=is:regressed&statsPeriod=24h&limit=50" -H "$AUTH"
curl -s -o /tmp/fe_err.json "$BASE/ethernal-frontend/issues/?query=is:unresolved+issue.category:error&statsPeriod=24h&limit=100" -H "$AUTH"
curl -s -o /tmp/fe_perf.json "$BASE/ethernal-frontend/issues/?query=is:unresolved+issue.category:performance&statsPeriod=24h&limit=100" -H "$AUTH"
curl -s -o /tmp/fe_reg.json "$BASE/ethernal-frontend/issues/?query=is:regressed&statsPeriod=24h&limit=50" -H "$AUTH"
# ALWAYS project events_24h (rolling) — never use raw `count` (lifetime) for decisions or reports.
PROJ='[.[] | {id, title, lifetime_count: (.count | tonumber), events_24h: ([.stats."24h"[]?[1]] | add // 0), lastSeen, shortId, isRegression: (.isRegression // false)}]'
echo "=== Backend Errors ===" && jq "$PROJ" /tmp/be_err.json
echo "=== Backend Performance ===" && jq "$PROJ" /tmp/be_perf.json
echo "=== Backend Regressed ===" && jq "$PROJ" /tmp/be_reg.json
echo "=== Frontend Errors ===" && jq "$PROJ" /tmp/fe_err.json
echo "=== Frontend Performance ===" && jq "$PROJ" /tmp/fe_perf.json
echo "=== Frontend Regressed ===" && jq "$PROJ" /tmp/fe_reg.json
```
From this point on, **always use `events_24h` (computed above) for thresholds and reporting**. The `lifetime_count` is informational only — useful as context ("issue has fired 6,953 times total but only 2 in last 24h") but never a decision input.
Deduplicate by `(project, id)` — regressed issues may appear in both the error/performance and regressed queries.
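A minimal sketch of the deduplication key, assuming each issue has already been reduced to one `project id` line. The sample pairs below are hypothetical.

```bash
# Hypothetical "project id" pairs; backend 101 shows up in both the error and regressed results.
pairs='ethernal-backend 101
ethernal-backend 102
ethernal-backend 101
ethernal-frontend 101'

# sort -u keeps one line per distinct (project, id) pair.
unique_pairs=$(printf '%s\n' "$pairs" | sort -u)
echo "$unique_pairs"
```

Keying on the pair rather than the id alone keeps `ethernal-frontend 101` separate from `ethernal-backend 101`.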
## Step 2: Filter already-tracked issues
In the SAME bash call, also fetch existing GitHub sentry issues:
```bash
gh issue list --label sentry --state all --limit 200 --json number,title,body -q '.[].body' > /tmp/gh_issues.txt
```
For each Sentry issue, check: `grep -c "issues/SENTRY_ID/" /tmp/gh_issues.txt`
Skip any issue that already has a GitHub issue (open or closed).
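The check above can be wrapped in a small helper. This is a sketch only: `is_tracked` and the sample file path are hypothetical names, and the sample body line is invented.

```bash
# is_tracked SENTRY_ID [FILE]: succeeds if any GitHub issue body links to that Sentry issue.
# The leading "issues/" and trailing "/" stop id 4 from matching a link to issue 42.
is_tracked() {
  grep -qF "issues/$1/" "${2:-/tmp/gh_issues.txt}"
}

# Hypothetical sample data for illustration only.
printf '%s\n' '**Link:** https://sentry.tryethernal.com/organizations/sentry/issues/42/' > /tmp/gh_issues_sample.txt

for id in 42 4 7; do
  if is_tracked "$id" /tmp/gh_issues_sample.txt; then
    echo "$id: already tracked, skip"
  else
    echo "$id: new"
  fi
done
```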
## Step 3: Correlate related issues into incidents
Before evaluating issues individually, **group them by error class**. Issues that share the same exception type (e.g., `SequelizeConnectionAcquireTimeoutError`, `SequelizeDatabaseError: query_wait_timeout`) AND have `lastSeen` within 30 minutes of each other are symptoms of a single incident, not independent bugs.
For each group of 3+ correlated issues:
- Create ONE umbrella GitHub issue (not one per symptom)
- Title: `Sentry (incident): [shared error class] across [N] endpoints`
- Body: list all affected Sentry issues, their stack traces, and their `events_24h` counts
- Add labels: `sentry`, `incident`, `needs-human`
- Do NOT create individual issues for the symptoms
- This signals to the auto-fix workflow that it should NOT attempt individual fixes
For groups of 1-2 issues, proceed to individual evaluation in Step 4.
## Step 4: Evaluate each new issue
For each new issue that passes the filter AND was not grouped into an incident above, fetch event context. Use temp files:
```bash
curl -s -o /tmp/event.json "https://sentry.tryethernal.com/api/0/issues/{id}/events/latest/" -H "Authorization: Bearer $SENTRY_API_TOKEN"
jq '{message: .message, tags: [.tags[]? | select(.key == "transaction" or .key == "url") | {key, value}], exception: .entries[0]?.data.values[0]?.stacktrace.frames[-3:]?}' /tmp/event.json
```
Then categorize into ONE of:
### AUTO-SKIP + RESOLVE in Sentry
- **Any issue with `events_24h == 0`** (regardless of lifetime count) — it has self-resolved, mark resolved in Sentry
- Connection/transient errors (SequelizeConnectionError, ECONNRESET, "Connection terminated unexpectedly") — UNLESS `events_24h >= 30`
- Rate limiting errors
- Expected validation errors (user input)
- Third-party service errors we can't control
- Low-impact edge cases (deprecated browser, obscure user input)
- Performance issues on endpoints that no longer exist
- Performance issues with `events_24h == 0` that are stale
To resolve: `curl -s -X PUT "https://sentry.tryethernal.com/api/0/issues/{id}/" -H "Authorization: Bearer $SENTRY_API_TOKEN" -H "Content-Type: application/json" -d '{"status": "resolved"}'`
### CREATE GITHUB ISSUE (prioritized) — all event counts below are `events_24h`, NOT lifetime
**Priority 1 — Regressions** (always create while still firing, however low the `events_24h` count):
- Any issue where `isRegression: true` AND `events_24h >= 1` — a previous fix didn't hold AND it's still firing
- Regressions with `events_24h == 0` go to AUTO-SKIP+RESOLVE — the regression flag is stale
- Title prefix: "Sentry (regression):" for errors, "Perf (regression):" for performance
**Priority 2 — High-impact errors**:
- Null/type errors, unhandled promise rejections with `events_24h >= 2`
- Systematic issues with `events_24h >= 5`
**Priority 3 — High-impact performance** (user-facing hot paths only):
- N+1 queries with `events_24h >= 50` AND on a user-facing endpoint or blockSync hot path
- Slow DB queries with `events_24h >= 50` AND p95 > 2s
**Priority 4 — Medium performance**:
- N+1 queries with `events_24h` 20–49 on user-facing endpoints
- Background job performance issues with `events_24h >= 100` (higher bar since they don't affect UX)
### SKIP (leave unresolved in Sentry)
- Non-regressed issues with `events_24h < 2` (might be one-off)
- Performance issues on admin/debug endpoints (e.g., /bull/*)
- Performance issues on background jobs with `events_24h < 100`
- Performance issues where the slow span is < 2s on a background job
- Issues where the slow span is an external API call we can't control
- Issues that need more data to evaluate
- Performance issues where multiple related Sentry issues point to the same code path — group them mentally and only create ONE issue for the root cause, not one per symptom
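A rough sketch of the count-based gates only, assuming the issue is new and was not grouped into an incident. `triage_basic` is a hypothetical helper; the category-specific rules (transient errors, endpoint type, p95) still need the judgment described above, which is what the `evaluate` result stands for.

```bash
# triage_basic IS_REGRESSION EVENTS_24H -> prints resolve | create | skip | evaluate
triage_basic() {
  local is_regression="$1" events_24h="$2"
  if [ "$events_24h" -eq 0 ]; then
    echo resolve     # self-resolved, including stale regression flags
  elif [ "$is_regression" = "true" ]; then
    echo create      # regression that is still firing
  elif [ "$events_24h" -lt 2 ]; then
    echo skip        # non-regressed, possibly a one-off
  else
    echo evaluate    # apply the per-category thresholds
  fi
}

triage_basic true 0
triage_basic true 1
triage_basic false 1
triage_basic false 7
```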
### AUTO-SKIP — BullMQ Redis ops misclassified as N+1
Sentry's N+1 detector occasionally flags BullMQ's normal Lua-script Redis operations as N+1 query patterns. These are NOT real N+1 bugs and should be auto-resolved.
**Detection — auto-resolve in Sentry without creating a GH issue if ALL of:**
1. Issue type is `N+1 Query` (or `issue.category:performance` with N+1 fingerprint)
2. The repeated span is a Redis op — `db.redis`, `cache.get`, or the span description contains `evalsha` / `evalSha`
3. The span description references a BullMQ key — matches `bull:`, `bullmq:`, or one of the BullMQ Lua script SHAs (`840cf612b9e4155aeb79853f3502814792769274` and similar 40-char hex SHAs paired with `bull:` keys)
Resolve these in Sentry with a short comment ("BullMQ Redis ops are not N+1 — scanner filter") and skip creating a GitHub issue. Issue #1221 was a recent example.
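The three conditions can be sketched as a shell predicate. `is_bullmq_false_positive` is a hypothetical helper, and the `bull:blockSync:wait` key in the usage lines is an invented example; the sketch assumes the issue type, span op, and span description have already been extracted from the event.

```bash
# is_bullmq_false_positive ISSUE_TYPE SPAN_OP SPAN_DESCRIPTION
# Succeeds only when all three detection conditions hold.
is_bullmq_false_positive() {
  local issue_type="$1" span_op="$2" span_desc="$3"
  # 1. The issue is an N+1 query.
  [ "$issue_type" = "N+1 Query" ] || return 1
  # 2. The repeated span is a Redis op, or its description mentions evalsha (any case).
  case "$span_op" in
    db.redis|cache.get) ;;
    *) printf '%s' "$span_desc" | grep -qi 'evalsha' || return 1 ;;
  esac
  # 3. The description references a BullMQ key.
  printf '%s' "$span_desc" | grep -qE 'bull:|bullmq:'
}

is_bullmq_false_positive "N+1 Query" "db.redis" "evalsha 840cf612b9e4155aeb79853f3502814792769274 9 bull:blockSync:wait" \
  && echo "auto-resolve"
is_bullmq_false_positive "N+1 Query" "db.sql.query" "SELECT * FROM blocks" \
  || echo "evaluate normally"
```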
## Step 5: Create GitHub issues
For **error** issues:
```bash
gh issue create \
--title "Sentry: [error title]" \
--label "sentry" \
--label "[backend|frontend]" \
--body "## Sentry Error
**Project:** [ethernal-backend|ethernal-frontend]
**Level:** [error|warning]
**Events (24h):** [events_24h] ← rolling, used for thresholds
**Lifetime events:** [lifetime_count]
**Regression:** [Yes/No]
**Link:** https://sentry.tryethernal.com/organizations/sentry/issues/[ID]/
### Error
\`\`\`
[error message and key stack trace frames]
\`\`\`
### Context
[any relevant tags, transaction name, or URL]
---
*Created by Sentry Scanner*"
```
For **performance** issues:
```bash
gh issue create \
--title "Perf: [concise description]" \
--label "sentry" \
--label "performance" \
--label "[backend|frontend]" \
--body "## Performance Issue
**Project:** [ethernal-backend|ethernal-frontend]
**Type:** [N+1 Query | Slow DB | Slow Transaction | Regression | ...]
**Impact (24h):** [events_24h] events ← rolling, used for thresholds
**Lifetime events:** [lifetime_count]
**Regression:** [Yes/No]
**Transaction:** \`[transaction name]\`
**Link:** https://sentry.tryethernal.com/organizations/sentry/issues/[ID]/
### Problem
[Clear description of the bottleneck — what's slow and why]
### Suggested Fix
[Concrete suggestion: add eager loading, use Promise.all, add index, batch queries, etc.]
### Evidence
\`\`\`
[Key spans, query patterns, or timing data]
\`\`\`
---
*Created by Sentry Scanner*"
```
For regressions, add "(regression)" to the title prefix, e.g. `Sentry (regression): [title]` or `Perf (regression): [title]`.
**IMPORTANT: Stagger issue creation.** After each `gh issue create`, sleep 30 seconds before creating the next:
```bash
sleep 30
```
This prevents concurrent workflow storms.
**Limit to 3 issues per scan.** If more than 3 issues qualify, create only the top 3 by priority (regressions first, then by `events_24h`, highest first). The rest will be picked up in the next hourly scan.
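A minimal sketch of the top-3 selection, assuming the qualifying issues have been written out as `priority,events_24h,sentry_id` lines. That format and the sample values are hypothetical.

```bash
# Hypothetical candidates: "priority,events_24h,sentry_id" (priority 1 = regression).
candidates='2,14,501
1,1,502
3,80,503
2,40,504
1,3,505'

# Sort by priority ascending, then events_24h descending, and keep the first three.
selected=$(printf '%s\n' "$candidates" | sort -t, -k1,1n -k2,2nr | head -n 3)
echo "$selected"
```

For this sample the selection is 505, 502, 504: both regressions first, then the busiest Priority 2 issue. Each selected line then gets one `gh issue create`, with the `sleep 30` between consecutive creations.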
## Step 6: Notify dashboard
For each issue created, notify the dashboard webhook:
```bash
curl -s -X POST "$APP_URL/webhooks/github-actions" \
-H "Authorization: Bearer $ETHERNAL_WEBHOOK_SECRET" \
-H "Content-Type: application/json" \
-d "{
\"githubIssueNumber\": ISSUE_NUMBER,
\"sentryIssueId\": \"SENTRY_ID\",
\"sentryProject\": \"PROJECT\",
\"sentryTitle\": \"TITLE\",
\"sentryLevel\": \"LEVEL\",
\"sentryEventCount\": EVENTS_24H,
\"sentryLink\": \"https://sentry.tryethernal.com/organizations/sentry/issues/SENTRY_ID/\",
\"status\": \"discovered\",
\"currentStep\": \"Discovered by scanner\"
}"
```
## Step 7: Print summary
At the end, print a summary:
```
=== Sentry Scanner Summary ===
Errors scanned: X
Performance issues scanned: Y
Regressions found: Z
GitHub issues created: A (B errors, C performance)
Auto-resolved: D
Skipped (already tracked): E
Skipped (not actionable): F
```
## Rules
- NEVER create duplicate GitHub issues — always check first
- Be conservative: only create issues for clear code bugs or significant performance problems
- When resolving in Sentry, always include a reason
- Keep issue descriptions concise but include enough for the auto-fix agent
- For N+1 queries, always identify the model/relation involved
- For slow queries, suggest specific indexes or query optimizations
- Regressions are ALWAYS high priority — create issues for them even with low event counts, as long as `events_24h >= 1`
env:
SENTRY_API_TOKEN: ${{ secrets.SENTRY_API_TOKEN }}
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
APP_URL: ${{ secrets.APP_URL }}
ETHERNAL_WEBHOOK_SECRET: ${{ secrets.ETHERNAL_WEBHOOK_SECRET }}