fix: add global setErrorHandler and harden getLiveUsercount against ClickHouse failures#1002
Conversation
…lickHouse failures
|
@calebcgates is attempting to deploy a commit to the goldflag's projects Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughThis PR adds input validation to the live-user endpoint, wraps its ClickHouse query/result processing in try/catch with logging and safe defaults, and registers global Fastify handlers for uncaught errors and 404 routes that return standardized JSON responses. ChangesError Handling Strategy
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
server/src/api/analytics/getLiveUsercount.ts (1)
12-13: ⚡ Quick winConsider adding Zod validation for query parameters.
While the try/catch block makes the current implementation safe, adding Zod validation before line 15 would provide better error responses for invalid input. For example,
Number("abc")producesNaN, which causes a ClickHouse error and returns a generic 500. With Zod, you could return a 400 with a clear message like "Invalid siteId or minutes parameter" before hitting the database.✨ Example Zod validation
+import { z } from "zod"; + +const querySchema = z.object({ + minutes: z.coerce.number().int().positive().default(5), +}); + +const paramsSchema = z.object({ + siteId: z.coerce.number().int().positive(), +}); + export const getLiveUsercount = async ( req: FastifyRequest<{ Params: { siteId: string }; Querystring: { minutes: number }; }>, res: FastifyReply ) => { - const { siteId } = req.params; - const { minutes } = req.query; + const parseParams = paramsSchema.safeParse(req.params); + const parseQuery = querySchema.safeParse(req.query); + + if (!parseParams.success || !parseQuery.success) { + return res.status(400).send({ error: "Invalid parameters" }); + } + + const { siteId } = parseParams.data; + const { minutes } = parseQuery.data; try {As per coding guidelines,
server/src/api/**/*.tsroute handlers should "Validate untrusted request bodies and query params with Zod before using them."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/api/analytics/getLiveUsercount.ts` around lines 12 - 13, Add Zod validation for the untrusted inputs before converting/parsing and calling the DB: define a Zod schema that validates req.params.siteId (string/nonempty) and req.query.minutes (optional number or numeric string), then use schema.parse or safeParse at the top of the handler (before the existing try/catch) to return a 400 with a clear message like "Invalid siteId or minutes parameter" on failure; after successful validation, assign the validated siteId and minutes (converted to number if needed) and continue using the existing logic that references the siteId and minutes variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@server/src/api/analytics/getLiveUsercount.ts`:
- Around line 12-13: Add Zod validation for the untrusted inputs before
converting/parsing and calling the DB: define a Zod schema that validates
req.params.siteId (string/nonempty) and req.query.minutes (optional number or
numeric string), then use schema.parse or safeParse at the top of the handler
(before the existing try/catch) to return a 400 with a clear message like
"Invalid siteId or minutes parameter" on failure; after successful validation,
assign the validated siteId and minutes (converted to number if needed) and
continue using the existing logic that references the siteId and minutes
variables.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0536934c-02d7-42b9-858a-77394aceb204
📒 Files selected for processing (2)
server/src/api/analytics/getLiveUsercount.tsserver/src/index.ts
Adds Zod schemas for params and query so invalid inputs return 400 with a structured error instead of falling through to the ClickHouse query and returning a generic 500. Matches the repo convention (server/src/api/user/updateAccountSettings.ts uses the same safeParse + 400-on-failure shape) and aligns with CLAUDE.md's stated backend stack (Fastify, Drizzle, ClickHouse, Zod). minutes now defaults to 5 inside the schema instead of in the query params object, so the default applies even if the query string omits the field.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
server/src/api/analytics/getLiveUsercount.ts (1)
24-29: ⚡ Quick winConsider showing both validation errors when both fail.
The current ternary logic shows only the
paramserror when both validations fail, which may confuse API consumers who need to fix multiple issues.♻️ Proposed improvement
if (!paramsValidation.success || !queryValidation.success) { + const errors: Record<string, unknown> = {}; + if (!paramsValidation.success) errors.params = paramsValidation.error; + if (!queryValidation.success) errors.query = queryValidation.error; + return res.status(400).send({ error: "Invalid request parameters", - details: paramsValidation.success ? queryValidation.error : paramsValidation.error, + details: errors, }); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/api/analytics/getLiveUsercount.ts` around lines 24 - 29, In the getLiveUsercount handler return path, the current ternary chooses only one validation error (paramsValidation or queryValidation); change the response to include both validation results when both fail by building a combined details object (e.g., { params: paramsValidation.error, query: queryValidation.error } or an array) instead of selecting one via the ternary, so that paramsValidation and queryValidation errors are both surfaced to the client.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/src/api/analytics/getLiveUsercount.ts`:
- Around line 15-18: The Fastify request typing is incorrect: update the req
type annotation so Params stays { siteId: string } but Querystring types reflect
pre-validation strings, e.g. make minutes optional as string (minutes?: string
or minutes: string | undefined) instead of number; adjust the FastifyRequest
generic where you see "Params" and "Querystring" so callers reading
req.query.minutes see the raw string/undefined (the zod schema's .default(5)
will handle coercion later).
---
Nitpick comments:
In `@server/src/api/analytics/getLiveUsercount.ts`:
- Around line 24-29: In the getLiveUsercount handler return path, the current
ternary chooses only one validation error (paramsValidation or queryValidation);
change the response to include both validation results when both fail by
building a combined details object (e.g., { params: paramsValidation.error,
query: queryValidation.error } or an array) instead of selecting one via the
ternary, so that paramsValidation and queryValidation errors are both surfaced
to the client.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d2e13a5f-0241-4720-9e1e-5e94d63a4e50
📒 Files selected for processing (1)
server/src/api/analytics/getLiveUsercount.ts
| req: FastifyRequest<{ | ||
| Params: { siteId: string }; | ||
| Querystring: { minutes: number }; | ||
| }>, |
There was a problem hiding this comment.
Fastify type annotations don't match the raw request shape.
The type annotations should reflect the pre-validation request types:
siteIdis correctly typed asstring(URL params are strings)minutesshould be typed asstring | undefinedor marked optional withminutes?: stringbecause:- Query parameters arrive as strings (or undefined) before zod coerces them
- The zod schema's
.default(5)makes this parameter optional
The current minutes: number suggests it's already a required number, which is misleading and reduces type safety for anyone accessing req.query.minutes directly.
🔧 Proposed fix
req: FastifyRequest<{
Params: { siteId: string };
- Querystring: { minutes: number };
+ Querystring: { minutes?: string };
}>,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/api/analytics/getLiveUsercount.ts` around lines 15 - 18, The
Fastify request typing is incorrect: update the req type annotation so Params
stays { siteId: string } but Querystring types reflect pre-validation strings,
e.g. make minutes optional as string (minutes?: string or minutes: string |
undefined) instead of number; adjust the FastifyRequest generic where you see
"Params" and "Querystring" so callers reading req.query.minutes see the raw
string/undefined (the zod schema's .default(5) will handle coercion later).
Summary
server/src/index.tsdoes not register a globalsetErrorHandlerorsetNotFoundHandler. When an async route handler throws (e.g. ClickHouse unreachable, Postgres connection drop), fastify's default error handler runs — and on routes that opt out of request logging vialogLevel: "silent"(currently/api/sites/:siteId/live-user-countatindex.ts:240and/api/healthatindex.ts:401), the error is never logged. The dashboard returns 500 to every open tab; the server logs are empty.This PR adds (1) a small global
setErrorHandlerthat logs every unhandled route error through the per-request pino logger withreqId/method/urlcontext, and (2) a defensive try/catch ongetLiveUsercount(the highest-traffic endpoint in the product) so the inner handler also surfaces ClickHouse failures explicitly.Why
getLiveUsercountis called every ~5s per open dashboard tab. Today it does:…with no try/catch and
logLevel: "silent"on the route. If ClickHouse is unreachable for any reason (transient network blip, OOM, query queue full), the await rejects, fastify falls through to its default error handler, returns 500, and no log line is emitted — silenced by the route-level log level. The dashboard goes dark for every user, butserver.logshows nothing. Same shape applies togetRetentionand any other handler without local try/catch.CLAUDE.mdline 19 explicitly states: "Error handling: Use try/catch blocks with specific error types". Most handlers (collectTelemetry.ts,recordSessionReplay.ts, etc.) already follow this;getLiveUsercount.tsis one of the gaps.The global
setErrorHandleris the DRY safety net for both today's gaps and any future handler that gets added without local try/catch —request.log.error(...)always logs aterrorseverity regardless of route-levellogLevel.Solution
Two small additions:
server.setErrorHandler(...)andserver.setNotFoundHandler(...)immediately after theFastify({...})constructor inserver/src/index.ts. Routes errors throughrequest.log.error(...)with structured context. Returns generic"Internal Server Error"for 5xx (no leakage of internal error messages) and the original message for 4xx.clickhouse.query(...)inserver/src/api/analytics/getLiveUsercount.tsmatching the existing pattern inserver/src/api/admin/collectTelemetry.ts. Also adds an?? 0fallback for the empty-result-set edge case.Testing
Reproduced manually: stopped local ClickHouse, observed
/api/sites/1/live-user-countreturning 500 every 5s with no log output underlogLevel: "silent". After the fix: same 500 response, but logs now show structured error lines from both the inner handler and the global handler with full request context.Happy to add a vitest case if you'd like one —
server/src/utils.test.tsandserver/src/services/import/mappers/plausible.test.tsshow the existing pattern. Left it out by default to keep the PR small.Scope
Two files, +24 / -2 lines, no new imports, no new dependencies, no schema changes. Does not touch any of the 60+ route handlers (the global
setErrorHandlercovers them automatically).Pre-flight
cd server && npx tsc --noEmitcleancd server && npx tscbuilds cleancd server && npx prettier --check src/index.ts src/api/analytics/getLiveUsercount.tspassespackage.jsonchangesfix: ...conventional-commit form (matches Fix invalid empty array literal in identifyService traits upsert #976, fix: set high API key rate limit for self-hosted instances #977)(No issue — happy to file one if you'd prefer.)
Summary by CodeRabbit