Skip to content

fix: add global setErrorHandler and harden getLiveUsercount against ClickHouse failures#1002

Open
calebcgates wants to merge 2 commits into
rybbit-io:masterfrom
calebcgates:fix/server-add-setErrorHandler-and-handler-hardening
Open

fix: add global setErrorHandler and harden getLiveUsercount against ClickHouse failures#1002
calebcgates wants to merge 2 commits into
rybbit-io:masterfrom
calebcgates:fix/server-add-setErrorHandler-and-handler-hardening

Conversation

@calebcgates
Copy link
Copy Markdown

@calebcgates calebcgates commented May 20, 2026

Summary

server/src/index.ts does not register a global setErrorHandler or setNotFoundHandler. When an async route handler throws (e.g. ClickHouse unreachable, Postgres connection drop), fastify's default error handler runs — and on routes that opt out of request logging via logLevel: "silent" (currently /api/sites/:siteId/live-user-count at index.ts:240 and /api/health at index.ts:401), the error is never logged. The dashboard returns 500 to every open tab; the server logs are empty.

This PR adds (1) a small global setErrorHandler that logs every unhandled route error through the per-request pino logger with reqId/method/url context, and (2) a defensive try/catch on getLiveUsercount (the highest-traffic endpoint in the product) so the inner handler also surfaces ClickHouse failures explicitly.

Why

getLiveUsercount is called every ~5s per open dashboard tab. Today it does:

const query = await clickhouse.query({ ... });
const result = await processResults<{ count: number }>(query);
return res.send({ count: result[0].count });

…with no try/catch and logLevel: "silent" on the route. If ClickHouse is unreachable for any reason (transient network blip, OOM, query queue full), the await rejects, fastify falls through to its default error handler, returns 500, and no log line is emitted — silenced by the route-level log level. The dashboard goes dark for every user, but server.log shows nothing. Same shape applies to getRetention and any other handler without local try/catch.

CLAUDE.md line 19 explicitly states: "Error handling: Use try/catch blocks with specific error types". Most handlers (collectTelemetry.ts, recordSessionReplay.ts, etc.) already follow this; getLiveUsercount.ts is one of the gaps.

The global setErrorHandler is the DRY safety net for both today's gaps and any future handler that gets added without local try/catch — request.log.error(...) always logs at error severity regardless of route-level logLevel.

Solution

Two small additions:

  1. server.setErrorHandler(...) and server.setNotFoundHandler(...) immediately after the Fastify({...}) constructor in server/src/index.ts. Routes errors through request.log.error(...) with structured context. Returns generic "Internal Server Error" for 5xx (no leakage of internal error messages) and the original message for 4xx.
  2. try/catch around clickhouse.query(...) in server/src/api/analytics/getLiveUsercount.ts matching the existing pattern in server/src/api/admin/collectTelemetry.ts. Also adds an ?? 0 fallback for the empty-result-set edge case.

Testing

Reproduced manually: stopped local ClickHouse, observed /api/sites/1/live-user-count returning 500 every 5s with no log output under logLevel: "silent". After the fix: same 500 response, but logs now show structured error lines from both the inner handler and the global handler with full request context.

Happy to add a vitest case if you'd like one — server/src/utils.test.ts and server/src/services/import/mappers/plausible.test.ts show the existing pattern. Left it out by default to keep the PR small.

Scope

Two files, +24 / -2 lines, no new imports, no new dependencies, no schema changes. Does not touch any of the 60+ route handlers (the global setErrorHandler covers them automatically).

Pre-flight

(No issue — happy to file one if you'd prefer.)

Summary by CodeRabbit

  • Bug Fixes
    • Improved validation and error handling for live user count requests, adding safe defaults when no data is returned and clearer error responses on failure.
    • Added global API error and 404 handlers to provide consistent JSON error responses and include request context for not-found cases.

Review Change Stack

@vercel
Copy link
Copy Markdown

vercel Bot commented May 20, 2026

@calebcgates is attempting to deploy a commit to the goldflag's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Walkthrough

This PR adds input validation to the live-user endpoint, wraps its ClickHouse query/result processing in try/catch with logging and safe defaults, and registers global Fastify handlers for uncaught errors and 404 routes that return standardized JSON responses.

Changes

Error Handling Strategy

Layer / File(s) Summary
Endpoint validation and error handling
server/src/api/analytics/getLiveUsercount.ts
Validates siteId and minutes with zod and returns HTTP 400 on validation failure; wraps the ClickHouse query and result processing in try/catch, returns { count: result[0]?.count ?? 0 } on success, logs errors with siteId, and returns HTTP 500 with a fixed error payload on failure.
Global Fastify error and 404 handlers
server/src/index.ts
Adds setErrorHandler to log unhandled route errors with request method/URL/ids and return standardized JSON errors (uses error.statusCode for 4xx, otherwise 500 with generic message), and setNotFoundHandler to return a JSON 404 payload including the requested path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I checked the params, then chased the count,
Through queries and errors the logs mount.
A handler for missing, a guard for the rest,
Now responses are tidy — the server's well-dressed. 🥕✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the two main changes: adding a global setErrorHandler and hardening getLiveUsercount against ClickHouse failures, which align with the core objectives of the PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
server/src/api/analytics/getLiveUsercount.ts (1)

12-13: ⚡ Quick win

Consider adding Zod validation for query parameters.

While the try/catch block makes the current implementation safe, adding Zod validation before line 15 would provide better error responses for invalid input. For example, Number("abc") produces NaN, which causes a ClickHouse error and returns a generic 500. With Zod, you could return a 400 with a clear message like "Invalid siteId or minutes parameter" before hitting the database.

✨ Example Zod validation
+import { z } from "zod";
+
+const querySchema = z.object({
+  minutes: z.coerce.number().int().positive().default(5),
+});
+
+const paramsSchema = z.object({
+  siteId: z.coerce.number().int().positive(),
+});
+
 export const getLiveUsercount = async (
   req: FastifyRequest<{
     Params: { siteId: string };
     Querystring: { minutes: number };
   }>,
   res: FastifyReply
 ) => {
-  const { siteId } = req.params;
-  const { minutes } = req.query;
+  const parseParams = paramsSchema.safeParse(req.params);
+  const parseQuery = querySchema.safeParse(req.query);
+  
+  if (!parseParams.success || !parseQuery.success) {
+    return res.status(400).send({ error: "Invalid parameters" });
+  }
+  
+  const { siteId } = parseParams.data;
+  const { minutes } = parseQuery.data;

   try {

As per coding guidelines, server/src/api/**/*.ts route handlers should "Validate untrusted request bodies and query params with Zod before using them."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/api/analytics/getLiveUsercount.ts` around lines 12 - 13, Add Zod
validation for the untrusted inputs before converting/parsing and calling the
DB: define a Zod schema that validates req.params.siteId (string/nonempty) and
req.query.minutes (optional number or numeric string), then use schema.parse or
safeParse at the top of the handler (before the existing try/catch) to return a
400 with a clear message like "Invalid siteId or minutes parameter" on failure;
after successful validation, assign the validated siteId and minutes (converted
to number if needed) and continue using the existing logic that references the
siteId and minutes variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@server/src/api/analytics/getLiveUsercount.ts`:
- Around line 12-13: Add Zod validation for the untrusted inputs before
converting/parsing and calling the DB: define a Zod schema that validates
req.params.siteId (string/nonempty) and req.query.minutes (optional number or
numeric string), then use schema.parse or safeParse at the top of the handler
(before the existing try/catch) to return a 400 with a clear message like
"Invalid siteId or minutes parameter" on failure; after successful validation,
assign the validated siteId and minutes (converted to number if needed) and
continue using the existing logic that references the siteId and minutes
variables.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0536934c-02d7-42b9-858a-77394aceb204

📥 Commits

Reviewing files that changed from the base of the PR and between 08a3e20 and 7a96f48.

📒 Files selected for processing (2)
  • server/src/api/analytics/getLiveUsercount.ts
  • server/src/index.ts

Adds Zod schemas for params and query so invalid inputs return 400 with
a structured error instead of falling through to the ClickHouse query
and returning a generic 500. Matches the repo convention
(server/src/api/user/updateAccountSettings.ts uses the same safeParse +
400-on-failure shape) and aligns with CLAUDE.md's stated backend stack
(Fastify, Drizzle, ClickHouse, Zod).

minutes now defaults to 5 inside the schema instead of in the query
params object, so the default applies even if the query string omits
the field.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
server/src/api/analytics/getLiveUsercount.ts (1)

24-29: ⚡ Quick win

Consider showing both validation errors when both fail.

The current ternary logic shows only the params error when both validations fail, which may confuse API consumers who need to fix multiple issues.

♻️ Proposed improvement
  if (!paramsValidation.success || !queryValidation.success) {
+   const errors: Record<string, unknown> = {};
+   if (!paramsValidation.success) errors.params = paramsValidation.error;
+   if (!queryValidation.success) errors.query = queryValidation.error;
+
    return res.status(400).send({
      error: "Invalid request parameters",
-     details: paramsValidation.success ? queryValidation.error : paramsValidation.error,
+     details: errors,
    });
  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/api/analytics/getLiveUsercount.ts` around lines 24 - 29, In the
getLiveUsercount handler return path, the current ternary chooses only one
validation error (paramsValidation or queryValidation); change the response to
include both validation results when both fail by building a combined details
object (e.g., { params: paramsValidation.error, query: queryValidation.error }
or an array) instead of selecting one via the ternary, so that paramsValidation
and queryValidation errors are both surfaced to the client.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/src/api/analytics/getLiveUsercount.ts`:
- Around line 15-18: The Fastify request typing is incorrect: update the req
type annotation so Params stays { siteId: string } but Querystring types reflect
pre-validation strings, e.g. make minutes optional as string (minutes?: string
or minutes: string | undefined) instead of number; adjust the FastifyRequest
generic where you see "Params" and "Querystring" so callers reading
req.query.minutes see the raw string/undefined (the zod schema's .default(5)
will handle coercion later).

---

Nitpick comments:
In `@server/src/api/analytics/getLiveUsercount.ts`:
- Around line 24-29: In the getLiveUsercount handler return path, the current
ternary chooses only one validation error (paramsValidation or queryValidation);
change the response to include both validation results when both fail by
building a combined details object (e.g., { params: paramsValidation.error,
query: queryValidation.error } or an array) instead of selecting one via the
ternary, so that paramsValidation and queryValidation errors are both surfaced
to the client.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d2e13a5f-0241-4720-9e1e-5e94d63a4e50

📥 Commits

Reviewing files that changed from the base of the PR and between 7a96f48 and a44b9a0.

📒 Files selected for processing (1)
  • server/src/api/analytics/getLiveUsercount.ts

Comment on lines 15 to 18
req: FastifyRequest<{
Params: { siteId: string };
Querystring: { minutes: number };
}>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fastify type annotations don't match the raw request shape.

The type annotations should reflect the pre-validation request types:

  • siteId is correctly typed as string (URL params are strings)
  • minutes should be typed as string | undefined or marked optional with minutes?: string because:
    • Query parameters arrive as strings (or undefined) before zod coerces them
    • The zod schema's .default(5) makes this parameter optional

The current minutes: number suggests it's already a required number, which is misleading and reduces type safety for anyone accessing req.query.minutes directly.

🔧 Proposed fix
  req: FastifyRequest<{
    Params: { siteId: string };
-   Querystring: { minutes: number };
+   Querystring: { minutes?: string };
  }>,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/api/analytics/getLiveUsercount.ts` around lines 15 - 18, The
Fastify request typing is incorrect: update the req type annotation so Params
stays { siteId: string } but Querystring types reflect pre-validation strings,
e.g. make minutes optional as string (minutes?: string or minutes: string |
undefined) instead of number; adjust the FastifyRequest generic where you see
"Params" and "Querystring" so callers reading req.query.minutes see the raw
string/undefined (the zod schema's .default(5) will handle coercion later).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant