[pull] main from triggerdotdev:main#165
Merged
…nge integers (#3759)

## Summary

Second class of poisoned-row failure in the runs replication path. PR #3708 plugged lone UTF-16 surrogates; this one handles bare JSON integer literals outside ClickHouse's `Int64`..`UInt64` range. Recovery stays purely reactive: the existing `sanitizeRows` walker just gains an extra branch, so the hot replication path pays nothing on healthy rows.

Fixes the still-firing customer-facing symptom from [TRI-9755](https://linear.app/triggerdotdev/issue/TRI-9755): `scan-social-profiles` runs continued to be stranded in `EXECUTING` on the Tasks page after #3708 deployed. CloudWatch showed `Dropped batch — ClickHouse JSON parse error but sanitizer found nothing to fix` firing **8/8 times** since the previous deploy (zero successful sanitizations).

Root cause: upstream JS Number precision loss on a 21-digit Google Plus ID (`117039831458782873093` → `117039831458782870000`). The precision-lossy value still serialises as a bare integer that exceeds `UInt64.MAX`, which ClickHouse rejects with `INCORRECT_DATA`.

## How the bug ships

The customer task emits an output containing a Poshmark profile's `spec_format`:

```json
{"key":"gp_id","proper_key":"Gp Id","value":117039831458782870000,"type":"int"}
```

That value is about `1.17e20`: comfortably above `UInt64.MAX` (about `1.84e19`) but comfortably below `1e21`. `Number.prototype.toString` only switches to exponential form at `|value| >= 1e21`, so `JSON.stringify` emits the bare token `117039831458782870000`, and the ClickHouse `JSON(max_dynamic_paths)` column fails with:

```
Code: 117. DB::Exception: Cannot parse JSON object here: {…}: (while reading the value of key output): (at row 1) : While executing ParallelParsingBlockInputFormat. (INCORRECT_DATA) (version 25.12.x)
```

Same error verbatim as prod. The same number quoted (`"117039831458782870000"`) inserts fine, because ClickHouse's dynamic JSON column accepts a `String` subtype on the same path.
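The serialisation threshold described above can be checked in any JS runtime. This is a minimal illustrative sketch, not code from the PR; the constant name `UINT64_MAX_PLUS_ONE` is an assumption introduced here. Note that `UInt64.MAX` (`2^64 - 1`) is not exactly representable as a double and rounds to `2^64`, so the safe comparison for "above `UInt64.MAX`" is `>= 2^64`.

```typescript
// Illustrative check of the serialisation behaviour described above.
// UInt64.MAX (2^64 - 1) rounds to 2^64 as a double, so compare with ">= 2^64".
const UINT64_MAX_PLUS_ONE = 2 ** 64;

// Parsing from a string mimics the upstream precision loss without an inexact literal.
const gpId = Number("117039831458782873093");

// Below 1e21, Number#toString (and so JSON.stringify) uses plain integer form.
const serialised = JSON.stringify({ value: gpId });
// At 1e21 and above, it switches to exponent form.
const atThreshold = JSON.stringify({ value: 1e21 });
// Quoting the value produces a JSON string token instead of a bare integer.
const quoted = JSON.stringify({ value: String(gpId) });

console.log(serialised); // {"value":117039831458782870000}
console.log(atThreshold); // {"value":1e+21}
console.log(quoted); // {"value":"117039831458782870000"}
console.log(gpId >= UINT64_MAX_PLUS_ONE); // true
```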
## What changed

`apps/webapp/app/v3/eventRepository/sanitizeRowsOnParseError.server.ts`:

- New private `isUnsafeJsonInteger(value)` helper: true iff `value` is a finite, integer-valued JS Number where `|value| < 1e21` (so `JSON.stringify` emits integer form, not exponent form) **and** `value` falls outside `[Int64.MIN, UInt64.MAX]`.
- `sanitizeUnknownInPlace` gains a number branch: when the predicate holds, it replaces the Number with `String(value)`. The downstream JSON column then dynamically types the path as String for that row. That is acceptable because the value was already precision-lossy upstream: above `2^53`, a JS Number cannot represent every integer exactly, so its trailing digits were already unreliable.
- Fractional numbers, large values (`>= 1e21`), `NaN` and `Infinity` are left alone: `JSON.stringify` emits them as decimals, in exponent form, or as `null`, all of which ClickHouse accepts.

`apps/webapp/test/sanitizeRowsOnParseError.test.ts`: four new unit tests plus an extension to the `sanitizeRows` test covering surrogate and integer fixes counted together across rows. The unit suite now covers:

- Positive value above `UInt64.MAX` (`117039831458782870000`, the actual prod value)
- Negative value below `Int64.MIN`
- Boundary values pass through (`42`, `Number.MAX_SAFE_INTEGER`, `2^63`)
- Non-integer numbers untouched (floats, `1e25`, NaN, Infinity)
- The actual `scan-social-profiles` nested shape: finds the offending `gp_id` deep inside `output.data.profiles[].spec_format[].platform_variables[].value`

`.server-changes/runs-replication-bigint-recovery.md`: release notes entry.

## Why reactive, not pre-flight

`#prepareJson` runs millions of times per day on the replication hot path. Walking every JSON tree to look for oversized integers would add a bounded but real CPU cost to every healthy row. `sanitizeRows` only fires after ClickHouse rejects a batch with a parse error, which happens a few times a day platform-wide. Extending it costs effectively nothing on healthy traffic and gains us recovery on the rare poisoned row.
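The predicate, the number branch, and the reactive retry can be sketched together as below. This is an approximation under stated assumptions, not the webapp's code: only the name `isUnsafeJsonInteger` comes from the PR, and its body here is reconstructed from the description. `sanitizeIntegersInPlace` and `insertWithRecovery` are hypothetical stand-ins for `sanitizeUnknownInPlace` and the `sanitizeRows` call site, whose real signatures are not shown here; the `"dropped"` outcome mirrors the `Dropped batch …` log but is likewise an assumption.

```typescript
// Sketch only. ClickHouse integer bounds as exactly representable doubles:
// Int64.MIN = -2^63 is exact; UInt64.MAX = 2^64 - 1 rounds to 2^64, so
// "above UInt64.MAX" for a double means ">= 2^64".
const INT64_MIN = -(2 ** 63);
const UINT64_LIMIT = 2 ** 64;

function isUnsafeJsonInteger(value: unknown): value is number {
  return (
    typeof value === "number" &&
    Number.isInteger(value) && // excludes NaN, ±Infinity and fractional values
    Math.abs(value) < 1e21 && // JSON.stringify still emits a bare integer token
    (value < INT64_MIN || value >= UINT64_LIMIT)
  );
}

// Hypothetical walker: rewrites unsafe integers to strings in place and
// returns how many values it changed.
function sanitizeIntegersInPlace(node: unknown): number {
  let fixed = 0;
  if (Array.isArray(node)) {
    for (let i = 0; i < node.length; i++) {
      if (isUnsafeJsonInteger(node[i])) {
        node[i] = String(node[i]);
        fixed++;
      } else {
        fixed += sanitizeIntegersInPlace(node[i]);
      }
    }
  } else if (node !== null && typeof node === "object") {
    const obj = node as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
      if (isUnsafeJsonInteger(obj[key])) {
        obj[key] = String(obj[key]);
        fixed++;
      } else {
        fixed += sanitizeIntegersInPlace(obj[key]);
      }
    }
  }
  return fixed;
}

// Hypothetical reactive wrapper: healthy batches never pay for a tree walk;
// the walker only runs after the insert fails with a JSON parse error.
async function insertWithRecovery<Row extends object>(
  rows: Row[],
  insert: (rows: Row[]) => Promise<void>,
  isJsonParseError: (err: unknown) => boolean,
): Promise<"inserted" | "sanitized" | "dropped"> {
  try {
    await insert(rows);
    return "inserted";
  } catch (err) {
    if (!isJsonParseError(err)) throw err;
    const fixed = rows.reduce((n, row) => n + sanitizeIntegersInPlace(row), 0);
    if (fixed === 0) return "dropped"; // nothing to fix: give up on this batch
    await insert(rows); // retry once with the sanitized rows
    return "sanitized";
  }
}
```

Comparing against `2 ** 64` rather than a Number built from `18446744073709551615` matters: that literal rounds to `2^64`, so a strict `>` check against it would let `2^64` itself (which serialises as `18446744073709551616`) slip through.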
## Verification

- Reproduced 1:1 in a throwaway Docker `clickhouse/clickhouse-server:25.12.11.4` container (the closest available image to the prod `25.12.1.1579` build). The pre-sanitize JSON fails with the exact prod error; the post-sanitize JSON inserts cleanly and the row is readable with `gp_id` stored as a String subtype.
- `pnpm --filter webapp exec vitest run test/sanitizeRowsOnParseError.test.ts`: 22/22 passing (18 existing + 4 new).
- `pnpm run typecheck --filter webapp`: clean.

## Test plan

- [x] `pnpm run typecheck --filter webapp`
- [x] Unit tests pass against new and existing cases
- [x] End-to-end Docker ClickHouse repro confirms recovery
- [ ] Post-deploy: confirm `Sanitizing batch after ClickHouse JSON parse error` warnings fire instead of `Dropped batch …` errors when `scan-social-profiles` outputs trip ClickHouse again
- [ ] Post-deploy: confirm the `permanentlyDroppedBatches` counter stops climbing in `/stp/trigger-app-prod/ecs/replication/service-container/process-logs`

## What this does NOT do

- Doesn't backfill the ~120k+ existing stranded `EXECUTING` rows in production. As with #3708, that needs a reconciliation/backfill sweep (separate ticket: TRI-9755 fix #3).
- Doesn't address the upstream root cause (the customer task emitting a JS-Number-precision-lossy big int). That's a customer-task concern; our replication path needs to be robust to whatever shape arrives.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Created by pull[bot] (v2.0.0-alpha.4)