fix(weave): shard call_parts by id so call_start/call_end co-locate #6997
Draft
gtarpenning wants to merge 1 commit into
This was referenced May 28, 2026
Codecov Report: ✅ All modified and coverable lines are covered by tests.
call_parts was using the default rand() sharding key, so call_start and call_end for the same call could land on different shards. Once split, the partial-state rows can never merge in calls_merged_local (OPTIMIZE runs per-shard), and queries that filter on an aggregated column see inconsistent state.

Concretely: call_end doesn't carry parent_id, so its row defaults to NULL. Filters like trace_roots_only (`parent_id IS NULL`) then match the call_end row of every child call as if it were a root, inflating counts.

Shard by `id` instead of `wf_clickhouse_calls_shard_key()` (which defaults to trace_id): trace_id is Nullable on call_end so sipHash64 returns Nullable, which ClickHouse rejects as a sharding expression (TYPE_MISMATCH). `id` is non-null on every call_part row and uniquely identifies a call, so all parts of one call land together.

calls_merged Distributed table is intentionally left rand(): the only writes come from the MV which fires on the local source/target pair and never goes through the Distributed wrapper.
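The failure mode described in the commit message can be reproduced in a few lines of plain Python. This is only a toy model, not the weave or ClickHouse code: `make_parts`, `count_roots`, and `shard_by_id` are hypothetical names, `blake2b` stands in for `sipHash64` because it ships with the standard library, and the per-shard merge is reduced to "keep any non-null `parent_id`".

```python
import hashlib
import random

NUM_SHARDS = 2  # assumption: a two-shard cluster


def shard_by_id(call_id: str) -> int:
    """Deterministic shard choice; blake2b is a stand-in for sipHash64(id)."""
    digest = hashlib.blake2b(call_id.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_SHARDS


def make_parts(num_children: int) -> list[dict]:
    """One root call plus children; only call_start carries parent_id."""
    parts = [
        {"id": "root", "kind": "call_start", "parent_id": None},
        {"id": "root", "kind": "call_end", "parent_id": None},
    ]
    for i in range(num_children):
        call_id = f"child-{i}"
        parts.append({"id": call_id, "kind": "call_start", "parent_id": "root"})
        parts.append({"id": call_id, "kind": "call_end", "parent_id": None})
    return parts


def count_roots(parts: list[dict], shard_of) -> int:
    """Merge parts per (shard, id), then count rows where parent_id IS NULL."""
    merged: dict[tuple[int, str], dict] = {}
    for part in parts:
        key = (shard_of(part), part["id"])  # rows on different shards never merge
        row = merged.setdefault(key, {"parent_id": None})
        if part["parent_id"] is not None:
            row["parent_id"] = part["parent_id"]
    return sum(1 for row in merged.values() if row["parent_id"] is None)


parts = make_parts(50)
rng = random.Random(0)

roots_by_id = count_roots(parts, lambda p: shard_by_id(p["id"]))
roots_by_rand = count_roots(parts, lambda p: rng.randrange(NUM_SHARDS))

print("roots with id sharding:", roots_by_id)      # exactly one root
print("roots with rand() sharding:", roots_by_rand)  # inflated by stray call_end rows
```

With `id` sharding every call merges into one row, so only the real root matches the filter. With random placement, each child whose call_end lands away from its call_start contributes an extra row with a NULL `parent_id`.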

WB-34906
Summary
- `call_parts` was using default `rand()` sharding. `call_end` rows don't carry `trace_id` (only `call_start` does), so call_start and call_end of the same call landed on different shards.
- […] `calls_complete`, and `parent_id IS NULL` filters matched the call_end row of every child call instead of trace roots.
- Add `"call_parts": "id"` to `ID_SHARDED_TABLES` so all parts of a single call hash to the same shard via `sipHash64(id)`.

Testing
Caught by the new 2s2r nightly tests; existing migrator unit tests cover the sharding-key resolution.
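As a rough illustration of the sharding-key resolution mentioned above, the lookup might look something like the sketch below. Only the `"call_parts": "id"` entry, the `ID_SHARDED_TABLES` name, `sipHash64(id)`, and the `rand()` fallback for `calls_merged` come from this PR. The function names, the cluster and database names, and the `_local` suffix on `call_parts` are assumptions made for the example; the real migrator may resolve keys differently.

```python
# Only this entry is taken from the PR; the real mapping may hold more tables.
ID_SHARDED_TABLES = {"call_parts": "id"}


def sharding_expr(table: str) -> str:
    """Return the sharding expression for a Distributed table (hypothetical helper)."""
    column = ID_SHARDED_TABLES.get(table)
    if column is None:
        # Tables without an entry keep random placement, e.g. calls_merged.
        return "rand()"
    return f"sipHash64({column})"


def distributed_engine(cluster: str, database: str, table: str) -> str:
    """Render a ClickHouse Distributed engine clause (hypothetical helper)."""
    return (
        f"ENGINE = Distributed({cluster}, {database}, "
        f"{table}_local, {sharding_expr(table)})"
    )


# Placeholder cluster and database names, for illustration only.
print(distributed_engine("example_cluster", "example_db", "call_parts"))
print(distributed_engine("example_cluster", "example_db", "calls_merged"))
```

Keeping the per-table column in a single mapping means a table opts into deterministic placement with one entry, while everything else stays on `rand()`.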