perf(webapp): shrink run trace loader payload and add trace span cap controls #3906
…ayload The trace tree only renders the derived timelineEvents, so the raw span events (with full properties) were serialized into the loader response as dead weight on event-heavy traces. Raw events now stay server-side, and timeline events no longer carry the raw event properties (the field was never in the TimelineSpanEvent type and nothing rendered it).
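A minimal sketch of the idea, assuming hypothetical span shapes (`ServerSpan`, `ClientSpan`, and the `offset` field are illustrative and not taken from the webapp): the loader derives the timeline events from the raw events and returns only the derived form, so raw event properties are never serialized.

```typescript
// Hypothetical shapes for illustration; the real types in the webapp may differ.
type RawSpanEvent = { name: string; time: string; properties: Record<string, unknown> };
type TimelineSpanEvent = { name: string; offset: number };
type ServerSpan = { id: string; startTime: string; events: RawSpanEvent[] };
type ClientSpan = { id: string; timelineEvents: TimelineSpanEvent[] };

// Derive the timeline events on the server and drop the raw events
// (including their properties) from what gets serialized to the client.
function toClientSpan(span: ServerSpan): ClientSpan {
  const start = Date.parse(span.startTime);
  return {
    id: span.id,
    timelineEvents: span.events.map((event) => ({
      name: event.name,
      // Milliseconds since the span started.
      offset: Date.parse(event.time) - start,
    })),
  };
}
```

The returned object has no `events` or `properties` keys, so an event-heavy span costs only its derived timeline entries in the response.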
A new optional TRACE_VIEW_EMERGENCY_SPAN_CAP env var clamps the trace summary and detailed trace summary span limits on both event store paths (ClickHouse and Postgres), covering the dashboard trace view and the public run trace endpoint. Unset by default, so nothing changes unless an operator sets it.
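The clamping behavior described above could look roughly like the sketch below. The helper names `parseSpanCap` and `clampSpanLimit` are hypothetical, and treating invalid or non-positive values as "unset" is an assumption rather than something the PR states.

```typescript
// Hypothetical helpers; names and validation rules are assumptions.

// Parse the raw env value. Unset, empty, or invalid values mean "no cap".
function parseSpanCap(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : undefined;
}

// Clamp a per-store span limit to the cap. With no cap the limit is unchanged,
// and a cap larger than the limit never raises it.
function clampSpanLimit(limit: number, cap: number | undefined): number {
  return cap === undefined ? limit : Math.min(limit, cap);
}
```

Under these assumptions, a caller would pass the value of `TRACE_VIEW_EMERGENCY_SPAN_CAP` to `parseSpanCap` once and apply `clampSpanLimit` to both the trace summary and detailed trace summary limits on each event store path.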
The SSE stream route resolved runs by friendly id alone. The lookup now applies the same organization membership scoping as the rest of the run page presenters, on both the database lookup and the buffered-run fallback, with unauthorized indistinguishable from missing.
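As a rough illustration of the scoping rule, the sketch below uses in-memory arrays as a stand-in for the database query. The `Run` and `Membership` shapes and the `findRunForUser` helper are hypothetical. The point is that a run outside the user's organizations produces the same result as a run that does not exist.

```typescript
// Hypothetical shapes; the real lookup is a database query, not an array scan.
type Run = { friendlyId: string; organizationId: string };
type Membership = { userId: string; organizationId: string };

// Resolve a run by friendly id, but only within organizations the user belongs to.
// Returns undefined for both "no such run" and "run in another organization",
// so a caller cannot tell the two cases apart.
function findRunForUser(
  runs: readonly Run[],
  memberships: readonly Membership[],
  userId: string,
  friendlyId: string
): Run | undefined {
  const orgIds = new Set(
    memberships.filter((m) => m.userId === userId).map((m) => m.organizationId)
  );
  return runs.find((run) => run.friendlyId === friendlyId && orgIds.has(run.organizationId));
}
```

A route handler built on this would map `undefined` to a single not-found response in both cases, and the same check would apply to the buffered-run fallback.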
The virtualizer render path ran tree.find per virtual row and getNodeProps ran tree.findIndex per rendered node, which is quadratic work on large traces. Both now resolve through memoized id-to-index maps with identical behavior.
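The lookup change can be sketched without React as follows. `TreeNode`, `getIndexById`, and `findIndexById` are hypothetical names, and the `WeakMap` cache is a stand-in for whatever memoization the component actually uses. The map is built once per tree array, so each row lookup is a constant-time `Map.get` instead of a linear scan.

```typescript
// Hypothetical node shape; only the id matters for the lookup.
type TreeNode = { id: string };

// Cache one id-to-index map per tree array instance (assumed memoization strategy).
const indexCache = new WeakMap<readonly TreeNode[], Map<string, number>>();

function getIndexById(tree: readonly TreeNode[]): Map<string, number> {
  const cached = indexCache.get(tree);
  if (cached) return cached;

  const built = new Map<string, number>();
  tree.forEach((node, index) => {
    // Keep the first occurrence so results match Array.prototype.findIndex.
    if (!built.has(node.id)) built.set(node.id, index);
  });
  indexCache.set(tree, built);
  return built;
}

// Drop-in replacement for tree.findIndex((n) => n.id === id).
function findIndexById(tree: readonly TreeNode[], id: string): number {
  return getIndexById(tree).get(id) ?? -1;
}
```

With n nodes and n rendered rows, the scan-per-row approach does on the order of n² comparisons, while the map costs one pass to build plus one lookup per row.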
Walkthrough
This PR introduces four coordinated improvements to the trace page system. The primary change slims client payloads by keeping raw span events on the server and removing them from serialized data, while timeline events continue to be derived server-side. An optional emergency span cap environment variable provides a safety mechanism to clamp trace query limits across both event store implementations. Authorization is strengthened by requiring user identity validation in RunStreamPresenter and scoping trace access to the user's organization. Finally, TreeView rendering is optimized by replacing linear array scans with memoized Map-based O(1) lookups for both node resolution and index computation.
Summary
The run trace page loader serialized every span's raw OTel events (with full properties) into the response, even though the tree UI only renders the derived `timelineEvents` and the span detail panel refetches what it needs. On event-heavy traces that inflated both the loader payload and the server-side heap copies built per request. This PR keeps raw span events server-side and pairs that with a few related trace-view improvements:

- A new optional `TRACE_VIEW_EMERGENCY_SPAN_CAP` env var (unset by default) clamps the trace summary and detailed trace summary span limits on both event store paths, including the public run trace endpoint, so operators can bound trace query sizes in one place without retuning the per-store limits.
- The tree view ran a linear `find` per virtual row (`getNodeProps` did the same via `findIndex`); rows now resolve through memoized id lookup maps, which matters once traces reach tens of thousands of spans.

Behavior is unchanged by default: the trace tree renders from the same `timelineEvents` it always has, and the new cap only takes effect when set.