Skip to content

Batcher refactor and performance observability#2310

Merged
sitole merged 7 commits intomainfrom
chore/events-batcher-observability
Apr 8, 2026
Merged

Batcher refactor and performance observability#2310
sitole merged 7 commits intomainfrom
chore/events-batcher-observability

Conversation

@sitole
Copy link
Copy Markdown
Member

@sitole sitole commented Apr 7, 2026

  • Refactor batcher implementation to be easier to read.
  • Performance observability metrics for batcher.
  • Traces for batcher callback functions.

sitole added 5 commits April 7, 2026 09:47
Batcher loop is now easier to navigate. Time waits replaced with ticker.

Metrics for performance monitoring are now included. Different batchers
can be run in parallel with name attribute used to distinguish.
@cursor
Copy link
Copy Markdown

cursor bot commented Apr 7, 2026

PR Summary

Medium Risk
Medium risk because it changes the batcher’s concurrency/flush logic and Push API semantics, which could affect batching behavior and drop/backpressure handling under load.

Overview
This PR refactors the ClickHouse Batcher implementation to use a ticker-based flush loop with explicit start/stop state, changes Push to return an error (including ErrBatcherQueueFull) instead of a boolean, and removes the custom timer pool.

It adds OpenTelemetry instrumentation: batcher-level metrics (queue length, dropped items, flush sizes and timings) with a configurable batcher name attribute, and wraps ClickHouse batch insert callbacks for sandbox events and host stats in traced spans that record batch size and failures. The OTEL collector filter is updated to allow the new batcher.* metrics through.

Reviewed by Cursor Bugbot for commit 376a09a. Bugbot is set up for automated code reviews on this repo. Configure here.

Guard Start, Stop, and Push with a sync.RWMutex to prevent a
send-on-closed-channel panic that could occur when Stop closed
b.ch between Push's started check and the channel send.

Also update the QueueSize doc comment to reflect that Push now
returns ErrBatcherQueueFull instead of the old (false, nil).
@sitole sitole marked this pull request as ready for review April 7, 2026 09:59
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e6ce9a7a1b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Without reset, a size-triggered flush near a tick boundary would
give the next batch less than MaxDelay to accumulate. Now each
batch always gets a full MaxDelay window after the previous flush.
@sitole sitole requested a review from jakubno April 7, 2026 13:10
@sitole sitole merged commit babe558 into main Apr 8, 2026
44 checks passed
@sitole sitole deleted the chore/events-batcher-observability branch April 8, 2026 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants