batch: reserve 10KiB gRPC framing overhead in split and add batch_splits_total metric#2078
Conversation
📊 API Diff Results
|
5f4b2c3 to
cbf9a38
Compare
There was a problem hiding this comment.
Pull request overview
This PR improves observability and safety around batching in the chipingress batch client by (1) tracking when a batch is split due to request-size constraints and (2) reserving a fixed buffer to account for gRPC/HTTP2/request metadata not captured by proto.Size.
Changes:
- Added a new counter metric
chip_ingress.batch.batch_splits_totaland increments it when a queued batch is split into multiple sub-batches. - Adjusted batch-splitting logic to use an “effective max” request size that subtracts a 10KiB overhead buffer from
maxGRPCRequestSize. - Registered the new metric in the batch client metrics initialization.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| batchSplitsTotal, err := meter.Int64Counter( | ||
| "chip_ingress.batch.batch_splits_total", | ||
| otelmetric.WithDescription("Total number of times a batch was split due to exceeding max gRPC request size"), |
There was a problem hiding this comment.
Addressed — updated the description to: "Total number of times a batch was split due to exceeding the effective gRPC request size limit (max request size minus reserved framing overhead)"
| effectiveMax := maxRequestSize - grpcFramingOverhead | ||
| if effectiveMax <= 0 { | ||
| effectiveMax = maxRequestSize | ||
| } |
There was a problem hiding this comment.
Addressed — enforced a minimum maxGRPCRequestSize of 1 MiB in WithMaxGRPCRequestSize. Values below this are clamped up, ensuring the 10 KiB framing overhead reservation is always meaningful. The fallback branch now has a comment noting it should not be reachable given the minimum.
| splitBatches := splitMessagesByRequestSize(messages, b.maxGRPCRequestSize) | ||
| if len(splitBatches) > 1 { | ||
| b.metrics.batchSplitsTotal.Add(ctx, 1) | ||
| } |
There was a problem hiding this comment.
Addressed — added TestSendBatch/records_batch_splits_total_metric_when_batch_is_split that forces a split and asserts the counter equals 1.
87ff96b to
b9973df
Compare
72fb124 to
9e2725d
Compare
9e2725d to
abf7486
Compare
Summary
Add observability for batch splitting in the chipingress batch client:
chip_ingress.batch.batch_splits_total— Int64Counter that increments each timesplitMessagesByRequestSizeproduces more than one sub-batch, indicating the original batch exceeded the effective gRPC request size limit.splitMessagesByRequestSizenow subtracts a 10 KiB buffer frommaxGRPCRequestSizeto account for gRPC framing, HTTP/2 headers, auth tokens, and tracing metadata not captured byproto.Size.Changes
pkg/chipingress/batch/client.go: AddbatchSplitsTotalcounter, register it innewBatchClientMetrics(), record it insendBatch()when a split occurs, and applygrpcFramingOverheadin the split logic.