Conversation
When `export_metrics/1` fails, the `MetricStore` used to leave each generation's data points separately in its ETS table and include them all in the next export attempt as independent OTLP data points per tag. Under sustained export failures (e.g. intermittent receiver errors or rate limiting) the payload grows linearly with the number of accumulated generations, which in turn causes the 20s `Task.yield` timeout guarding the HTTP send to fire constantly. Collapse all pending generations into a single one before building the payload. Counters, sums and distribution bucket counts are summed; last_value keeps the newest value; distribution min/max are merged with min/max. The resulting request contains at most one data point per metric/tag combination regardless of how many prior exports failed, so payload size (and therefore send duration) stays bounded. Refs: electric-sql/stratovolt#1455
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8a0edff126
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| merge_row(state.metrics_table, type, bucket, target_key, value, extra) | ||
| end | ||
|
|
||
| :ets.match_delete(state.metrics_table, {{gen, :_, :_, :_, :_}, :_, :_}) |
There was a problem hiding this comment.
Move generation deletion after successful export
Deleting rows in reaggregate_failed_generations/3 before the send outcome is known can drop metrics under normal concurrent writes. A writer that read the old generation just before rotate_generation/1 can still insert into current_gen during export; those inserts can be removed here without being merged (or orphaned once generation metadata is removed), and if the HTTP send later fails they are no longer retried. This is a regression from the previous flow, which only deleted pending generations after :ok.
Useful? React with 👍 / 👎.
Summary
OtelMetricExporter.MetricStore.export_metrics/1guards the OTLP HTTP send with aTask.yield(task, 20_000) || Task.shutdown(task)wrapper. Under sustained export failures the request payload grows on every cycle because each generation of data is sent as its own OTLP data point per metric/tag. Eventually the send takes longer than 20s and the task times out every export cycle — the behaviour reported in electric-sql/stratovolt#1455.This PR collapses all pending generations into a single one before building the payload, so the request contains at most one data point per metric/tag combination regardless of how many prior exports failed:
last_valuekeeps the newest value (older generations are overwritten by newer ones during the merge)min/maxare merged withmin/maxfinishtimestamp is extended to cover the full span of the collapsed generationsPayload size — and therefore send duration — stays bounded across repeated failures. This is scoped purely to
export_metrics/1's handling of pending data; theconvert_value/2changes for stratovolt#1454 are unaffected.Link: electric-sql/stratovolt#1455
Test plan
mix compile --warnings-as-errorsmix test(45 tests, 0 failures)