Skip to content

execution/binary: make duplicate series error message deterministic#711

Merged
fpetkovski merged 1 commit into
thanos-io:mainfrom
sandy2008:fix/binary-deterministic-duplicate-series-error
May 29, 2026
Merged

execution/binary: make duplicate series error message deterministic#711
fpetkovski merged 1 commit into
thanos-io:mainfrom
sandy2008:fix/binary-deterministic-duplicate-series-error

Conversation

@sandy2008
Copy link
Copy Markdown
Contributor

The errManyToManyMatch.Error() method previously formatted the two conflicting series in whichever order they happened to arrive on the original / duplicate slots. That order is driven by upstream StepVector iteration which is itself fed by Go map iteration (see execution/scan/subquery.go), so the rendered error string is non-deterministic across runs and across parallel engine workers.

Same root cause as the upstream Prometheus issue/PR (prometheus/prometheus#18809 / prometheus/prometheus#18810). Tracked downstream as a flaky test in Cortex: cortexproject/cortex#7546 (tactical workaround in cortexproject/cortex#7550 — removable once both upstream fixes land and are vendored).

Fix: lexicographically sort the two label-set strings before formatting the message. One-line if-swap, no new imports, preserves the existing message format so existing matchers keep working.

A regression test in the new execution/binary/utils_test.go exercises both right-hand and left-hand side cases as well as adversarial label values containing the same framing punctuation ([, ], ,, {, }, "), and asserts byte-identical output regardless of which series came first. Verified by temporarily reverting the swap: all subtests FAIL with the unsorted ordering; restoring the patch makes them PASS.

Fixes #710.

The errManyToManyMatch.Error() method previously formatted the two
conflicting series in whichever order they happened to arrive on the
"original" / "duplicate" slots. That order is driven by upstream
StepVector iteration which is itself fed by Go map iteration (see
execution/scan/subquery.go), so the rendered error string was
non-deterministic across runs and across parallel engine workers.

Downstream consumers (e.g. Cortex's distributor / querier) compare
error strings to classify failures, and the flapping ordering caused
flaky tests and inconsistent user-visible error messages -- same root
cause as the issue tracked in cortexproject/cortex#7546.

Fix: lexicographically sort the two label-set strings before
formatting the message. This is a one-line if-swap, no new imports
required, and preserves the existing message format so existing
matchers continue to work.

A regression test exercises both right-hand and left-hand side cases
as well as adversarial label values containing the same framing
punctuation ('[', ']', ',', '{', '}', '"') and asserts byte-identical
output regardless of which series came first.

Refs: cortexproject/cortex#7546
Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>
Copy link
Copy Markdown
Collaborator

@fpetkovski fpetkovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

@fpetkovski fpetkovski merged commit f831b43 into thanos-io:main May 29, 2026
6 of 7 checks passed
sandy2008 added a commit to sandy2008/cortex-1 that referenced this pull request May 30, 2026
…-series quirk

Addresses review feedback on cortexproject#7550. Replaces the generic bracket-list
canonicalizer (errBracketListRE / canonicalizeErrMsg / splitTopLevelCommas)
with a single-purpose, anchored duplicateSeriesRE + canonicalizeDuplicateSeriesErr
that only normalizes the non-deterministic two-element list in the PromQL
"found duplicate series for the match group" error. The regex matches the match
group and both entries as whole quote-aware labelsets, so a "," inside a labelset
is never mistaken for the entry separator; the two-element shape is hard-coded so
the comparator falls back to a strict (failing) compare if upstream reformats.

Also restores the *promv1.Error.Detail comparison that the original cmp.Equal
performed, and expands TestSameErrorClass to 19 subtests (multi-label reorder,
label values containing brackets/commas/braces/escaped quotes, nested brackets,
same list with different surrounding text, three-element and changed-separator
loud-break guards, and an unrelated-bracket-list anchoring guard).

Remove this workaround once prometheus/prometheus#18810 and
thanos-io/promql-engine#711 are vendored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

execution/binary: "found duplicate series for the match group" error is non-deterministic

2 participants