fix(tesseract): tautological FK-aggregate join for filtered PK counts by tlangton3 · Pull Request #10762 · cube-js/cube

tlangton3 · 2026-04-28T13:44:25Z

Summary

Filtered count measures of shape type: count + sql: "{primary_key}" + filters: [...], when reached through a 1→N (one-to-many) join with CUBESQL_SQL_PUSH_DOWN=true, generated SQL with a tautological FK-aggregate rejoin: LEFT JOIN <source_cube> AS <pk_alias> ON keys.pk = keys.pk. Both sides referenced the inner keys subquery, so the join cross-produced and the count over-included any row whose filter predicates were satisfied anywhere in the source table — silently incorrect and orders of magnitude slower.

The legacy JS path (BaseQuery.js::aggregateSubQuery) generates the correct keys.pk = source.pk join. The bug is exclusive to the Tesseract Rust planner.

Discovered as a production over-count in a downstream Cube deployment: a measure pattern matching ~70 instances across the model returned silently inflated counts whenever queried through a fanout-prone view. Reproduced into the upstream test suite and fixed here.

Changes

rust/cubesqlplanner/cubesqlplanner/src/physical_plan_builder/processors/aggregate_multiplied_subquery.rs — bind a dedicated VisitorContext to the right-hand side of the FK-aggregate ON clause in the Cube source arm.
rust/cubesqlplanner/cubesqlplanner/src/test_fixtures/schemas/yaml_files/common/integration_multi_fact.yaml — add customers.active_count (filtered count on PK) to the existing 1→N schema.
rust/cubesqlplanner/cubesqlplanner/src/tests/integration/multi_fact.rs — new test_multiplied_aggregate_filtered_count_on_pk regression test with a structural tautology assertion over every ON predicate, plus the suite-standard assert_snapshot!(result) on postgres execution row data when integration-postgres is on.

Implementation Details

The Cube source arm of AggregateMultipliedSubquery::process used Expr::Member(dim) for the join's right-hand side, deferring resolution to SQL-emission time. By then, the outer factory's render_references had been populated by the schema-resolution pass — and because the keys subquery (the join's root) projects the PK dimension, ReferencesBuilder::find_reference_column_for_member_in_join resolved it to (keys, <alias>). That mapping intercepted both sides of the ON clause via RenderReferencesSqlNode, producing the tautology.

Fix: snapshot a VisitorContext before the schema-resolution pollution, register pk_cube.name() → pk_cube_alias in cube_name_references, and bind it to the RHS via Expr::new_member_with_context. The dim then renders via AutoPrefixSqlNode against pk_cube_alias — preserves full SQL semantics for custom-SQL or composite PKs, not just simple sql: id cases.

Risk / Compatibility

The change is scoped to the Cube arm of AggregateMultipliedSubquerySource. The MeasureSubquery arm is unchanged — it already used Expr::Reference with pk_cube_alias and was unaffected. No other call sites are at risk: add_subquery_join (builder.rs:141) uses the same Expr::new_member pattern but joins against a Cube-rooted From where ReferencesBuilder returns None for cube sources, so it doesn't trigger the render-reference pollution that caused the bug.

Testing

cargo test --lib in rust/cubesqlplanner/: 823 tests pass locally, 0 failures, 14 ignored (Docker-dependent).
Pre-fix manifestation: LEFT JOIN customers AS "..." ON (("keys"."customers__id" = "keys"."customers__id")) — tautology, both sides reference keys.
Post-fix: LEFT JOIN customers AS "customers_key_customers" ON (("keys"."customers__id" = "customers_key_customers".id)) — RHS correctly resolves to the source-cube alias.
Regression coverage: (a) the test's structural tautology scanner walks every ON predicate and fails on any lhs == rhs equality (environment-independent, robust to planner formatting); (b) assert_snapshot!(result) on postgres execution row data when integration-postgres is enabled validates end-to-end correctness against a real database. SQL strings themselves are not snapshotted — that follows the existing convention in the suite (226 result-data snapshots vs 0 SQL-string snapshots), since SQL formatting is unstable across template-rendering passes while row data is stable.

Linear Ticket

N/A — upstream OSS contribution.

waralexrom · 2026-05-11T15:44:20Z

Hi, @tlangton3! Great find, and huge thanks for the fix — everything looks really solid 👍
Could you, please, rebase onto current master?

waralexrom

LGTM, excellent work! 👍

When a filtered count measure of shape `type: count` + `sql: "{primary_key}"` + `filters: [...]` is reached through a 1->N edge with `CUBESQL_SQL_PUSH_DOWN=true`, the FK-aggregate rejoin onto the source cube emitted `ON keys.pk = keys.pk` — both sides referencing the inner `keys` subquery. The join cross-produced and the count over-included any row whose filters were satisfied anywhere in the source table. Root cause: the `Cube` arm of `AggregateMultipliedSubquerySource` built the right-hand side of the join condition as `Expr::Member(MemberExpression::new(dim))`, deferring resolution to whatever rendering context was active at SQL emission. By emission time, the outer factory's `render_references` had been populated by the schema-resolution pass, mapping the PK dim to `(keys, <alias>)` (since the `keys` subquery — the join root — projects it). That mapping then intercepted both sides of the `ON` clause via `RenderReferencesSqlNode`. Fix: clone `context_factory`, register `pk_cube.name() -> pk_cube_alias` in `cube_name_references`, and bind a `VisitorContext` snapshot (pre-pollution) to the join's right-hand side via `Expr::new_member_with_context`. The dimension renders through `AutoPrefixSqlNode` against `pk_cube_alias`, preserving full SQL semantics for custom-SQL or composite primary keys. The `MeasureSubquery` arm in the same file already used an explicit reference to `pk_cube_alias` and was unaffected. Adds a regression test exercising the bug shape (`customers.active_count` filtered count on PK, queried by `orders.status` to force the multiplied path) with both a structural tautology assertion over every `ON` predicate and an `insta` snapshot of the generated SQL.

tlangton3 · 2026-05-11T15:56:26Z

Hi @waralexrom

Thanks for taking a look 🙏 I've performed the rebase

github-actions Bot added rust Pull requests that update Rust code pr:community Contribution from Cube.js community members. labels Apr 28, 2026

vercel Bot deployed to Preview April 28, 2026 13:46 View deployment

vercel Bot deployed to Preview April 28, 2026 13:49 View deployment

vercel Bot deployed to Preview April 28, 2026 14:16 View deployment

vercel Bot deployed to Preview April 28, 2026 14:21 View deployment

tlangton3 marked this pull request as ready for review April 28, 2026 14:35

vercel Bot deployed to Preview April 28, 2026 14:37 View deployment

ovr requested a review from waralexrom May 11, 2026 15:20

waralexrom approved these changes May 11, 2026

View reviewed changes

tlangton3 force-pushed the cube/fix-tesseract-fk-aggregate-pk-tautology branch from 6301482 to 5fd6e45 Compare May 11, 2026 15:54

vercel Bot deployed to Preview May 11, 2026 15:57 View deployment

waralexrom merged commit b647dd5 into cube-js:master May 11, 2026
93 of 98 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(tesseract): tautological FK-aggregate join for filtered PK counts#10762

fix(tesseract): tautological FK-aggregate join for filtered PK counts#10762
waralexrom merged 1 commit into
cube-js:masterfrom
tlangton3:cube/fix-tesseract-fk-aggregate-pk-tautology

tlangton3 commented Apr 28, 2026 •

edited

Loading

Uh oh!

waralexrom commented May 11, 2026

Uh oh!

waralexrom left a comment

Uh oh!

tlangton3 commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

tlangton3 commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Implementation Details

Risk / Compatibility

Testing

Linear Ticket

Uh oh!

waralexrom commented May 11, 2026

Uh oh!

waralexrom left a comment

Choose a reason for hiding this comment

Uh oh!

tlangton3 commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tlangton3 commented Apr 28, 2026 •

edited

Loading