Skip to content

fix(tesseract): tautological FK-aggregate join for filtered PK counts#10762

Merged
waralexrom merged 1 commit into
cube-js:masterfrom
tlangton3:cube/fix-tesseract-fk-aggregate-pk-tautology
May 11, 2026
Merged

fix(tesseract): tautological FK-aggregate join for filtered PK counts#10762
waralexrom merged 1 commit into
cube-js:masterfrom
tlangton3:cube/fix-tesseract-fk-aggregate-pk-tautology

Conversation

@tlangton3

@tlangton3 tlangton3 commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Summary

Filtered count measures of shape type: count + sql: "{primary_key}" + filters: [...], when reached through a 1→N (one-to-many) join with CUBESQL_SQL_PUSH_DOWN=true, generated SQL with a tautological FK-aggregate rejoin: LEFT JOIN <source_cube> AS <pk_alias> ON keys.pk = keys.pk. Both sides referenced the inner keys subquery, so the join cross-produced and the count over-included any row whose filter predicates were satisfied anywhere in the source table — silently incorrect and orders of magnitude slower.

The legacy JS path (BaseQuery.js::aggregateSubQuery) generates the correct keys.pk = source.pk join. The bug is exclusive to the Tesseract Rust planner.

Discovered as a production over-count in a downstream Cube deployment: a measure pattern matching ~70 instances across the model returned silently inflated counts whenever queried through a fanout-prone view. Reproduced into the upstream test suite and fixed here.

Changes

  • rust/cubesqlplanner/cubesqlplanner/src/physical_plan_builder/processors/aggregate_multiplied_subquery.rs — bind a dedicated VisitorContext to the right-hand side of the FK-aggregate ON clause in the Cube source arm.
  • rust/cubesqlplanner/cubesqlplanner/src/test_fixtures/schemas/yaml_files/common/integration_multi_fact.yaml — add customers.active_count (filtered count on PK) to the existing 1→N schema.
  • rust/cubesqlplanner/cubesqlplanner/src/tests/integration/multi_fact.rs — new test_multiplied_aggregate_filtered_count_on_pk regression test with a structural tautology assertion over every ON predicate, plus the suite-standard assert_snapshot!(result) on postgres execution row data when integration-postgres is on.

Implementation Details

The Cube source arm of AggregateMultipliedSubquery::process used Expr::Member(dim) for the join's right-hand side, deferring resolution to SQL-emission time. By then, the outer factory's render_references had been populated by the schema-resolution pass — and because the keys subquery (the join's root) projects the PK dimension, ReferencesBuilder::find_reference_column_for_member_in_join resolved it to (keys, <alias>). That mapping intercepted both sides of the ON clause via RenderReferencesSqlNode, producing the tautology.

Fix: snapshot a VisitorContext before the schema-resolution pollution, register pk_cube.name() → pk_cube_alias in cube_name_references, and bind it to the RHS via Expr::new_member_with_context. The dim then renders via AutoPrefixSqlNode against pk_cube_alias — preserves full SQL semantics for custom-SQL or composite PKs, not just simple sql: id cases.

Risk / Compatibility

The change is scoped to the Cube arm of AggregateMultipliedSubquerySource. The MeasureSubquery arm is unchanged — it already used Expr::Reference with pk_cube_alias and was unaffected. No other call sites are at risk: add_subquery_join (builder.rs:141) uses the same Expr::new_member pattern but joins against a Cube-rooted From where ReferencesBuilder returns None for cube sources, so it doesn't trigger the render-reference pollution that caused the bug.

Testing

  • cargo test --lib in rust/cubesqlplanner/: 823 tests pass locally, 0 failures, 14 ignored (Docker-dependent).
  • Pre-fix manifestation: LEFT JOIN customers AS "..." ON (("keys"."customers__id" = "keys"."customers__id")) — tautology, both sides reference keys.
  • Post-fix: LEFT JOIN customers AS "customers_key_customers" ON (("keys"."customers__id" = "customers_key_customers".id)) — RHS correctly resolves to the source-cube alias.
  • Regression coverage: (a) the test's structural tautology scanner walks every ON predicate and fails on any lhs == rhs equality (environment-independent, robust to planner formatting); (b) assert_snapshot!(result) on postgres execution row data when integration-postgres is enabled validates end-to-end correctness against a real database. SQL strings themselves are not snapshotted — that follows the existing convention in the suite (226 result-data snapshots vs 0 SQL-string snapshots), since SQL formatting is unstable across template-rendering passes while row data is stable.

Linear Ticket

N/A — upstream OSS contribution.

@waralexrom

Copy link
Copy Markdown
Member

Hi, @tlangton3! Great find, and huge thanks for the fix — everything looks really solid 👍
Could you, please, rebase onto current master?

@waralexrom waralexrom left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, excellent work! 👍

When a filtered count measure of shape `type: count` + `sql: "{primary_key}"` + `filters: [...]` is reached through a 1->N edge with `CUBESQL_SQL_PUSH_DOWN=true`, the FK-aggregate rejoin onto the source cube emitted `ON keys.pk = keys.pk` — both sides referencing the inner `keys` subquery. The join cross-produced and the count over-included any row whose filters were satisfied anywhere in the source table.

Root cause: the `Cube` arm of `AggregateMultipliedSubquerySource` built the right-hand side of the join condition as `Expr::Member(MemberExpression::new(dim))`, deferring resolution to whatever rendering context was active at SQL emission. By emission time, the outer factory's `render_references` had been populated by the schema-resolution pass, mapping the PK dim to `(keys, <alias>)` (since the `keys` subquery — the join root — projects it). That mapping then intercepted both sides of the `ON` clause via `RenderReferencesSqlNode`.

Fix: clone `context_factory`, register `pk_cube.name() -> pk_cube_alias` in `cube_name_references`, and bind a `VisitorContext` snapshot (pre-pollution) to the join's right-hand side via `Expr::new_member_with_context`. The dimension renders through `AutoPrefixSqlNode` against `pk_cube_alias`, preserving full SQL semantics for custom-SQL or composite primary keys. The `MeasureSubquery` arm in the same file already used an explicit reference to `pk_cube_alias` and was unaffected.

Adds a regression test exercising the bug shape (`customers.active_count` filtered count on PK, queried by `orders.status` to force the multiplied path) with both a structural tautology assertion over every `ON` predicate and an `insta` snapshot of the generated SQL.
@tlangton3 tlangton3 force-pushed the cube/fix-tesseract-fk-aggregate-pk-tautology branch from 6301482 to 5fd6e45 Compare May 11, 2026 15:54
@tlangton3

Copy link
Copy Markdown
Contributor Author

Hi @waralexrom

Thanks for taking a look 🙏 I've performed the rebase

@waralexrom waralexrom merged commit b647dd5 into cube-js:master May 11, 2026
93 of 98 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr:community Contribution from Cube.js community members. rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants