Skip to content

feat(tesseract): multi-stage grain directive#10957

Merged
waralexrom merged 11 commits into
masterfrom
tesseract-set-of-multi-stage-directives-group-by
May 29, 2026
Merged

feat(tesseract): multi-stage grain directive#10957
waralexrom merged 11 commits into
masterfrom
tesseract-set-of-multi-stage-directives-group-by

Conversation

@waralexrom

Copy link
Copy Markdown
Member

Summary

Introduces the grain: directive on multi-stage measures — declarative partition shaping (exclude / keep_only / include) that supersedes the older add_group_by / reduce_by / group_by fields. Mirrors the design of the already-shipped filter: directive (#10827).

Changes

  • JS Joi schema, transpiler patterns, and CubeEvaluator accept grain.exclude/keepOnly/include (all Joi.func()) and resolve them into *References arrays.
  • Rust bridge MultiStageGrainReferences plus its mock and YAML test parser; the backend-native bridge harness registers the new bridge with full coverage.
  • MultiStageGrain struct collapses the three independent fields on MultiStageProperties into a single set-operation value; the planner reads grain directly — partition_filter and member_partition_by_logical take &MultiStageGrain.
  • MultiStageInodeMember carries a MultiStageGrain instead of three separate vecs; legacy proxy methods on MeasureSymbol / DimensionSymbol removed.
  • Schema-compiler postgres integration tests cover every grain shape — exclude / keep_only / include at single-element, two-element-array, and the keep_only + include combination.

Testing

  • cargo test --package cubesqlplanner --lib — 966 passed.
  • jest dist/test/unit/cube-validator.test.js — 103 passed.
  • jest --config jest-bridge.config.js (backend-native bridge harness) — 192 passed.
  • Postgres integration suite (multi-stage-grain.test.ts) is gated on nativeSqlPlanner and skips without Postgres; numeric verification will happen in CI.

… measures

Introduce `grain:` directive on multi-stage measures with fields
`mode (relative|fixed)`, `exclude`, `keepOnly`, `include` mirroring the
existing `filter:` directive shape. All three list fields are
`Joi.func()` returning arrays of refs and are wired into the prop
transpiler so authors can write `({ CUBE }) => [CUBE.dim]`.

`exclude` and `keepOnly` are mutually exclusive via `.nand(...)`,
matching the `MultiStageFilter` contract.

`CubeEvaluator.evaluateMultiStageReferences` now resolves grain's
function fields into `*References` siblings that the native bridge will
read. The directive is only added to multi-stage measures; dimensions
are intentionally out of scope for now, and the legacy
`group_by/add_group_by/reduce_by` fields keep working unchanged.
Mirror the multi-stage filter bridge for the new `grain:` directive:

- New static-only bridge `MultiStageGrainReferences` carrying
  `mode/exclude/keep_only/include` (all reference name lists).
- `MeasureDefinition::grain()` accessor — only on measures, dimensions
  stay out of scope until the directive expands.
- Tesseract mocks: `MockMultiStageGrainReferences`, `grain:` field on
  the YAML measure parser with cube-name qualification, wired into
  `MockMeasureDefinition`, plus YAML round-trip unit tests covering
  full / partial / cube-qualified shapes.

Backend-native bridge harness:
- Register `multiStageGrain` in the bridge_registry and add an empty
  `invoke_multi_stage_grain` (static-only, like the filter sibling).
- Extend `invoke_measure_definition` to record the new `grain` getter
  so the dispatcher/meta drift guard stays green.
- JS fixture + coverage rows for `multiStageGrain`, and `grain` added
  to the `measureDefinition` expected meta. The fixture populates only
  one of `excludeReferences`/`keepOnlyReferences` because the `.nand`
  enforcement currently lives on the JS Joi validator; the bridge
  itself is structurally permissive.
…StageGrain

Replace the three independent `add_group_by` / `reduce_by` / `group_by`
fields on `MultiStageProperties` with a single `MultiStageGrain` struct
carrying `mode` + `exclude` / `keep_only` / `include`, modeled as a set
operation on the inherited grain context.

For measures the grain is sourced from the new `grain:` directive when
the schema provides it; otherwise it falls back to the older
`add_group_by` / `reduce_by` / `group_by` fields with the mapping:
  - add_group_by -> include
  - reduce_by    -> exclude
  - group_by     -> keep_only
Fallback always uses `mode: Relative` — the planner does not branch on
mode for grain yet, so observable behavior is unchanged. The `.nand`
between `exclude` and `keep_only` on the directive is enforced at the
symbol-build layer, matching the wording of the filter equivalent.

For dimensions the grain still carries only `include` (from
`add_group_by`); the `grain:` directive is currently scoped to measures.

`MeasureSymbol::reduce_by/add_group_by/group_by` and
`DimensionSymbol::add_group_by` are kept as thin proxies on the new
struct so existing call sites continue to compile and behave the same.

Tests: existing direct-field accesses moved to the new shape; a YAML
fixture and two unit tests cover the legacy-mapping and
directive-overrides-legacy paths.
Replace the three separate `reduce_by` / `add_group_by` / `group_by`
fields on `MultiStageInodeMember` with the shared `MultiStageGrain`,
and route every planner call site through `.grain()` rather than
through per-field accessors. The inode now exposes only `grain()` and
`time_shift()`; partition-shape helpers (`partition_filter`,
`member_partition_by_logical`) take `&MultiStageGrain` directly.

Add `Default` impls on `MultiStageGrain` and `MultiStageGrainMode`
(Relative). Reuse it from `MultiStageProperties::from_dimension_definition`
and at every "no directive, no legacy fields" fallback in the planner.

The `use_window_path` check still gates only on `include` — `exclude`
and `keep_only` change partition shape but are honoured by the window
path through PARTITION BY at render time. Comment added so future
work that extends window-path coverage revisits this assumption.

Schema fixture and integration test for `grain.keep_only` rephrased
without "legacy/new" wording so they stay accurate when the older
fields go away.
…r of grain

Remove `MeasureSymbol::reduce_by/add_group_by/group_by` and
`DimensionSymbol::add_group_by`. Their only role was to hide the
`multi_stage().grain.{exclude,include,keep_only}` access behind the old
field names; now both production call sites
(`join_hints_collector`, `logical_plan::multistage::dimension::join_dimensions`)
and the symbol-level tests read the grain directly.

`MeasureSymbol` and `DimensionSymbol` are now grain-agnostic apart from
exposing `multi_stage() -> Option<&MultiStageProperties>`; everything
partition-shape-related lives on `MultiStageGrain`.
Honour `MultiStageGrainMode::Fixed` in `partition_filter`:
  - `Relative` (existing): start from parent state's dims and apply
    `exclude` → `keep_only`.
  - `Fixed`: discard the parent state, start from the original query
    context (`root_state.dimensions/time_dimensions`), and intersect
    with `keep_only`. `exclude` is undefined here per the spec
    (validator forbids `exclude` + `keep_only` together).

`include` continues to be appended via `add_dimensions(...)` outside
the helper, so the full FIXED result is
`(root_dims ∩ keep_only) ∪ include`, and FIXED with no keep_only and
no include collapses to an empty grain (grand total).

`member_partition_by_logical` is intentionally not updated: its input
is `description.state()` which has already been shaped by
`partition_filter`, so re-applying `exclude` / `keep_only` over the
narrowed set is idempotent in every allowed combination.

Fixtures and integration smoke tests cover three top-level scenarios:
grand total (FIXED, no narrowing), FIXED + keep_only with the dim
present in the query, and FIXED + keep_only with the dim absent.
Snapshots are filled when the suite runs against Postgres.
Roll back `MultiStageGrainMode` and the `mode` field on `MultiStageGrain`.
The FIXED-vs-RELATIVE distinction is technically implementable but its
behaviour is difficult to communicate to schema authors; ship grain
without it for now and keep only the always-relative semantics
(`exclude` removes from the parent grain, `keep_only` intersects with
it, `include` adds).

Removed across the stack:
- Rust bridge: `mode: Option<String>` on `MultiStageGrainReferencesStatic`,
  the corresponding mock builder field, and the YAML parser entry.
- Planner symbols: the `MultiStageGrainMode` enum, the `mode` field on
  `MultiStageGrain`, and every site that read or wrote it
  (`build_grain_from_directive`, `build_grain_from_legacy`,
  `apply_to_deps`).
- Planner core: `partition_filter` collapses back to a single relative
  branch (no `match grain.mode`, no `root_dims` parameter).
- Tests: the three FIXED integration smoke tests, the `mock_measure_definition`
  yaml-grain asserts that checked `static_data.mode`, and every
  `assert_eq!(ms.grain.mode, …)` in `tests/measure_symbol.rs`. Test
  fixtures (`integration_multi_stage.yaml`, `multi_stage_filter.yaml`)
  no longer set `mode:`.
- JS schema: `mode` removed from the `MultiStageGrain` Joi schema and
  from the `MultiStageGrainDirective` TS type. The
  `multi-stage grain — invalid mode rejected` validator test is gone.
- Backend-native bridge harness: `mode` dropped from the JS fixture
  and from the bridge coverage expected-fields list.

FIXED implementation is recoverable via `git revert 7ef91664d5`
(the commit that introduced the `partition_filter` mode branch).
Mirror `multi-stage-filter.test.ts`: a single inline YAML cube with 6
rows wired through `prepareYamlCompiler` and `dbRunner.runQueryTest`,
covering every shape of `grain:` end-to-end (JS → Joi → transpiler →
Tesseract → SQL). Tests are gated on `nativeSqlPlanner`; the legacy
planner does not implement the directive.

Eight scenarios:
- exclude / keep_only / include with a single dim
- keep_only with the listed dim absent from the query (collapse to
  grand total)
- exclude / keep_only / include with a two-element array each
- keep_only + include combined: the keep narrows the inherited grain
  to `[status]`, include extends the leaf to `[status, id]`, outer
  re-aggregates → per-status total broadcast across categories. This
  is the JS analogue of the Rust planner snapshot
  `reduce_by_add_group_by_combo`.

A sixth seed row (id=6, completed, books, 70) makes the two-element
keep_only test distinguishable from `total_amount` — (completed,
books) now spans two ids that share the same per-cell value.
…mments

Sweep through every comment touched by the grain work and remove
references to the legacy `add_group_by` / `reduce_by` / `group_by`
field names plus any "was → now" / "currently" / "today" wording that
will rot once the legacy path is gone.

- `MultiStageGrain` docstring no longer enumerates which legacy fields
  feed each slot; it just describes the set operation itself.
- `MultiStageProperties::from_dimension_definition` loses the inline
  "Dimensions only expose `add_group_by` today" note — the code reads
  the field directly on the next line.
- `partition_filter` keeps a single grain-centric docstring; the older
  paragraph that talked about `reduce_by`/`group_by` is dropped.
- Integration test renamed to `test_grain_keep_only_status_top_level`,
  comment rewritten as an intrinsic description rather than a
  comparison against `test_group_by_override`.
- YAML fixture and symbol-test comments drop the "expressed via grain"
  / "not from the sibling add_group_by" framing.
- JS grain test loses the "id=6 added so..." narrative and the
  reference to the Rust planner snapshot it was originally mirrored
  from.
@waralexrom waralexrom requested review from a team as code owners May 28, 2026 11:08
@github-actions github-actions Bot added rust Pull requests that update Rust code javascript Pull requests that update Javascript code labels May 28, 2026
@claude

claude Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Claude finished @waralexrom's task in 4m 28s —— View job


Code Review: Multi-Stage Grain Directive

  • Gather context and read PR diff
  • Check for existing review threads (none found)
  • Review JS/TS changes (schema compiler, validator, transpiler)
  • Review Rust bridge and planner changes
  • Review test coverage
  • Post inline comments for specific issues
  • Post final summary

Summary

This is a well-structured feature PR that introduces the grain: directive as a clean replacement for the reduce_by/group_by/add_group_by legacy fields. The design mirrors the already-shipped filter: directive, which keeps the mental model consistent. Overall the implementation is solid — here are my findings:

Strengths

  • Clean layering: The grain: directive is correctly threaded through all three layers — Joi validation (with .nand() mutual exclusivity), transpiler patterns (measures-only scope), and the Rust bridge/planner. The scope restriction to measures (not dimensions) is consistently enforced across all layers.
  • Good precedence semantics: grain: directive takes priority over legacy fields with a clean fallback path (build_grain_from_directive vs build_grain_from_legacy). Test grain_directive_overrides_legacy_fields explicitly covers this.
  • Thorough test coverage: Validator tests cover happy paths, partial directives, mutual exclusivity, type enforcement, and scope rejection on dimensions. Postgres integration tests cover every grain shape (exclude/keep_only/include as single-element, two-element, and combinations). Rust unit tests verify legacy-to-grain mapping and directive override.
  • Consolidation: The MultiStageGrain struct replaces three separate Vec fields on MultiStageProperties, and the removal of proxy methods (reduce_by(), add_group_by(), group_by()) on MeasureSymbol/DimensionSymbol is a nice cleanup.

Issues Found

  1. Code duplication risk (posted inline on multi_stage_query_planner.rs:283-288): partition_filter and member_partition_by_logical implement the same grain reshape logic. The FIXME comment acknowledges this, but with grain: becoming user-facing, a divergence is more likely to be noticed. Consider extracting a shared helper on MultiStageGrain.

  2. Empty grain: {} validation gap (posted inline on CubeValidator.ts:871-875): The Joi schema allows an empty grain: {} object, which becomes a no-op MultiStageGrain::default(). Unlike filter: where an empty object is semantically valid ("no overrides"), an empty grain: {} is almost certainly a user mistake. Suggest adding .or('exclude', 'keepOnly', 'include').

  3. Silent legacy field override (posted inline on multi_stage.rs:84-87): When grain: is present, legacy fields are silently ignored. This is correct behavior, but users migrating incrementally may be confused. Consider a compile-time warning when both are specified.

Minor Observations

  • The evaluateMultiStageReferences method is typed as MeasureDefinition but called for dimensions too — the grain block is harmlessly skipped for dimensions. This is an existing pattern, not introduced by this PR.
  • The bridge fixture (multiStageGrainFixture) correctly populates only excludeReferences (not both exclude and keepOnly), matching the Joi contract despite the bridge being structurally permissive.
  • Window-path eligibility correctly checks only include (not exclude/keep_only) since those are realized through PARTITION BY at render time. The comment at line 174-176 explains this well.

No security, performance, or correctness issues found. The feature is well-contained and backward-compatible.

Comment on lines +871 to +875
const MultiStageGrain = Joi.object().keys({
exclude: Joi.func(),
keepOnly: Joi.func(),
include: Joi.func(),
}).nand('exclude', 'keepOnly');

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Joi MultiStageGrain validator uses .nand('exclude', 'keepOnly') but doesn't enforce that at least one of the three fields (exclude, keepOnly, include) is present. An empty grain: {} would pass validation and silently become a no-op MultiStageGrain::default() on the Rust side.

This is consistent with how MultiStageFilter works (it also allows an empty object), so it may be intentional. But unlike filter: (where an empty object has a meaningful "no filter overrides" reading), an empty grain: {} on a multi-stage measure is almost certainly a user mistake. Consider adding .or('exclude', 'keepOnly', 'include') to catch this early:

Suggested change
const MultiStageGrain = Joi.object().keys({
exclude: Joi.func(),
keepOnly: Joi.func(),
include: Joi.func(),
}).nand('exclude', 'keepOnly');
const MultiStageGrain = Joi.object().keys({
exclude: Joi.func(),
keepOnly: Joi.func(),
include: Joi.func(),
}).nand('exclude', 'keepOnly').or('exclude', 'keepOnly', 'include');

Comment on lines +84 to +87
let grain = match definition.grain()? {
Some(g) => build_grain_from_directive(g, compiler)?,
None => build_grain_from_legacy(&definition.static_data(), compiler)?,
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Rust bridge validates exclude + keep_only mutual exclusivity, which is good defensive layering. However, if a user manages to pass both (e.g., through an older JS pipeline or custom bridge), this error message is the only protection. The Joi .nand() on the JS side should catch it first, but it's reassuring to have it here too.

One edge case to consider: when grain directive is present alongside legacy fields (reduce_by, add_group_by, group_by), the legacy fields are silently ignored (line 84-87 in from_measure_definition). This is the correct precedence, but there's no user-facing warning. Users migrating from legacy to grain: who accidentally leave both might be confused when legacy fields stop having effect. Consider a CubeError::user warning (not error) or at minimum documenting this precedence.

/^measures\.[_a-zA-Z][_a-zA-Z0-9]*\.(timeShift|time_shift)\.[0-9]+\.(timeDimension|time_dimension)$/,
/^measures\.[_a-zA-Z][_a-zA-Z0-9]*\.(reduceBy|reduce_by|groupBy|group_by|addGroupBy|add_group_by)$/,
/^(measures|dimensions)\.[_a-zA-Z][_a-zA-Z0-9]*\.filter\.(exclude|keepOnly|keep_only)$/,
/^measures\.[_a-zA-Z][_a-zA-Z0-9]*\.grain\.(exclude|keepOnly|keep_only|include)$/,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: The transpiler pattern supports keep_only as a snake_case alias but doesn't include an exclude snake_case variant (it's already snake_case). Worth noting that include also doesn't need a snake_case alias since it's already all lowercase. This looks correct as-is.

However, I notice the filter pattern on line 22 supports (measures|dimensions) but the grain pattern is restricted to measures only. This is consistent with the Joi validator (which only adds grain to MeasuresSchema, not dimensions). Good — just flagging for visibility that this deliberate asymmetry between filter and grain scope is correctly threaded through all three layers (Joi, transpiler, Rust bridge).

Comment on lines +127 to +136
- name: amount_grain_keep_status_include_id
sql: "{CUBE.total_amount}"
type: sum
multi_stage: true
grain:
keep_only:
- orders.status
include:
- orders.id
`);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test coverage: the keep_only + include combination (amount_grain_keep_status_include_id) tests the most complex grain interaction — keep_only narrows the partition, then include extends it. The expected values (completed=370 broadcast across categories) correctly demonstrate the semantic. This is the test case that would fail first if partition_filter and member_partition_by_logical ever diverge.

@codecov

codecov Bot commented May 28, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.88%. Comparing base (2c7a614) to head (bfbcf44).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10957      +/-   ##
==========================================
+ Coverage   78.87%   78.88%   +0.01%     
==========================================
  Files         470      470              
  Lines       93338    93346       +8     
  Branches     3457     3461       +4     
==========================================
+ Hits        73616    73633      +17     
+ Misses      19219    19210       -9     
  Partials      503      503              
Flag Coverage Δ
cube-backend 58.53% <100.00%> (+0.07%) ⬆️
cubesql 83.41% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…level

`grain.keep_only: [status]` with the query scoped to `[category]` yields
an empty partition (keep_only ∩ query_dims = ∅) and collapses to the
grand total — 2250 per category on the integration_multi_stage seed.
Mirrors the legacy `group_by_override` snapshot value-for-value;
content differs only by the measure name in the output column.
@claude

claude Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Claude finished @waralexrom's task in 2m 40s —— View job


Code Review: Multi-Stage Grain Directive (re-review after sync)

  • Gather context and check existing review threads
  • Review PR diff for new changes since last review
  • Review JS/TS changes (schema compiler, validator, transpiler)
  • Review Rust bridge and planner changes
  • Review test coverage and quality
  • Post inline comments for specific issues
  • Post final summary

Summary

Re-reviewed after the latest sync (commits 56909bf and 65b6c47). The new commits are doc cleanup — stripping legacy-field terminology from comments and renaming a test. No logic changes. The core implementation remains solid.

Re-affirmed prior threads still apply (4 unresolved)

All 4 unresolved threads from the prior review are still applicable to the current diff — same lines, same concerns:

  1. Empty grain: {} validation gap (CubeValidator.ts:875) — still no .or('exclude', 'keepOnly', 'include') to reject empty grain objects.
  2. Silent legacy field override (multi_stage.rs:87) — when grain: is present alongside legacy fields, the legacy fields are silently ignored with no user-facing warning.
  3. Transpiler scope observation (CubePropContextTranspiler.ts:23) — informational, the measures-only scope restriction is correctly threaded through all layers.
  4. Test coverage praise (multi-stage-grain.test.ts:136) — the keep_only + include combination test remains the most important regression canary.

New finding

Stale comment with legacy field names (multi_stage_query_planner.rs:527-536): The step-ordering comment in make_queries_descriptions still references reduce_by / group_by / add_group_by:

//   2. reduce_by / group_by — shrink parent grain to the
//      partition grain implied by directives.
//   3. add_group_by — extend the result with extra leaf dims.
//   4. time_shift / filter cleanup.
// Step 2 must precede step 3: `group_by` is a keep-only filter
// and would silently drop dims that step 3 needs to introduce.
//
// The window-path Aggregate inode skips step 2: the leaf stays
// at the parent state plus any add_group_by extension, and the
// window function does the reduce_by collapse at outer level.

This was missed by the docs cleanup in 56909bf. Suggest updating to grain terminology:

//   2. grain.exclude / grain.keep_only — shrink parent grain to
//      the partition grain implied by directives.
//   3. grain.include — extend the result with extra leaf dims.
//   4. time_shift / filter cleanup.
// Step 2 must precede step 3: `keep_only` is an intersection
// and would silently drop dims that step 3 needs to introduce.
//
// The window-path Aggregate inode skips step 2: the leaf stays
// at the parent state plus any include extension, and the
// window function does the exclude/keep_only collapse at outer level.

Overall assessment

The implementation is well-structured and backward-compatible. No security, performance, or correctness issues found. The doc cleanup sweep was thorough — just the one comment block slipped through. All prior review feedback still applies.

…s with grain terms

Sweep the planner module for pre-existing comments that still describe
the multi-stage shaping pipeline in terms of the legacy field names
(`reduce_by`, `add_group_by`, `group_by`). After the refactor the
planner reads `MultiStageGrain` directly, so the comments are reworded
to use `grain.exclude` / `grain.keep_only` / `grain.include` (or just
"grain reshape" where the specific operator doesn't matter).

Touched:
- `make_queries_descriptions` docstring — "grain reshape" instead of
  "add_group_by".
- The JOIN-vs-window-path narrative in `create_multi_stage_inode_member`
  — step list now names grain operators; the window-path note refers to
  "leaf-extending `include`" rather than `add_group_by`.
- `MultiStageQueryDescription::keys_input` docstring — "grain reshape"
  in place of "reduce_by / group_by".

No semantic change.
@claude

claude Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Claude finished @waralexrom's task in 3m 10s —— View job


Review: multi-stage grain directive — 0 issues, 2 suggestions (re-affirmed from prior threads)

Review: Multi-Stage Grain Directive (re-review on sync)

  • Gather context and list existing review threads
  • Read PR diff and changed files
  • Review JS/TS changes (schema compiler, validator, transpiler)
  • Review Rust bridge and planner changes
  • Review test coverage and quality
  • Check for stale/duplicate review threads
  • Post inline comments for new issues
  • Post final summary

Diff reviewed

All 29 changed files across 11 commits (b5a556c..bfbcf44). The latest two commits (56909bf, bfbcf44) are doc cleanup — no logic changes since the last review.

Overall assessment

Clean, well-layered feature with no correctness, security, or performance issues. The grain: directive is correctly threaded through all three layers: Joi validation → transpiler → Rust bridge/planner. The MultiStageGrain struct is a good consolidation of three separate Vec fields, and the legacy-to-grain mapping in build_grain_from_legacy preserves backward compatibility cleanly.

What improved since last review

  • The stale comment block in multi_stage_query_planner.rs:523-535 that still referenced reduce_by/add_group_by has been updated to grain terminology (commit bfbcf44). This was the one new finding from the second review — now resolved.

Re-affirmed prior threads (2 unresolved suggestions)

All 4 unresolved threads from prior reviews are still applicable at the same lines with the same concerns. No new inline comments needed — deduplicating against existing threads.

  1. Empty grain: {} validation gap (CubeValidator.ts:875) — The Joi schema allows grain: {} (no fields), which silently becomes MultiStageGrain::default(). Unlike filter: {} which has a meaningful "no overrides" reading, an empty grain is almost certainly a user mistake. Suggest adding .or('exclude', 'keepOnly', 'include').

  2. Silent legacy field override (multi_stage.rs:84-87) — When grain: is present alongside legacy reduce_by/add_group_by/group_by, the legacy fields are silently ignored. This is correct precedence, but users migrating incrementally may be confused. Consider a compile-time warning or documenting the override behavior.

Informational threads (still valid, no action needed)

  1. Transpiler scope asymmetry (CubePropContextTranspiler.ts:23) — grain: is measures-only while filter: supports both measures and dimensions. This asymmetry is correctly enforced across all three layers.

  2. Test coverage highlight (multi-stage-grain.test.ts:136) — The keep_only + include combination test is the strongest regression canary for planner correctness.

Strengths

  • Correct bridge shape: MultiStageGrainReferencesStatic uses Option<Vec<String>> for all three fields, cleanly distinguishing "not set" from "empty list". The serde rename annotations (excludeReferences, keepOnlyReferences, includeReferences) match the JS-side names exactly.
  • Window-path eligibility: The check at multi_stage_query_planner.rs:175-180 correctly gates on include only — exclude/keep_only are realized through PARTITION BY at render time and don't disqualify the window path. The comment explains the reasoning well.
  • Clean proxy removal: reduce_by(), add_group_by(), group_by() on MeasureSymbol/DimensionSymbol are removed; all callers now go through multi_stage().grain directly. The join_hints_collector.rs change correctly follows the new accessor path.
  • Test matrix: Validator tests cover happy paths, partial directives, mutual exclusivity (.nand), type enforcement (Joi.func()), and scope rejection on dimensions. Rust unit tests cover legacy field mapping and directive override. Integration tests cover every grain shape including the edge case where keep_only references a dim absent from the query (collapsing to grand total).

@waralexrom waralexrom merged commit 76eb75b into master May 29, 2026
134 of 135 checks passed
@waralexrom waralexrom deleted the tesseract-set-of-multi-stage-directives-group-by branch May 29, 2026 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

javascript Pull requests that update Javascript code rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants