feat(tesseract): multi-stage grain directive#10957
Conversation
… measures
Introduce `grain:` directive on multi-stage measures with fields
`mode (relative|fixed)`, `exclude`, `keepOnly`, `include` mirroring the
existing `filter:` directive shape. All three list fields are
`Joi.func()` returning arrays of refs and are wired into the prop
transpiler so authors can write `({ CUBE }) => [CUBE.dim]`.
`exclude` and `keepOnly` are mutually exclusive via `.nand(...)`,
matching the `MultiStageFilter` contract.
`CubeEvaluator.evaluateMultiStageReferences` now resolves grain's
function fields into `*References` siblings that the native bridge will
read. The directive is only added to multi-stage measures; dimensions
are intentionally out of scope for now, and the legacy
`group_by/add_group_by/reduce_by` fields keep working unchanged.
Mirror the multi-stage filter bridge for the new `grain:` directive: - New static-only bridge `MultiStageGrainReferences` carrying `mode/exclude/keep_only/include` (all reference name lists). - `MeasureDefinition::grain()` accessor — only on measures, dimensions stay out of scope until the directive expands. - Tesseract mocks: `MockMultiStageGrainReferences`, `grain:` field on the YAML measure parser with cube-name qualification, wired into `MockMeasureDefinition`, plus YAML round-trip unit tests covering full / partial / cube-qualified shapes. Backend-native bridge harness: - Register `multiStageGrain` in the bridge_registry and add an empty `invoke_multi_stage_grain` (static-only, like the filter sibling). - Extend `invoke_measure_definition` to record the new `grain` getter so the dispatcher/meta drift guard stays green. - JS fixture + coverage rows for `multiStageGrain`, and `grain` added to the `measureDefinition` expected meta. The fixture populates only one of `excludeReferences`/`keepOnlyReferences` because the `.nand` enforcement currently lives on the JS Joi validator; the bridge itself is structurally permissive.
…StageGrain Replace the three independent `add_group_by` / `reduce_by` / `group_by` fields on `MultiStageProperties` with a single `MultiStageGrain` struct carrying `mode` + `exclude` / `keep_only` / `include`, modeled as a set operation on the inherited grain context. For measures the grain is sourced from the new `grain:` directive when the schema provides it; otherwise it falls back to the older `add_group_by` / `reduce_by` / `group_by` fields with the mapping: - add_group_by -> include - reduce_by -> exclude - group_by -> keep_only Fallback always uses `mode: Relative` — the planner does not branch on mode for grain yet, so observable behavior is unchanged. The `.nand` between `exclude` and `keep_only` on the directive is enforced at the symbol-build layer, matching the wording of the filter equivalent. For dimensions the grain still carries only `include` (from `add_group_by`); the `grain:` directive is currently scoped to measures. `MeasureSymbol::reduce_by/add_group_by/group_by` and `DimensionSymbol::add_group_by` are kept as thin proxies on the new struct so existing call sites continue to compile and behave the same. Tests: existing direct-field accesses moved to the new shape; a YAML fixture and two unit tests cover the legacy-mapping and directive-overrides-legacy paths.
Replace the three separate `reduce_by` / `add_group_by` / `group_by` fields on `MultiStageInodeMember` with the shared `MultiStageGrain`, and route every planner call site through `.grain()` rather than through per-field accessors. The inode now exposes only `grain()` and `time_shift()`; partition-shape helpers (`partition_filter`, `member_partition_by_logical`) take `&MultiStageGrain` directly. Add `Default` impls on `MultiStageGrain` and `MultiStageGrainMode` (Relative). Reuse it from `MultiStageProperties::from_dimension_definition` and at every "no directive, no legacy fields" fallback in the planner. The `use_window_path` check still gates only on `include` — `exclude` and `keep_only` change partition shape but are honoured by the window path through PARTITION BY at render time. Comment added so future work that extends window-path coverage revisits this assumption. Schema fixture and integration test for `grain.keep_only` rephrased without "legacy/new" wording so they stay accurate when the older fields go away.
…r of grain
Remove `MeasureSymbol::reduce_by/add_group_by/group_by` and
`DimensionSymbol::add_group_by`. Their only role was to hide the
`multi_stage().grain.{exclude,include,keep_only}` access behind the old
field names; now both production call sites
(`join_hints_collector`, `logical_plan::multistage::dimension::join_dimensions`)
and the symbol-level tests read the grain directly.
`MeasureSymbol` and `DimensionSymbol` are now grain-agnostic apart from
exposing `multi_stage() -> Option<&MultiStageProperties>`; everything
partition-shape-related lives on `MultiStageGrain`.
Honour `MultiStageGrainMode::Fixed` in `partition_filter`:
- `Relative` (existing): start from parent state's dims and apply
`exclude` → `keep_only`.
- `Fixed`: discard the parent state, start from the original query
context (`root_state.dimensions/time_dimensions`), and intersect
with `keep_only`. `exclude` is undefined here per the spec
(validator forbids `exclude` + `keep_only` together).
`include` continues to be appended via `add_dimensions(...)` outside
the helper, so the full FIXED result is
`(root_dims ∩ keep_only) ∪ include`, and FIXED with no keep_only and
no include collapses to an empty grain (grand total).
`member_partition_by_logical` is intentionally not updated: its input
is `description.state()` which has already been shaped by
`partition_filter`, so re-applying `exclude` / `keep_only` over the
narrowed set is idempotent in every allowed combination.
Fixtures and integration smoke tests cover three top-level scenarios:
grand total (FIXED, no narrowing), FIXED + keep_only with the dim
present in the query, and FIXED + keep_only with the dim absent.
Snapshots are filled when the suite runs against Postgres.
Roll back `MultiStageGrainMode` and the `mode` field on `MultiStageGrain`. The FIXED-vs-RELATIVE distinction is technically implementable but its behaviour is difficult to communicate to schema authors; ship grain without it for now and keep only the always-relative semantics (`exclude` removes from the parent grain, `keep_only` intersects with it, `include` adds). Removed across the stack: - Rust bridge: `mode: Option<String>` on `MultiStageGrainReferencesStatic`, the corresponding mock builder field, and the YAML parser entry. - Planner symbols: the `MultiStageGrainMode` enum, the `mode` field on `MultiStageGrain`, and every site that read or wrote it (`build_grain_from_directive`, `build_grain_from_legacy`, `apply_to_deps`). - Planner core: `partition_filter` collapses back to a single relative branch (no `match grain.mode`, no `root_dims` parameter). - Tests: the three FIXED integration smoke tests, the `mock_measure_definition` yaml-grain asserts that checked `static_data.mode`, and every `assert_eq!(ms.grain.mode, …)` in `tests/measure_symbol.rs`. Test fixtures (`integration_multi_stage.yaml`, `multi_stage_filter.yaml`) no longer set `mode:`. - JS schema: `mode` removed from the `MultiStageGrain` Joi schema and from the `MultiStageGrainDirective` TS type. The `multi-stage grain — invalid mode rejected` validator test is gone. - Backend-native bridge harness: `mode` dropped from the JS fixture and from the bridge coverage expected-fields list. FIXED implementation is recoverable via `git revert 7ef91664d5` (the commit that introduced the `partition_filter` mode branch).
Mirror `multi-stage-filter.test.ts`: a single inline YAML cube with 6 rows wired through `prepareYamlCompiler` and `dbRunner.runQueryTest`, covering every shape of `grain:` end-to-end (JS → Joi → transpiler → Tesseract → SQL). Tests are gated on `nativeSqlPlanner`; the legacy planner does not implement the directive. Eight scenarios: - exclude / keep_only / include with a single dim - keep_only with the listed dim absent from the query (collapse to grand total) - exclude / keep_only / include with a two-element array each - keep_only + include combined: the keep narrows the inherited grain to `[status]`, include extends the leaf to `[status, id]`, outer re-aggregates → per-status total broadcast across categories. This is the JS analogue of the Rust planner snapshot `reduce_by_add_group_by_combo`. A sixth seed row (id=6, completed, books, 70) makes the two-element keep_only test distinguishable from `total_amount` — (completed, books) now spans two ids that share the same per-cell value.
…mments Sweep through every comment touched by the grain work and remove references to the legacy `add_group_by` / `reduce_by` / `group_by` field names plus any "was → now" / "currently" / "today" wording that will rot once the legacy path is gone. - `MultiStageGrain` docstring no longer enumerates which legacy fields feed each slot; it just describes the set operation itself. - `MultiStageProperties::from_dimension_definition` loses the inline "Dimensions only expose `add_group_by` today" note — the code reads the field directly on the next line. - `partition_filter` keeps a single grain-centric docstring; the older paragraph that talked about `reduce_by`/`group_by` is dropped. - Integration test renamed to `test_grain_keep_only_status_top_level`, comment rewritten as an intrinsic description rather than a comparison against `test_group_by_override`. - YAML fixture and symbol-test comments drop the "expressed via grain" / "not from the sibling add_group_by" framing. - JS grain test loses the "id=6 added so..." narrative and the reference to the Rust planner snapshot it was originally mirrored from.
|
Claude finished @waralexrom's task in 4m 28s —— View job Code Review: Multi-Stage Grain Directive
SummaryThis is a well-structured feature PR that introduces the Strengths
Issues Found
Minor Observations
No security, performance, or correctness issues found. The feature is well-contained and backward-compatible. |
| const MultiStageGrain = Joi.object().keys({ | ||
| exclude: Joi.func(), | ||
| keepOnly: Joi.func(), | ||
| include: Joi.func(), | ||
| }).nand('exclude', 'keepOnly'); |
There was a problem hiding this comment.
The Joi MultiStageGrain validator uses .nand('exclude', 'keepOnly') but doesn't enforce that at least one of the three fields (exclude, keepOnly, include) is present. An empty grain: {} would pass validation and silently become a no-op MultiStageGrain::default() on the Rust side.
This is consistent with how MultiStageFilter works (it also allows an empty object), so it may be intentional. But unlike filter: (where an empty object has a meaningful "no filter overrides" reading), an empty grain: {} on a multi-stage measure is almost certainly a user mistake. Consider adding .or('exclude', 'keepOnly', 'include') to catch this early:
| const MultiStageGrain = Joi.object().keys({ | |
| exclude: Joi.func(), | |
| keepOnly: Joi.func(), | |
| include: Joi.func(), | |
| }).nand('exclude', 'keepOnly'); | |
| const MultiStageGrain = Joi.object().keys({ | |
| exclude: Joi.func(), | |
| keepOnly: Joi.func(), | |
| include: Joi.func(), | |
| }).nand('exclude', 'keepOnly').or('exclude', 'keepOnly', 'include'); |
| let grain = match definition.grain()? { | ||
| Some(g) => build_grain_from_directive(g, compiler)?, | ||
| None => build_grain_from_legacy(&definition.static_data(), compiler)?, | ||
| }; |
There was a problem hiding this comment.
The Rust bridge validates exclude + keep_only mutual exclusivity, which is good defensive layering. However, if a user manages to pass both (e.g., through an older JS pipeline or custom bridge), this error message is the only protection. The Joi .nand() on the JS side should catch it first, but it's reassuring to have it here too.
One edge case to consider: when grain directive is present alongside legacy fields (reduce_by, add_group_by, group_by), the legacy fields are silently ignored (line 84-87 in from_measure_definition). This is the correct precedence, but there's no user-facing warning. Users migrating from legacy to grain: who accidentally leave both might be confused when legacy fields stop having effect. Consider a CubeError::user warning (not error) or at minimum documenting this precedence.
| /^measures\.[_a-zA-Z][_a-zA-Z0-9]*\.(timeShift|time_shift)\.[0-9]+\.(timeDimension|time_dimension)$/, | ||
| /^measures\.[_a-zA-Z][_a-zA-Z0-9]*\.(reduceBy|reduce_by|groupBy|group_by|addGroupBy|add_group_by)$/, | ||
| /^(measures|dimensions)\.[_a-zA-Z][_a-zA-Z0-9]*\.filter\.(exclude|keepOnly|keep_only)$/, | ||
| /^measures\.[_a-zA-Z][_a-zA-Z0-9]*\.grain\.(exclude|keepOnly|keep_only|include)$/, |
There was a problem hiding this comment.
Minor: The transpiler pattern supports keep_only as a snake_case alias but doesn't include an exclude snake_case variant (it's already snake_case). Worth noting that include also doesn't need a snake_case alias since it's already all lowercase. This looks correct as-is.
However, I notice the filter pattern on line 22 supports (measures|dimensions) but the grain pattern is restricted to measures only. This is consistent with the Joi validator (which only adds grain to MeasuresSchema, not dimensions). Good — just flagging for visibility that this deliberate asymmetry between filter and grain scope is correctly threaded through all three layers (Joi, transpiler, Rust bridge).
| - name: amount_grain_keep_status_include_id | ||
| sql: "{CUBE.total_amount}" | ||
| type: sum | ||
| multi_stage: true | ||
| grain: | ||
| keep_only: | ||
| - orders.status | ||
| include: | ||
| - orders.id | ||
| `); |
There was a problem hiding this comment.
Nice test coverage: the keep_only + include combination (amount_grain_keep_status_include_id) tests the most complex grain interaction — keep_only narrows the partition, then include extends it. The expected values (completed=370 broadcast across categories) correctly demonstrate the semantic. This is the test case that would fail first if partition_filter and member_partition_by_logical ever diverge.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #10957 +/- ##
==========================================
+ Coverage 78.87% 78.88% +0.01%
==========================================
Files 470 470
Lines 93338 93346 +8
Branches 3457 3461 +4
==========================================
+ Hits 73616 73633 +17
+ Misses 19219 19210 -9
Partials 503 503
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…level `grain.keep_only: [status]` with the query scoped to `[category]` yields an empty partition (keep_only ∩ query_dims = ∅) and collapses to the grand total — 2250 per category on the integration_multi_stage seed. Mirrors the legacy `group_by_override` snapshot value-for-value; content differs only by the measure name in the output column.
|
Claude finished @waralexrom's task in 2m 40s —— View job Code Review: Multi-Stage Grain Directive (re-review after sync)
SummaryRe-reviewed after the latest sync (commits Re-affirmed prior threads still apply (4 unresolved)All 4 unresolved threads from the prior review are still applicable to the current diff — same lines, same concerns:
New findingStale comment with legacy field names ( // 2. reduce_by / group_by — shrink parent grain to the
// partition grain implied by directives.
// 3. add_group_by — extend the result with extra leaf dims.
// 4. time_shift / filter cleanup.
// Step 2 must precede step 3: `group_by` is a keep-only filter
// and would silently drop dims that step 3 needs to introduce.
//
// The window-path Aggregate inode skips step 2: the leaf stays
// at the parent state plus any add_group_by extension, and the
// window function does the reduce_by collapse at outer level.This was missed by the docs cleanup in // 2. grain.exclude / grain.keep_only — shrink parent grain to
// the partition grain implied by directives.
// 3. grain.include — extend the result with extra leaf dims.
// 4. time_shift / filter cleanup.
// Step 2 must precede step 3: `keep_only` is an intersection
// and would silently drop dims that step 3 needs to introduce.
//
// The window-path Aggregate inode skips step 2: the leaf stays
// at the parent state plus any include extension, and the
// window function does the exclude/keep_only collapse at outer level.Overall assessmentThe implementation is well-structured and backward-compatible. No security, performance, or correctness issues found. The doc cleanup sweep was thorough — just the one comment block slipped through. All prior review feedback still applies. |
…s with grain terms Sweep the planner module for pre-existing comments that still describe the multi-stage shaping pipeline in terms of the legacy field names (`reduce_by`, `add_group_by`, `group_by`). After the refactor the planner reads `MultiStageGrain` directly, so the comments are reworded to use `grain.exclude` / `grain.keep_only` / `grain.include` (or just "grain reshape" where the specific operator doesn't matter). Touched: - `make_queries_descriptions` docstring — "grain reshape" instead of "add_group_by". - The JOIN-vs-window-path narrative in `create_multi_stage_inode_member` — step list now names grain operators; the window-path note refers to "leaf-extending `include`" rather than `add_group_by`. - `MultiStageQueryDescription::keys_input` docstring — "grain reshape" in place of "reduce_by / group_by". No semantic change.
|
Claude finished @waralexrom's task in 3m 10s —— View job Review: multi-stage grain directive — 0 issues, 2 suggestions (re-affirmed from prior threads)Review: Multi-Stage Grain Directive (re-review on sync)
Diff reviewedAll 29 changed files across 11 commits ( Overall assessmentClean, well-layered feature with no correctness, security, or performance issues. The What improved since last review
Re-affirmed prior threads (2 unresolved suggestions)All 4 unresolved threads from prior reviews are still applicable at the same lines with the same concerns. No new inline comments needed — deduplicating against existing threads.
Informational threads (still valid, no action needed)
Strengths
|
Summary
Introduces the
grain:directive on multi-stage measures — declarative partition shaping (exclude/keep_only/include) that supersedes the olderadd_group_by/reduce_by/group_byfields. Mirrors the design of the already-shippedfilter:directive (#10827).Changes
CubeEvaluatoracceptgrain.exclude/keepOnly/include(allJoi.func()) and resolve them into*Referencesarrays.MultiStageGrainReferencesplus its mock and YAML test parser; the backend-native bridge harness registers the new bridge with full coverage.MultiStageGrainstruct collapses the three independent fields onMultiStagePropertiesinto a single set-operation value; the planner reads grain directly —partition_filterandmember_partition_by_logicaltake&MultiStageGrain.MultiStageInodeMembercarries aMultiStageGraininstead of three separate vecs; legacy proxy methods onMeasureSymbol/DimensionSymbolremoved.exclude/keep_only/includeat single-element, two-element-array, and thekeep_only + includecombination.Testing
cargo test --package cubesqlplanner --lib— 966 passed.jest dist/test/unit/cube-validator.test.js— 103 passed.jest --config jest-bridge.config.js(backend-native bridge harness) — 192 passed.multi-stage-grain.test.ts) is gated onnativeSqlPlannerand skips without Postgres; numeric verification will happen in CI.