Skip to content

Tesseract: outer CTE missing GROUP BY for multi_stage type: number measures with inline aggregate SQL #10864

@akshatpeak

Description

@akshatpeak

Describe the bug
When a multi_stage: true measure uses type: number with inline aggregate SQL (e.g. STDDEV({x})/AVG({x}), MAX({x})) and is queried with a dimension that is NOT declared in add_group_by, the Tesseract planner emits an outer CTE that selects the dimension alongside aggregate expressions but omits the GROUP BY clause. The database rejects with:

Error: SQL compilation error: ["fk_aggregate"."__"] is not a valid group by expression

Switching the measure to a dedicated aggregate type (type: max|avg|sum|min|count) with a plain {measure} reference produces correct SQL. So the bug is specific to type: number + inline aggregate SQL.

For composed-aggregate expressions like STDDEV({x}) / NULLIF(AVG({x}), 0) there is no dedicated type that fits, so the typed-aggregate workaround doesn't apply — those queries remain broken.

To Reproduce

  1. Define the cube below.
  2. Run a query asking for cv_amount (multi_stage with inline STDDEV/AVG) with status as a dimension (not in add_group_by).
  3. Observe SQL compilation error from the database.

Example query:

{
  "measures": ["Orders.cv_amount"],
  "dimensions": ["Orders.status"]
}

Expected behavior
cte_2 (the outer aggregation CTE) should include GROUP BY 1 for the status column it selects, matching the pattern Tesseract uses correctly when the measure has a dedicated aggregate type. Query should return one row per status with the multi-stage value.

Generated SQL (from generateSql: true)

-- cte_0: inner aggregation — CORRECT, respects add_group_by
cte_0 AS (
  SELECT
    "Orders".status AS "Orders__status",
    "Orders".id AS "Orders__id",
    sum(1) AS "Orders__amount_count"
  FROM Orders
  GROUP BY 1, 2
)

-- cte_1: rename — fine, no aggregation
cte_1 AS (
  SELECT
    "fk_aggregate"."Orders__status" AS "Orders__status",
    "fk_aggregate"."Orders__amount_count" AS "Orders__inner"
  FROM (SELECT * FROM cte_0) AS "fk_aggregate"
)

-- cte_2: outer aggregation — BROKEN
cte_2 AS (
  SELECT
    "fk_aggregate"."Orders__status" AS "Orders__status",       -- non-aggregate column
    STDDEV("fk_aggregate"."Orders__inner")
      / NULLIF(AVG("fk_aggregate"."Orders__inner"), 0)
      AS "Orders__cv_amount"                                   -- aggregate expression
  FROM (SELECT * FROM cte_1) AS "fk_aggregate"
  -- ⚠ MISSING: GROUP BY 1
)

Minimally reproducible Cube Schema

cube(`Orders`, {
  sql: `
    SELECT 1 AS id, 100 AS amount, 'new' AS status UNION ALL
    SELECT 2 AS id, 200 AS amount, 'new' AS status UNION ALL
    SELECT 3 AS id, 300 AS amount, 'processed' AS status UNION ALL
    SELECT 4 AS id, 500 AS amount, 'processed' AS status UNION ALL
    SELECT 5 AS id, 600 AS amount, 'shipped' AS status
  `,
  measures: {
    amount_count: {
      sql: `amount`,
      type: `count`,
    },
    // ✗ BROKEN: type: number + inline aggregate + query dim outside add_group_by
    cv_amount: {
      sql: `STDDEV(${amount_count}) / NULLIF(AVG(${amount_count}), 0)`,
      type: `number`,
      multi_stage: true,
      add_group_by: [CUBE.id],
    },
    // ✓ WORKS: dedicated aggregate type with plain measure reference
    max_amount: {
      sql: `${amount_count}`,
      type: `max`,
      multi_stage: true,
      add_group_by: [CUBE.id],
    },
  },
  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true,
      public: true,
    },
    status: {
      sql: `status`,
      type: `string`,
    },
  },
});

Querying cv_amount + status fails. Querying max_amount + status succeeds. Same add_group_by, same query shape — the only difference is type: number + inline aggregate vs. dedicated type + measure reference.

Version:

Reproduced on 1.4.0, 1.6.33, and 1.6.44. Long-standing Tesseract SQL generation bug — not a recent regression.

Flag config: CUBEJS_TESSERACT_SQL_PLANNER=true.

Reproduced against Snowflake; likely target-DB-agnostic since the broken SQL would fail any standards-compliant DB.

Version:
1.6.44

Additional context

  • Related: type:number_agg seems to not be working #10799 (same bug family — Tesseract type:number + DENSE_RANK() OVER (...) window function emits invalid SQL; also references fk_aggregate in the error). May share root cause in the outer-CTE SQL emitter.
  • Confirmed not the same as Unexpected GROUP BY when using multi_stage and referencing dimensions with {} #9241 (which reports the opposite direction — unexpected GROUP BY in the inner CTE, not missing in the outer).
  • Workaround: Use dedicated aggregate types (max|avg|sum|min|count) with a plain {measure} reference instead of inline aggregate functions under type: number. This works for simple cases but does NOT work for composed-aggregate ratios like STDDEV/AVG (CV) or PERCENTILE_CONT(...) WITHIN GROUP (...) — those have no dedicated type and remain stuck.
  • add_group_by does NOT fix this: confirmed via generateSql: true that adding the query dim to add_group_by makes the inner CTE correct but the outer CTE still omits its GROUP BY. The bug is in the outer-CTE SQL emitter, not in add_group_by resolution.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions