| 1 | +--- |
| 2 | +title: Semi-additive (end-of-period) measures |
| 3 | +description: Model balances and other snapshot metrics that sum across entities but not across time, using a multi-stage rank measure and the grain directive to pick each period's last snapshot at any date grain. |
| 4 | +--- |
| 5 | + |
| 6 | +## Use case |
| 7 | + |
| 8 | +A **semi-additive** measure can be summed across some dimensions but not others. The canonical |
| 9 | +example is an **account balance**: summing every daily balance in a month is meaningless — what |
| 10 | +you want is the **balance on the last day of the period** (end-of-month, end-of-quarter, and so |
| 11 | +on). Balances are additive across entities (accounts, products, stores) but *not* across time. |
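
To make the difference concrete, here is a minimal sketch in PostgreSQL syntax. It uses two inline rows standing in for one account's daily snapshots; the CTE names `daily` and `last_day` are illustrative and not part of the recipe. It compares a naive monthly sum with the balance on the last snapshot date:

```sql
-- Two daily snapshots for one account in January (illustrative inline data).
WITH daily (snapshot_date, account_id, balance) AS (
  VALUES
    (DATE '2024-01-30', 'A1', 1000),
    (DATE '2024-01-31', 'A1', 1200)
),
-- The last snapshot date present in the period.
last_day AS (
  SELECT MAX(snapshot_date) AS snapshot_date FROM daily
)
SELECT
  -- 2200: counts the same money once per day it was observed.
  (SELECT SUM(balance) FROM daily) AS naive_monthly_sum,
  -- 1200: the balance on the last snapshot date only.
  (SELECT SUM(d.balance)
     FROM daily d
     JOIN last_day l ON d.snapshot_date = l.snapshot_date) AS end_of_month_balance;
```

The naive figure grows with the number of snapshot days in the month, which is why it cannot be read as a balance.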
| 12 | + |
| 13 | +This recipe uses a multi-stage [`rank`][ref-type] measure and the [`grain`][ref-grain] directive |
| 14 | +to pick each period's final snapshot declaratively, so it works for *any* date grain the user |
| 15 | +groups by — no per-grain SQL and no query rewriting. |
| 16 | + |
| 17 | +<Warning> |
| 18 | + |
| 19 | +This pattern requires the Tesseract SQL planner. Set |
| 20 | +[`CUBEJS_TESSERACT_SQL_PLANNER=true`][ref-tesseract-env]. The multi-stage `grain` directive is |
| 21 | +not supported on the legacy planner. |
| 22 | + |
| 23 | +</Warning> |
| 24 | + |
| 25 | +## The data |
| 26 | + |
| 27 | +Consider a fact table of **daily balance snapshots**, one row per account per day: |
| 28 | + |
| 29 | +| `snapshot_date` | `account_id` | `balance` | `snapshot_frequency_key` | |
| 30 | +|---|---|---:|---| |
| 31 | +| 2024-01-30 | A1 | 1,000 | 1 *(daily)* | |
| 32 | +| 2024-01-31 | A1 | 1,200 | 1 *(daily)* | |
| 33 | +| 2024-01-31 | A2 | 500 | 1 *(daily)* | |
| 34 | +| 2024-02-29 | A1 | 900 | 1 *(daily)* | |
| 35 | +| 2024-01-31 | A1 | 1,150 | 2 *(month-end)* | |
| 36 | +| 2024-02-29 | A1 | 880 | 2 *(month-end)* | |
| 37 | + |
| 38 | +- **`snapshot_frequency_key`** discriminates independent snapshot streams — here `1` = daily and |
| 39 | + `2` = a separate official month-end feed (note the daily and month-end balances for A1 on |
| 40 | + 2024-01-31 differ: 1,200 vs 1,150). Keeping it in the partition prevents the two streams from |
| 41 | + contaminating each other's "latest" calculation; consumers then pick a stream with a filter. |
| 42 | + Keep the key in the model even if you start with a single stream — it future-proofs the partition. |
| 43 | +- Note `A2` has **no row on 2024-01-30**, yet its end-of-January balance is still its 01-31 value (500). |
| 44 | + The reverse case is the one that matters: an account with *no* row on the period's last snapshot date should contribute **0** for that period, not its last-seen value. |
| 45 | + That edge case is exactly what the partition design below gets right. |
| 46 | + |
| 47 | +A standard date-dimension cube (`date_dim`) joins on `snapshot_date` and exposes the usual grains |
| 48 | +— `calendar_year`, `calendar_quarter`, `calendar_month`, `calendar_week`, and so on. |
| 49 | + |
| 50 | +## Data modeling |
| 51 | + |
| 52 | +The pattern is two multi-stage measures: a `rank` that finds each period's last snapshot date, |
| 53 | +and a `sum` that totals only the rows on that date. |
| 54 | + |
| 55 | +```yaml |
| 56 | +cubes: |
| 57 | + - name: balance_snapshots |
| 58 | + sql_table: analytics.balance_snapshots |
| 59 | + |
| 60 | + joins: |
| 61 | + - name: date_dim |
| 62 | + relationship: many_to_one |
| 63 | + sql: "{CUBE}.snapshot_date = {date_dim}.date_val" |
| 64 | + |
| 65 | + dimensions: |
| 66 | + - name: snapshot_date_key |
| 67 | + sql: snapshot_date |
| 68 | + type: time |
| 69 | + - name: snapshot_frequency_key |
| 70 | + sql: snapshot_frequency_key |
| 71 | + type: number |
| 72 | + - name: account_id |
| 73 | + sql: account_id |
| 74 | + type: string |
| 75 | + |
| 76 | + measures: |
| 77 | + # Plain additive base measure — safe to sum across accounts AND days. |
| 78 | + - name: balance |
| 79 | + sql: balance |
| 80 | + type: sum |
| 81 | + |
| 82 | + # ── Step 1: rank snapshot dates within each period (latest = 1) ────────────── |
| 83 | + - name: eop_rank |
| 84 | + public: false |
| 85 | + multi_stage: true |
| 86 | + type: rank |
| 87 | + order_by: |
| 88 | + - sql: "{snapshot_date_key}" |
| 89 | + dir: desc |
| 90 | + # A single grain block shapes both stages of this measure. |
| 91 | + grain: |
| 92 | + # include = the leaf GROUP BY. Pushes snapshot_date_key into the leaf so the |
| 93 | + # rank has a physical column to order by (without it: "missing FROM-clause |
| 94 | + # entry"), and snapshot_frequency_key so it survives into the partition below. |
| 95 | + include: |
| 96 | + - snapshot_date_key |
| 97 | + - snapshot_frequency_key |
| 98 | + # keep_only = the window PARTITION BY. List ONLY the date grains a query might |
| 99 | + # group by, plus snapshot_frequency_key. Everything else (account_id) is |
| 100 | + # dropped from the partition, so "rank = 1" means the period's GLOBAL last |
| 101 | + # snapshot date — identical for every account. |
| 102 | + keep_only: |
| 103 | + - snapshot_frequency_key |
| 104 | + - date_dim.calendar_year |
| 105 | + - date_dim.calendar_quarter |
| 106 | + - date_dim.calendar_month |
| 107 | + - date_dim.calendar_week |
| 108 | + # …add every date-dim grain a query might group by. A MISSING entry silently |
| 109 | + # over-counts (period computed too coarse); extra entries are harmless. |
| 110 | + |
| 111 | + # ── Step 2: sum the base measure, keeping only the period's last snapshot ───── |
| 112 | + - name: balance_eop |
| 113 | + title: End-of-Period Balance |
| 114 | + multi_stage: true |
| 115 | + type: sum |
| 116 | + sql: "{balance}" |
| 117 | + # Repeat the leaf members (grain.include) so the rank filter is evaluated per |
| 118 | + # snapshot date — same canonical form as eop_rank, no add_group_by. |
| 119 | + grain: |
| 120 | + include: |
| 121 | + - snapshot_date_key |
| 122 | + - snapshot_frequency_key |
| 123 | + filters: |
| 124 | + - sql: "{eop_rank} = 1" |
| 125 | +``` |
| 126 | + |
| 127 | +That is the whole pattern. `balance_eop` now returns the correct end-of-period balance at **any** |
| 128 | +grain the consumer groups by — no rewrite logic and no per-grain measures. |
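
As an illustration of that grain independence, a consumer could request the same measure by quarter. The query below is only a sketch in the same SQL API style this recipe uses for its monthly example. It assumes the date-dimension members are reachable under the names shown, and the filter on the daily stream is an illustrative choice:

```sql
-- Same measure, coarser grain: one row per year and quarter.
-- Filtering to the daily stream (snapshot_frequency_key = 1) is an illustrative choice.
SELECT calendar_year, calendar_quarter, MEASURE(balance_snapshots.balance_eop)
FROM balance_snapshots
WHERE snapshot_frequency_key = 1
GROUP BY 1, 2
```

With the sample rows, the last daily snapshot date in Q1 2024 is 2024-02-29, so the expected value for that quarter is 900 (A1's balance on that date).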
| 129 | + |
| 130 | +## What SQL this generates |
| 131 | + |
| 132 | +<Note> |
| 133 | + |
| 134 | +The SQL below is **illustrative** — simplified to show how `grain.include` and `grain.keep_only` |
| 135 | +map onto the `GROUP BY` and `PARTITION BY` clauses. The SQL Cube actually emits will differ in |
| 136 | +detail (CTE naming, column aliasing, extra wrapping subqueries, dialect-specific syntax). Inspect |
| 137 | +the real output for your setup via the [`/v1/sql` endpoint][ref-sql-api]. |
| 138 | + |
| 139 | +</Note> |
| 140 | + |
| 141 | +For the query: |
| 142 | + |
| 143 | +```sql |
| 144 | +SELECT calendar_year, calendar_month, MEASURE(balance_snapshots.balance_eop) |
| 145 | +FROM balance_snapshots |
| 146 | +GROUP BY 1, 2 |
| 147 | +``` |
| 148 | + |
| 149 | +Cube compiles the multi-stage measure into two stacked stages: |
| 150 | + |
| 151 | +```sql |
| 152 | +-- STAGE 1 (leaf): the GROUP BY. Grain = queried dims + grain.include members. |
| 153 | +WITH leaf AS ( |
| 154 | + SELECT |
| 155 | + calendar_year, |
| 156 | + calendar_month, |
| 157 | + snapshot_frequency_key, --\__ injected by grain.include |
| 158 | + snapshot_date_key, --/ (not in the user's SELECT) |
| 159 | + SUM(balance) AS balance |
| 160 | + FROM balance_snapshots |
| 161 | + JOIN date_dim ON balance_snapshots.snapshot_date = date_dim.date_val |
| 162 | + GROUP BY 1, 2, 3, 4 -- ← grain.include forced cols 3 & 4 in here |
| 163 | +), |
| 164 | + |
| 165 | +-- STAGE 2 (window): the rank. PARTITION BY = grain AFTER keep_only. |
| 166 | +ranked AS ( |
| 167 | + SELECT |
| 168 | + leaf.*, |
| 169 | + RANK() OVER ( |
| 170 | + PARTITION BY calendar_year, calendar_month, snapshot_frequency_key |
| 171 | + -- ↑ only date grains + frequency survived keep_only; |
| 172 | + -- account_id was stripped out here. |
| 173 | + ORDER BY snapshot_date_key DESC -- ← order_by |
| 174 | + ) AS eop_rank |
| 175 | + FROM leaf |
| 176 | +) |
| 177 | + |
| 178 | +-- FINAL: sum the surviving rows. |
| 179 | +SELECT calendar_year, calendar_month, SUM(balance) AS balance_eop |
| 180 | +FROM ranked |
| 181 | +WHERE eop_rank = 1 -- ← filters: {eop_rank} = 1 |
| 182 | +GROUP BY 1, 2 |
| 183 | +``` |
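
The stages above can be tried directly against the sample rows. The following is a minimal sketch in PostgreSQL syntax, not the SQL Cube emits. It inlines the six rows from the data table, derives the year and month with `EXTRACT` instead of joining `date_dim`, and (unlike the illustrative query) keeps `snapshot_frequency_key` in the final `GROUP BY` so the two streams are shown separately:

```sql
-- Sample rows from the data table, inlined so the sketch needs no real tables.
WITH balance_snapshots (snapshot_date, account_id, balance, snapshot_frequency_key) AS (
  VALUES
    (DATE '2024-01-30', 'A1', 1000, 1),
    (DATE '2024-01-31', 'A1', 1200, 1),
    (DATE '2024-01-31', 'A2',  500, 1),
    (DATE '2024-02-29', 'A1',  900, 1),
    (DATE '2024-01-31', 'A1', 1150, 2),
    (DATE '2024-02-29', 'A1',  880, 2)
),
-- Stage 1 (leaf): queried grains plus the grain.include members.
leaf AS (
  SELECT
    EXTRACT(YEAR  FROM snapshot_date) AS calendar_year,
    EXTRACT(MONTH FROM snapshot_date) AS calendar_month,
    snapshot_frequency_key,
    snapshot_date AS snapshot_date_key,
    SUM(balance) AS balance
  FROM balance_snapshots
  GROUP BY 1, 2, 3, 4
),
-- Stage 2 (window): rank snapshot dates within each period and stream.
ranked AS (
  SELECT
    leaf.*,
    RANK() OVER (
      PARTITION BY calendar_year, calendar_month, snapshot_frequency_key
      ORDER BY snapshot_date_key DESC
    ) AS eop_rank
  FROM leaf
)
-- Final: keep only each period's last snapshot date.
SELECT calendar_year, calendar_month, snapshot_frequency_key, SUM(balance) AS balance_eop
FROM ranked
WHERE eop_rank = 1
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
-- (2024, 1, 1) -> 1700   daily stream: A1 1200 + A2 500 on Jan 31
-- (2024, 1, 2) -> 1150   month-end stream
-- (2024, 2, 1) ->  900   daily stream
-- (2024, 2, 2) ->  880   month-end stream
```

The Jan-30 row (1,000) never reaches the result because it ranks second within its partition.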
| 184 | + |
| 185 | +## Why both `include` and `keep_only` are needed |
| 186 | + |
| 187 | +Both keys live in the **same `grain` block** but act on **different clauses of different stages** |
| 188 | +and pull in opposite directions: |
| 189 | + |
| 190 | +| | `grain.include` | `grain.keep_only` | |
| 191 | +|---|---|---| |
| 192 | +| Acts on | Stage 1 `GROUP BY` (leaf grain) | Stage 2 `PARTITION BY` (window) | |
| 193 | +| Direction | **adds** members | **restricts** members | |
| 194 | +| Purpose | make `snapshot_date_key` exist so `order_by` can reference it | scope the rank to the date grain only, dropping entity dims | |
| 195 | +| Omit it and… | `missing FROM-clause entry for snapshot_date_key` | rank computed per-account, not per period-end | |
| 196 | + |
| 197 | +<Note> |
| 198 | + |
| 199 | +`grain.include` is the canonical form of the legacy [`add_group_by`][ref-add-group-by] directive |
| 200 | +(and `keep_only` / `exclude` are the canonical forms of `group_by` / `reduce_by`). When a measure |
| 201 | +sets a `grain` block, those legacy directives are ignored — so keep everything inside `grain` |
| 202 | +rather than mixing the two styles. |
| 203 | + |
| 204 | +</Note> |
| 205 | + |
| 206 | +### The partition is the whole point |
| 207 | + |
| 208 | +```sql |
| 209 | +-- WITHOUT keep_only — account_id stays in the partition: |
| 210 | +PARTITION BY calendar_year, calendar_month, account_id, snapshot_frequency_key |
| 211 | +-- → rank = 1 is "latest date THIS account appears". An account absent on Jan 31 but |
| 212 | +-- present Jan 30 ranks its Jan-30 row = 1 → counted. WRONG. |
| 213 | + |
| 214 | +-- WITH keep_only — account_id dropped: |
| 215 | +PARTITION BY calendar_year, calendar_month, snapshot_frequency_key |
| 216 | +-- → rank = 1 is the month's GLOBAL last snapshot date (Jan 31), same for every account. |
| 217 | +-- An account with no Jan-31 row has no rank-1 row → contributes 0. CORRECT. |
| 218 | +``` |
| 219 | + |
| 220 | +This "missing at period end ⇒ 0" behavior is usually what end-of-period reporting wants, and it |
| 221 | +falls out automatically once entity dimensions are excluded from the partition. |
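
The two partitions can be compared side by side. This is a minimal sketch in PostgreSQL syntax, restricted to January and to a single stream so the period and frequency columns can be left out of the partition. Account `A3` is hypothetical: it is added here only to represent an account that is present on Jan 30 but has no Jan-31 row.

```sql
-- January daily snapshots; A3 is a hypothetical account with no Jan-31 row.
WITH snapshots (snapshot_date, account_id, balance) AS (
  VALUES
    (DATE '2024-01-30', 'A1', 1000),
    (DATE '2024-01-31', 'A1', 1200),
    (DATE '2024-01-31', 'A2',  500),
    (DATE '2024-01-30', 'A3',  300)
),
ranked AS (
  SELECT
    account_id,
    balance,
    -- account_id left in the partition: latest date per account.
    RANK() OVER (PARTITION BY account_id ORDER BY snapshot_date DESC) AS rank_per_account,
    -- account_id dropped: latest date across all accounts.
    RANK() OVER (ORDER BY snapshot_date DESC) AS rank_global
  FROM snapshots
)
SELECT
  -- 2000: A3's stale Jan-30 balance (300) is counted.
  SUM(CASE WHEN rank_per_account = 1 THEN balance ELSE 0 END) AS eop_account_in_partition,
  -- 1700: only rows dated Jan 31 are counted, so A3 contributes 0.
  SUM(CASE WHEN rank_global = 1 THEN balance ELSE 0 END) AS eop_account_dropped
FROM ranked;
```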
| 222 | + |
| 223 | +## Gotchas |
| 224 | + |
| 225 | +- **Tesseract only.** The `grain` directive requires |
| 226 | + [`CUBEJS_TESSERACT_SQL_PLANNER=true`][ref-tesseract-env]. |
| 227 | +- **Don't mix `grain` with the legacy directives.** When a measure sets a `grain` block, the |
| 228 | + legacy [`group_by`][ref-group-by] / [`reduce_by`][ref-reduce-by] / [`add_group_by`][ref-add-group-by] |
| 229 | + directives on that measure are ignored. Put leaf members under `grain.include`, not `add_group_by`. |
| 230 | +- **Members only.** Multi-stage measures reference **members**, never `{CUBE}.raw_column`. Wrap |
| 231 | + raw columns in a base measure or dimension first (here, `balance` and `snapshot_date_key`). |
| 232 | +- **`order_by` must be in the leaf grain.** A `rank` can only order by a column present in the |
| 233 | + leaf — that is why `snapshot_date_key` must be listed under `grain.include` on the rank measure |
| 234 | + (and on the consuming sum). |
| 235 | +- **`keep_only` takes explicit member paths only** — no cube-level or wildcard references. |
| 236 | + Enumerate every date grain a query might group by. A missing grain over-counts silently; extras |
| 237 | + are harmless. |
| 238 | +- **Default the frequency.** Consumers should constrain `snapshot_frequency_key` (for example via |
| 239 | + a view's [`default_filters`][ref-default-filters]) so they don't mix daily and month-end streams |
| 240 | + unintentionally. |
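
The last point can be checked with simple arithmetic. The sketch below (PostgreSQL syntax) starts from the per-stream end-of-period values that the sample rows yield; the CTE name `eop_by_stream` is illustrative. It shows what an unfiltered consumer would see:

```sql
-- Per-stream end-of-period balances derived from the sample rows.
WITH eop_by_stream (calendar_month, snapshot_frequency_key, balance_eop) AS (
  VALUES
    (1, 1, 1700),  -- January, daily stream
    (1, 2, 1150),  -- January, month-end stream
    (2, 1,  900),  -- February, daily stream
    (2, 2,  880)   -- February, month-end stream
)
SELECT
  calendar_month,
  SUM(balance_eop) AS mixed_streams,  -- both feeds added together
  SUM(CASE WHEN snapshot_frequency_key = 1 THEN balance_eop ELSE 0 END) AS daily_only
FROM eop_by_stream
GROUP BY calendar_month
ORDER BY calendar_month;
-- month 1: mixed_streams = 2850, daily_only = 1700
-- month 2: mixed_streams = 1780, daily_only =  900
```

Without a constraint on `snapshot_frequency_key`, A1's balance is counted once per feed.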
| 241 | + |
| 242 | +## Related |
| 243 | + |
| 244 | +<CardGroup cols={2}> |
| 245 | + <Card title="grain reference" icon="layer-group" href="/reference/data-modeling/measures#grain"> |
| 246 | + Full semantics of the `keep_only`, `exclude`, and `include` keys. |
| 247 | + </Card> |
| 248 | + <Card title="Calculating share of total" icon="percent" href="/recipes/data-modeling/share-of-total"> |
| 249 | + Use `grain` to compute each row's contribution to a group or grand total. |
| 250 | + </Card> |
| 251 | +</CardGroup> |
| 252 | + |
| 253 | +[ref-type]: /reference/data-modeling/measures#type |
| 254 | +[ref-grain]: /reference/data-modeling/measures#grain |
| 255 | +[ref-add-group-by]: /reference/data-modeling/measures#add_group_by |
| 256 | +[ref-group-by]: /reference/data-modeling/measures#group_by |
| 257 | +[ref-reduce-by]: /reference/data-modeling/measures#reduce_by |
| 258 | +[ref-tesseract-env]: /reference/configuration/environment-variables#cubejs_tesseract_sql_planner |
| 259 | +[ref-sql-api]: /reference/core-data-apis/rest-api/reference |
| 260 | +[ref-default-filters]: /reference/data-modeling/view#default_filters |