Commit caf0a8e

add back long-form aggregation
1 parent c6dd4a9 commit caf0a8e

5 files changed

Lines changed: 485 additions & 60 deletions

CHANGELOG.md

Lines changed: 15 additions & 7 deletions
@@ -3,13 +3,21 @@
 ### Added
 
 - New `aggregate` SETTING on Identity-stat layers (point, line, area, bar, ribbon,
-range, segment, arrow, rule, text). Collapses each group to a single row by
-replacing every numeric mapping in place with its aggregated value. Accepts a
-single string or array of strings; entries are either unprefixed defaults
-(`'mean'`) or per-aesthetic targets (`'y:max'`, `'color:median'`). Up to two
-defaults may be supplied — the first applies to lower-half aesthetics plus all
-non-range layers, the second to upper-half (`max`/`end` suffix). Numeric
-mappings without a target or applicable default are dropped with a warning.
+range, segment, arrow, rule, text). By default it collapses each group to a
+single row by replacing every numeric mapping in place with its aggregated
+value. Accepts a single string or array of strings; entries are either
+unprefixed defaults (`'mean'`) or per-aesthetic targets (`'y:max'`,
+`'color:median'`). Up to two defaults may be supplied — the first applies to
+lower-half aesthetics plus all non-range layers, the second to upper-half
+(`max`/`end` suffix). Numeric mappings without a target or applicable default
+are dropped with a warning. Targeting the same aesthetic more than once
+(e.g. `aggregate => ('y:min', 'y:max')`) produces one row per function with
+a synthetic `aggregate` column tagging each row, available for `REMAPPING` to
+another aesthetic; targets with a single function and the unprefixed defaults
+are reused unchanged across the exploded rows. The `aggregate` column's value
+is built from the dedup-and-joined function names of all exploded targets at
+each row, separated by `/` (so `('y:min', 'y:max', 'color:sum', 'color:prod')`
+yields `'min/sum'` and `'max/prod'`). Mixed lengths above 1 are an error.
 - Add cell delimiters and code lens actions to the Positron extension (#366)
 - ODBC is now turned on for the CLI as well (#344)
 - `FROM` can now come before `VISUALIZE`, mirroring the DuckDB style. This means

doc/syntax/clause/draw.qmd

Lines changed: 12 additions & 1 deletion
@@ -86,6 +86,17 @@ The setting takes a single string or an array of strings. Each string is one of:
 
 A numeric mapping that has neither a target nor an applicable default is dropped from the layer with a warning.
 
+You can also target the same aesthetic more than once to produce **multiple rows per group** — one for each function. For example `aggregate => ('y:min', 'y:max')` emits a min row and a max row per group, so a single `DRAW line` produces two summary lines that connect within each group rather than across them.
+
+The stat exposes a synthetic `aggregate` column tagging each row, which you can pick up with a `REMAPPING` to drive another aesthetic — e.g. `REMAPPING aggregate AS stroke` to colour the two lines differently. The column's value is built from the per-row function names of the *exploded* targets, deduplicated, and joined with `/`:
+
+* `aggregate => ('y:min', 'y:max')` → rows tagged `'min'`, `'max'`.
+* `aggregate => ('y:min', 'y:max', 'color:sum', 'color:prod')` → rows tagged `'min/sum'`, `'max/prod'`.
+* `aggregate => ('y:mean', 'y:max', 'color:mean', 'color:prod')` → rows tagged `'mean'`, `'max/prod'` (the duplicate `'mean'` collapses).
+* `aggregate => ('y:min', 'y:max', 'color:median')` → rows tagged `'min'`, `'max'` (the single-function `color` target is recycled across rows and is not part of the label).
+
+When several aesthetics are targeted with the same number of functions, they explode in lockstep (row 1 uses each aesthetic's first function, row 2 the second, and so on); aesthetics with a single function — and the unprefixed defaults — are reused unchanged across every row. Mixing different lengths above 1 is an error.
+
 The simple functions are:
 
 * `'count'`: Non-null tally of the bound column.
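
The labelling rule documented in the hunk above can be sketched in Rust. This is a minimal illustration under stated assumptions, not the project's implementation: `explode_labels` is a hypothetical helper, entries are assumed to be plain `aesthetic:function` strings, and unprefixed defaults are skipped because the documentation says they are reused across rows and are not part of the label.

```rust
/// Hypothetical helper: builds the per-row `aggregate` labels for a list of
/// setting entries such as "y:min". Returns an empty list when nothing
/// explodes (the single-row reduction case, which adds no synthetic column).
fn explode_labels(entries: &[&str]) -> Result<Vec<String>, String> {
    // Collect the functions per aesthetic, preserving first-seen order.
    let mut targets: Vec<(String, Vec<String>)> = Vec::new();
    for entry in entries {
        // Assumption: an entry without ':' is an unprefixed default and
        // never contributes to the label.
        let Some((aes, func)) = entry.split_once(':') else {
            continue;
        };
        let pos = targets.iter().position(|(a, _)| a.as_str() == aes);
        match pos {
            Some(i) => targets[i].1.push(func.to_string()),
            None => targets.push((aes.to_string(), vec![func.to_string()])),
        }
    }

    // Only aesthetics targeted more than once explode into several rows.
    let exploded: Vec<&[String]> = targets
        .iter()
        .map(|(_, funcs)| funcs.as_slice())
        .filter(|funcs| funcs.len() > 1)
        .collect();
    let Some(first) = exploded.first() else {
        return Ok(Vec::new());
    };

    // Exploded targets must agree on the number of rows.
    let rows = first.len();
    if exploded.iter().any(|funcs| funcs.len() != rows) {
        return Err("mixed explosion lengths above 1".to_string());
    }

    // Row i takes the i-th function of every exploded target (lockstep),
    // drops duplicate names, and joins the rest with '/'.
    let mut labels = Vec::with_capacity(rows);
    for row in 0..rows {
        let mut names: Vec<&str> = Vec::new();
        for &funcs in &exploded {
            let name = funcs[row].as_str();
            if !names.contains(&name) {
                names.push(name);
            }
        }
        labels.push(names.join("/"));
    }
    Ok(labels)
}
```

Called with the four documented examples, this sketch yields the same tags as the bullet list, and an input such as `["y:min", "y:max", "color:sum", "color:prod", "color:mean"]` is rejected because the two exploded targets have lengths 2 and 3.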
@@ -102,7 +113,7 @@ Allowed offsets are: `'mean'`, `'median'`, `'geomean'`, `'harmean'`, `'rms'`, `'
 
 Allowed expansions are: `'sdev'`, `'se'`, `'var'`, `'iqr'`, and `'range'`
 
-Aggregation applies in place: there is no extra `aggregate` column to remap, and you do not need a `REMAPPING` clause to consume aggregate output. The aggregated value replaces the bound column for the same aesthetic.
+In the single-row (reduction) case aggregation applies in place: no `REMAPPING` is needed and no synthetic column is added. Only the multi-row (explosion) case described above introduces the synthetic `aggregate` column.
 
 ### `FILTER`
 ```ggsql

doc/syntax/layer/type/line.qmd

Lines changed: 3 additions & 4 deletions
@@ -96,13 +96,12 @@ DRAW line
 SCALE linewidth TO (0, 30)
 ```
 
-Use aggregation to draw min and max lines from a set of observations. Each layer produces one summary trace; stack two layers for both bounds.
+Use aggregation to draw min and max lines from a set of observations on a single layer. Targeting `y` twice produces one summary line per function within the same layer, with a synthetic `aggregate` column tagging each row that you can remap to colour the lines distinctly:
 
 ```{ggsql}
 VISUALISE Day AS x, Temp AS y FROM ggsql:airquality
 DRAW line
-SETTING aggregate => 'min'
-DRAW line
-SETTING aggregate => 'max'
+REMAPPING aggregate AS stroke
+SETTING aggregate => ('y:min', 'y:max')
 DRAW point
 ```
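
To make the row shape of `aggregate => ('y:min', 'y:max')` concrete, here is a rough Rust sketch of the explosion for one grouping column. It is an illustration only: `explode_min_max` is a hypothetical name, the `(x, y)` tuples stand in for `(Day, Temp)` observations, and the real stat works on data frames rather than slices.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: for each distinct x, emit one row per function,
/// tagged with the function name (the role of the synthetic `aggregate`
/// column). Rows come out ordered by x, min before max.
fn explode_min_max(obs: &[(u32, f64)]) -> Vec<(u32, f64, &'static str)> {
    // Running (min, max) of y per group key.
    let mut groups: BTreeMap<u32, (f64, f64)> = BTreeMap::new();
    for &(x, y) in obs {
        let bounds = groups.entry(x).or_insert((y, y));
        bounds.0 = bounds.0.min(y);
        bounds.1 = bounds.1.max(y);
    }

    // Two output rows per group instead of one.
    let mut rows = Vec::new();
    for (x, (lo, hi)) in groups {
        rows.push((x, lo, "min"));
        rows.push((x, hi, "max"));
    }
    rows
}
```

Each group contributes two rows where the reduction case would contribute one, which is why the tag column is needed to tell the min series from the max series.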

src/execute/layer.rs

Lines changed: 18 additions & 0 deletions
@@ -604,6 +604,24 @@ where
 }
 }
 
+// The synthetic `aggregate` stat column produced by an exploded
+// Aggregate stat tags each row with its function name. For mark
+// types that connect rows within a group (line, area, path,
+// polygon) we add this column to `layer.partition_by` so e.g.
+// `aggregate => ('y:min', 'y:max')` renders as two separate lines
+// rather than one zigzag through both. Resolves to the post-rename
+// data-column name: if the user remapped `aggregate AS <aes>`, the
+// prefixed aesthetic column; otherwise the stat column.
+if stat_columns.iter().any(|s| s == "aggregate") {
+    let partition_col = match final_remappings.get("aggregate") {
+        Some(aes) => naming::aesthetic_column(aes),
+        None => naming::stat_column("aggregate"),
+    };
+    if !layer.partition_by.contains(&partition_col) {
+        layer.partition_by.push(partition_col);
+    }
+}
+
 // Apply stat_columns to layer aesthetics using the remappings
 for stat in &stat_columns {
     if let Some(aesthetic) = final_remappings.get(stat) {
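
The zigzag that the comment in this hunk mentions can be shown with a small standalone Rust sketch. It is not taken from the repository: `split_traces` is a hypothetical function, and the tuples are an assumed stand-in for exploded rows of the form `(x, y, aggregate tag)`.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: split exploded rows into one trace per tag, which is
/// the effect of adding the tag column to the layer's partition columns.
/// Row order within each trace is preserved.
fn split_traces(rows: &[(u32, f64, &str)]) -> BTreeMap<String, Vec<(u32, f64)>> {
    let mut traces: BTreeMap<String, Vec<(u32, f64)>> = BTreeMap::new();
    for &(x, y, tag) in rows {
        traces.entry(tag.to_string()).or_default().push((x, y));
    }
    traces
}
```

Without the split, a line mark would connect the rows in their stored order (min, max, min, max, ...) and bounce between the two series; with it, each tag gets its own monotone-in-x trace.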
