
Commit 841ac67

Add aggregate functionality for base layers (#384)
1 parent 23c50f1 commit 841ac67

46 files changed

Lines changed: 3594 additions & 147 deletions


.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -96,6 +96,7 @@ criterion/
9696

9797
# Claude Code specific
9898
.claude/
99+
memory
99100

100101
# R specific
101102
*.Rproj.user

CHANGELOG.md

Lines changed: 5 additions & 1 deletion
@@ -2,6 +2,10 @@
22

33
### Added
44

5+
- New `aggregate` SETTING on Identity-stat layers (point, line, area, bar, ribbon,
6+
range, segment, arrow, rule, text). By default it collapses each group to a
7+
single row by replacing every numeric mapping in place with its aggregated
8+
value. See the `DRAW` documentation for details.
59
- Added panel decorations (grid lines, axes, background) for polar coordinates (#156).
610
- Added `radar` setting to polar coordinates for making radar plots (#418).
711

@@ -11,7 +15,7 @@
1115

1216
- Side effects like `CREATE TEMP TABLE` before the `VISUALISE` statement are now
1317
separated from directly feeding into the visualisation data (#415)
14-
- Fixed bug where panel axes were unintentionally anchored to zero when using
18+
- Fixed bug where panel axes were unintentionally anchored to zero when using
1519
`FACET ... SETTING free => 'x'/'y'` (#410).
1620
- Fixed bug where faceted data were matched to the incorrect panels (#409)
1721

doc/syntax/clause/draw.qmd

Lines changed: 42 additions & 0 deletions
@@ -76,6 +76,48 @@ The `SETTING` clause can be used for two different things:
7676
#### Position
7777
A special setting is `position` which controls how overlapping objects are repositioned to avoid overlapping etc. Position adjustments have special mapping requirements so all position adjustments will not be relevant for all layer types. Different layers have different defaults as detailed in their documentation. You can read about each different position adjustment at [their own documentation sites](../index.qmd#position-adjustments).
7878

79+
#### Aggregate
80+
Some layers support aggregation of their data through the `aggregate` setting. Their documentation will state this. `aggregate` collapses each group to a single row, replacing every numeric mapping in place with its aggregated value. Groups are defined by `PARTITION BY` together with all discrete mappings.
81+
82+
The `aggregate` setting takes a single string or an array of strings. Each string is one of:
83+
84+
* **Untargeted** `'<func>'` (no prefix). With one untargeted aggregation, the function applies to every numeric mapping that doesn't have a targeted aggregation. With two untargeted aggregations, the first is used for the lower side of range layers (e.g. `x`/`xmin`) plus all non-range layers, and the second is used for the upper side of range layers (e.g. `xend`/`xmax`). More than two untargeted aggregations are not allowed.
85+
* **Targeted** `'<aes>:<func>'`. Applies `func` to the named aesthetic only (`<aes>` is a name like `x`, `y`, `xmin`, `xmax`, `xend`, `yend`, `color`, `size`, …). A target overrides any untargeted aggregation for that aesthetic.
86+
87+
A numeric mapping that has neither a targeted aggregation nor an applicable untargeted default is dropped from the layer with a warning.
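
The resolution rules above can be modelled in a few lines of Python. This is only an illustrative sketch of the documented behaviour, not the ggsql implementation: the function name `resolve_aggregates` and the set of "upper side" aesthetics (`xend`, `yend`, `xmax`, `ymax`) are assumptions made for the example.

```python
# Hypothetical model of how the `aggregate` setting is resolved per aesthetic.
# Assumption: the "upper side" aesthetics are xend, yend, xmax and ymax.
UPPER_SIDE = ("xend", "yend", "xmax", "ymax")


def resolve_aggregates(spec, numeric_aesthetics):
    """Return ({aesthetic: [functions]}, [dropped aesthetics])."""
    if isinstance(spec, str):
        spec = [spec]

    untargeted, targeted = [], {}
    for entry in spec:
        aes, sep, func = entry.partition(":")
        if sep:
            # '<aes>:<func>' targets a single aesthetic
            targeted.setdefault(aes, []).append(func)
        else:
            # '<func>' is an untargeted default
            untargeted.append(entry)

    if len(untargeted) > 2:
        raise ValueError("more than two untargeted aggregations are not allowed")

    resolved, dropped = {}, []
    for aes in numeric_aesthetics:
        if aes in targeted:
            # a target overrides any untargeted default
            resolved[aes] = targeted[aes]
        elif untargeted:
            # with two defaults, the second one serves the upper side
            use_second = len(untargeted) == 2 and aes in UPPER_SIDE
            resolved[aes] = [untargeted[1 if use_second else 0]]
        else:
            # neither a target nor a default: the mapping is dropped
            dropped.append(aes)
    return resolved, dropped
```

For instance, `resolve_aggregates(('fill:mean', 'size:count'), ['fill', 'size', 'alpha'])` assigns `mean` to `fill` and `count` to `size`, and reports `alpha` as dropped because nothing applies to it.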
88+
89+
##### Aggregate functions
90+
Aggregation can either be a simple function or a band function. The simple functions are:
91+
92+
* `'count'`: Non-null tally of the bound column.
93+
* `'sum'` and `'prod'`: The sum or product
94+
* `'min'`, `'max'`: Extremes
95+
* `'range'` (max - min) and `'mid'` ((min + max) / 2)
96+
* `'mean'` and `'median'`: Central tendency
97+
* `'geomean'`, `'harmean'`, and `'rms'`: Geometric mean, harmonic mean, and root mean square
98+
* `'sdev'`, `'var'`, `'iqr'`, and `'se'`: Standard deviation, variance, interquartile range, and standard error
99+
* `'p05'`, `'p10'`, `'p25'`, `'p50'`, `'p75'`, `'p90'`, and `'p95'`: Percentiles
100+
* `'first'` and `'last'`: The first or last value in the group, in row order. Note that the row order within a group is engine-defined unless the source query has an `ORDER BY` — these are most useful when the upstream SQL provides an explicit ordering.
101+
* `'diff'`: `last - first`. The change between the first and last value in row order — same ordering caveat applies.
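
To make the less common names in the list concrete, the sketch below computes a few of them for a single group with Python's standard library. It is illustrative only: the helper name `derived_summaries` is hypothetical, and `'se'` is assumed here to be the sample standard deviation divided by the square root of the group size, which the text above does not spell out.

```python
import math
import statistics as st


def derived_summaries(values):
    """Compute a few of the simple aggregates for one group of values.

    `values` is assumed to be in row order, which matters for 'diff'.
    """
    n = len(values)
    return {
        "range": max(values) - min(values),
        "mid": (min(values) + max(values)) / 2,
        "geomean": st.geometric_mean(values),
        "harmean": st.harmonic_mean(values),
        "rms": math.sqrt(sum(v * v for v in values) / n),
        # assumption: standard error = sample sd / sqrt(n)
        "se": st.stdev(values) / math.sqrt(n),
        # last - first, in row order
        "diff": values[-1] - values[0],
    }
```

Reordering the input changes `'diff'` but none of the other entries, which is the ordering caveat noted for `'first'`, `'last'`, and `'diff'`.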
102+
103+
For band functions you combine an offset with an expansion, optionally scaled by a multiplier. For example, `'mean-1.96sdev'` evaluates to the group mean minus 1.96 times the group standard deviation. The general form is `<offset>±<multiplier><expansion>`, where `±` is either `+` or `-` and `<multiplier>` is optional (defaults to `1`).
104+
105+
Allowed offsets are: `'mean'`, `'median'`, `'geomean'`, `'harmean'`, `'rms'`, `'sum'`, `'prod'`, `'min'`, `'max'`, `'mid'`, and the percentiles `'p05'` through `'p95'`
106+
107+
Allowed expansions are: `'sdev'`, `'se'`, `'var'`, `'iqr'`, and `'range'`
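
The band grammar can be sketched as a small parser and evaluator. This is a hedged illustration of the documented form, not ggsql's own code: `parse_band` and `band` are hypothetical names, only a subset of offsets and expansions is covered, and `'sdev'`/`'var'` are assumed to be the sample statistics.

```python
import math
import re
import statistics as st

# Subset of the documented offsets and expansions, for illustration only.
OFFSETS = {
    "mean": st.fmean,
    "median": st.median,
    "min": min,
    "max": max,
    "sum": math.fsum,
    "mid": lambda v: (min(v) + max(v)) / 2,
}
EXPANSIONS = {
    "sdev": st.stdev,        # assumption: sample standard deviation
    "var": st.variance,      # assumption: sample variance
    "se": lambda v: st.stdev(v) / math.sqrt(len(v)),
    "range": lambda v: max(v) - min(v),
}

# <offset> (+|-) [<multiplier>] <expansion>
BAND = re.compile(r"([a-z][a-z0-9]*)([+-])(\d+(?:\.\d+)?)?([a-z]+)")


def parse_band(spec):
    """Split a band string into (offset, signed multiplier, expansion)."""
    match = BAND.fullmatch(spec)
    if match is None:
        raise ValueError(f"not a band function: {spec!r}")
    offset, sign, multiplier, expansion = match.groups()
    if offset not in OFFSETS or expansion not in EXPANSIONS:
        raise ValueError(f"unsupported band function in this sketch: {spec!r}")
    factor = float(multiplier) if multiplier else 1.0  # multiplier defaults to 1
    return offset, (-factor if sign == "-" else factor), expansion


def band(spec, values):
    """Evaluate offset + signed multiplier * expansion for one group."""
    offset, factor, expansion = parse_band(spec)
    return OFFSETS[offset](values) + factor * EXPANSIONS[expansion](values)
```

Calling `band('mean-1.96sdev', values)` and `band('mean+1.96sdev', values)` on the same group gives the lower and upper ends of an interval centred on the mean.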
108+
109+
##### Exploded aggregation
110+
You can also target the same aesthetic more than once to produce *multiple rows per group* — one for each function. We call that *exploded aggregation*. For example `aggregate => ('y:min', 'y:max')` emits a min row and a max row per group, so a single `DRAW line` produces two summary lines that connect within each group rather than across them. When multiple rows are created, a synthetic `aggregate` column is made that tags each row with the name of the aggregation function. You can use this with a `REMAPPING` to drive another aesthetic — e.g. `REMAPPING aggregate AS stroke` to colour the two lines differently. The column's value is built from the per-row function names of the *exploded* targets, deduplicated, and joined with `/`:
111+
112+
* `aggregate => ('y:min', 'y:max')` → rows tagged `'min'`, `'max'`.
113+
* `aggregate => ('y:min', 'y:max', 'color:median')` → rows tagged `'min'`, `'max'` (the single-function `color` target is recycled across rows and is not part of the label).
114+
* `aggregate => ('y:min', 'y:max', 'color:sum', 'color:prod')` → rows tagged `'min/sum'`, `'max/prod'`.
115+
* `aggregate => ('y:mean', 'y:max', 'color:mean', 'color:prod')` → rows tagged `'mean'`, `'max/prod'` (the duplicate `'mean'` collapses).
116+
117+
When several aesthetics are targeted with the same number of functions, they explode in lockstep: row 1 uses each aesthetic's first function, row 2 the second, and so on. Aesthetics with a single function, as well as the unprefixed defaults, are reused unchanged across every row. Aesthetics that list more than one function must all list the same number of functions; mixing different counts above 1 is not allowed.
118+
119+
In the single-row (reduction) case aggregation applies in place — no `REMAPPING` is needed and no synthetic column is added. Only the multi-row (explosion) case described above introduces the synthetic `aggregate` column.
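
The labelling rule for the synthetic `aggregate` column can be checked against the four examples above with a short Python model. It is a sketch under assumptions rather than the actual implementation: `aggregate_labels` is a hypothetical name, and the order of names within a label is assumed to follow the order in which aesthetics first appear in the setting.

```python
def aggregate_labels(spec):
    """Return the per-row labels of the synthetic `aggregate` column.

    An empty list means the reduction case: one row per group, no column.
    """
    targeted = {}
    for entry in spec:
        aes, sep, func = entry.partition(":")
        if sep:
            targeted.setdefault(aes, []).append(func)

    # only aesthetics targeted more than once take part in the explosion
    exploded = [funcs for funcs in targeted.values() if len(funcs) > 1]
    if not exploded:
        return []

    n_rows = len(exploded[0])
    if any(len(funcs) != n_rows for funcs in exploded):
        raise ValueError("exploded aesthetics must list the same number of functions")

    labels = []
    for row in range(n_rows):
        names = []
        for funcs in exploded:
            if funcs[row] not in names:  # deduplicate within a row
                names.append(funcs[row])
        labels.append("/".join(names))
    return labels
```

With this model, `aggregate_labels(('y:mean', 'y:max', 'color:mean', 'color:prod'))` returns `['mean', 'max/prod']`, matching the last example in the list.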
120+
79121
### `FILTER`
80122
```ggsql
81123
FILTER <condition>

doc/syntax/layer/type/area.qmd

Lines changed: 6 additions & 1 deletion
@@ -25,9 +25,14 @@ The following aesthetics are recognised by the area layer.
2525
* `orientation`: The orientation of the layer, see the [Orientation section](#orientation). One of the following:
2626
* `'aligned'` to align the layer's primary axis with the coordinate system's first axis.
2727
* `'transposed'` to align the layer's primary axis with the coordinate system's second axis.
28+
* `aggregate`: Aggregation functions to apply per group:
29+
* `null` to apply no group aggregation (default).
30+
* A single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.
2831

2932
## Data transformation
30-
The area layer sorts the data along its primary axis
33+
This layer supports aggregation through the `aggregate` setting. Aggregation groups are defined by `PARTITION BY`, all discrete mappings, and the primary axis. Within each group, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.
34+
35+
Further, the area layer sorts the data along its primary axis before returning it.
3136

3237
## Orientation
3338
Area plots are sorted and connected along their primary axis. Since the primary axis cannot be deduced from the mapping it must be specified using the `orientation` setting. E.g. if you wish to create a vertical area plot you need to set `orientation => 'transposed'` to indicate that the primary layer axis follows the second axis of the coordinate system.

doc/syntax/layer/type/bar.qmd

Lines changed: 17 additions & 0 deletions
@@ -25,10 +25,15 @@ The bar layer has no required aesthetics
2525
## Settings
2626
* `position`: Position adjustment. One of `'identity'`, `'stack'` (default), `'dodge'`, or `'jitter'`
2727
* `width`: The width of the bars as a proportion of the available width (0 to 1)
28+
* `aggregate`: Aggregation functions to apply per group:
29+
* `null` to apply no group aggregation (default).
30+
* A single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.
2831

2932
## Data transformation
3033
If the secondary axis has not been mapped the layer will calculate counts for you and display these as the secondary axis.
3134

35+
This layer supports aggregation through the `aggregate` setting. Aggregation groups are defined by `PARTITION BY` and all discrete mappings. Within each group, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.
36+
3237
### Properties
3338

3439
* `weight`: If mapped, the sum of the weights within each group is calculated instead of the count in each group
@@ -116,3 +121,15 @@ DRAW bar
116121
MAPPING species AS fill
117122
PROJECT TO polar
118123
```
124+
125+
Use a different type of aggregation for the bars through the `aggregate` setting. The `range` layer needs both `ymin` and `ymax` mapped; with two defaults, the first is applied to the lower bound and the second to the upper bound.
126+
127+
```{ggsql}
128+
VISUALISE species AS x FROM ggsql:penguins
129+
DRAW bar
130+
MAPPING body_mass AS y
131+
SETTING aggregate => 'mean', fill => 'steelblue'
132+
DRAW range
133+
MAPPING body_mass AS ymin, body_mass AS ymax
134+
SETTING aggregate => ('mean-1.96sdev', 'mean+1.96sdev')
135+
```

doc/syntax/layer/type/line.qmd

Lines changed: 18 additions & 2 deletions
@@ -24,9 +24,15 @@ The following aesthetics are recognised by the line layer.
2424
* `orientation`: The orientation of the layer, see the [Orientation section](#orientation). One of the following:
2525
* `'aligned'` to align the layer's primary axis with the coordinate system's first axis.
2626
* `'transposed'` to align the layer's primary axis with the coordinate system's second axis.
27+
* `aggregate`: Aggregation functions to apply per group:
28+
* `null` to apply no group aggregation (default).
29+
* A single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.
2730

2831
## Data transformation
29-
The line layer sorts the data along its primary axis.
32+
This layer supports aggregation through the `aggregate` setting. Aggregation groups are defined by `PARTITION BY`, all discrete mappings, and the primary axis. Within each group, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.
33+
34+
Further, the line layer sorts the data along its primary axis before returning it.
35+
3036
If the line has a variable `stroke` or `opacity` aesthetic within groups, the line is broken into segments.
3137
Each segment gets the property of the preceding datapoint, so the last datapoint in a group does not transfer these properties.
3238

@@ -89,4 +95,14 @@ VISUALISE x, y FROM data
8995
DRAW line
9096
MAPPING z AS linewidth
9197
SCALE linewidth TO (0, 30)
92-
```
98+
```
99+
100+
Use aggregation to draw min and max lines from a set of observations on a single layer. Targeting `y` twice produces one summary row per function within the same group. A synthetic `aggregate` column tags each row with its function name, which you can remap to colour the lines distinctly:
101+
102+
```{ggsql}
103+
VISUALISE Day AS x, Temp AS y FROM ggsql:airquality
104+
DRAW line
105+
REMAPPING aggregate AS stroke
106+
SETTING aggregate => ('y:min', 'y:max')
107+
DRAW point
108+
```

doc/syntax/layer/type/point.qmd

Lines changed: 14 additions & 1 deletion
@@ -23,9 +23,12 @@ The following aesthetics are recognised by the point layer.
2323

2424
## Settings
2525
* `position`: Position adjustment. One of `'identity'` (default), `'stack'`, `'dodge'`, or `'jitter'`
26+
* `aggregate`: Aggregation functions to apply per group:
27+
* `null` to apply no group aggregation (default).
28+
* A single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.
2629

2730
## Data transformation
28-
The point layer does not transform its data but passes it through unchanged
31+
This layer supports aggregation through the `aggregate` setting. Aggregation groups are defined by `PARTITION BY` and all discrete mappings. Within each group, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.
2932

3033
## Orientation
3134
The point layer has no orientation. The axes are treated symmetrically.
@@ -72,3 +75,13 @@ VISUALISE species AS x, bill_dep AS y FROM ggsql:penguins
7275
DRAW point
7376
SETTING position => 'jitter', distribution => 'density'
7477
```
78+
79+
Use aggregation to show a single point per group:
80+
81+
```{ggsql}
82+
VISUALISE species AS x, island AS y, body_mass AS fill, body_mass AS size
83+
FROM ggsql:penguins
84+
DRAW point
85+
SETTING aggregate => ('fill:mean', 'size:count')
86+
SCALE size TO (5, 20)
87+
```

doc/syntax/layer/type/range.qmd

Lines changed: 26 additions & 1 deletion
@@ -22,9 +22,12 @@ The following aesthetics are recognised by the range layer.
2222

2323
## Settings
2424
* `width`: The width of the hinges in points (must be >= 0). Defaults to 10. Can be set to `null` to not display hinges.
25+
* `aggregate`: Aggregation functions to apply per group:
26+
* `null` to apply no group aggregation (default).
27+
* A single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.
2528

2629
## Data transformation
27-
The range layer does not transform its data but passes it through unchanged.
30+
This layer supports aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY` and all discrete mappings, every numeric mapping is replaced in place by its aggregated value, producing one range per group. As a range layer, it accepts two untargeted defaults: the first applies to the start point (`xmin`/`ymin`) and the second applies to the end point (`xmax`/`ymax`). Use a single default like `'mean'` to apply the same function to all values, or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.
2831

2932
## Orientation
3033
The orientation of range layers is deduced directly from the mapping, because the interval is mapped to the secondary axis. To create a horizontal range layer, you map the independent variable to `y` instead of `x` and the interval to `xmin` and `xmax` (assuming a default Cartesian coordinate system).
@@ -108,3 +111,25 @@ DRAW range
108111
MAPPING low AS ymin, high AS ymax
109112
SETTING width => null
110113
```
114+
115+
Rather than precomputing the values and plotting them, you can use the aggregate functionality to calculate the relevant statistics dynamically:
116+
117+
```{ggsql}
118+
VISUALISE Date AS x, Temp AS ymin, Temp AS ymax, Temp AS color
119+
FROM ggsql:airquality
120+
DRAW range
121+
REMAPPING aggregate AS linewidth
122+
SETTING
123+
aggregate => (
124+
'x:first',
125+
'ymin:first', 'ymin:min',
126+
'ymax:last', 'ymax:max',
127+
'color:diff'
128+
),
129+
width => null
130+
PARTITION BY Week
131+
SCALE linewidth TO (5, 1)
132+
SCALE BINNED color TO ('steelblue', 'firebrick')
133+
SETTING breaks => (-20, 0, 20)
134+
```
135+

doc/syntax/layer/type/ribbon.qmd

Lines changed: 12 additions & 1 deletion
@@ -23,9 +23,12 @@ The following aesthetics are recognised by the ribbon layer.
2323

2424
## Settings
2525
* `position`: Position adjustment. One of `'identity'` (default), `'stack'`, `'dodge'`, or `'jitter'`
26+
* `aggregate`: Aggregation functions to apply per group:
27+
* `null` to apply no group aggregation (default).
28+
* A single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.
2629

2730
## Data transformation
28-
The ribbon layer sorts the data along its primary axis
31+
This layer supports aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY` and all discrete mappings, every numeric mapping is replaced in place by its aggregated value, producing one ribbon per group. As a range layer, ribbon accepts two untargeted defaults: the first applies to the start point (`xmin`/`ymin`) and the second applies to the end point (`xmax`/`ymax`). Use a single default like `'mean'` to apply the same function to all values, or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.
2932

3033
## Orientation
3134
Ribbon layers are sorted and connected along their primary axis. The orientation is deduced directly from the mapping, because the interval is mapped to the secondary axis. To create a vertical ribbon layer you map the independent variable to `y` instead of `x` and the interval to `xmin` and `xmax` (assuming a default Cartesian coordinate system).
@@ -59,3 +62,11 @@ DRAW ribbon
5962
DRAW line
6063
MAPPING MeanTemp AS y
6164
```
65+
66+
Use aggregation to calculate bounds on the fly. The two untargeted aggregation functions target the `ymin` and `ymax` aesthetics automatically.
67+
68+
```{ggsql}
69+
VISUALISE Day AS x, Temp AS ymin, Temp AS ymax FROM ggsql:airquality
70+
DRAW ribbon
71+
SETTING aggregate => ('min', 'max')
72+
```

doc/syntax/layer/type/rule.qmd

Lines changed: 15 additions & 1 deletion
@@ -25,8 +25,12 @@ The following aesthetics are recognised by the rule layer.
2525

2626
## Settings
2727
* `position`: Position adjustment. One of `'identity'` (default), `'stack'`, `'dodge'`, or `'jitter'`
28+
* `aggregate`: Aggregation functions to apply per group:
29+
* `null` to apply no group aggregation (default).
30+
* A single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.
2831

2932
## Data transformation
33+
This layer supports aggregation through the `aggregate` setting. Aggregation groups are defined by `PARTITION BY` and all discrete mappings. Within each group, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.
3034

3135
For diagonal lines, the position aesthetic determines the intercept:
3236

@@ -110,4 +114,14 @@ VISUALISE FROM ggsql:penguins
110114
intercept AS y,
111115
label AS colour
112116
FROM lines
113-
```
117+
```
118+
119+
Show a rule at the maximum value of a time series:
120+
121+
```{ggsql}
122+
VISUALISE Temp AS y FROM ggsql:airquality
123+
DRAW line
124+
MAPPING Date AS x
125+
DRAW rule
126+
SETTING aggregate => 'max'
127+
```
