Merged
Changes from 20 commits
Commits
38 commits
a12e198
first pass
thomasp85 Apr 27, 2026
bb16c9c
Merged upstream/main into issue160-aggregate
thomasp85 Apr 27, 2026
778b6ac
support numeric axis geoms
thomasp85 Apr 27, 2026
0a1b214
support range geoms
thomasp85 Apr 27, 2026
218f302
reformat
thomasp85 Apr 27, 2026
f14a017
Merge commit 'c3e234b942094f05ddefac1ae6d9b407c54771c3'
thomasp85 Apr 28, 2026
8c5845f
support aggregation in segment
thomasp85 Apr 28, 2026
2cb0216
allow orientation in range and ribbon for aggregation case
thomasp85 Apr 28, 2026
cc390bd
rename to percentile
thomasp85 Apr 28, 2026
4476005
make aggregates parametric
thomasp85 Apr 28, 2026
3f1a433
reformat
thomasp85 Apr 28, 2026
6147ccc
clippy be happy
thomasp85 Apr 28, 2026
1c613e4
ensure multiple aggregates give rise to multiple groups
thomasp85 Apr 28, 2026
f3081a3
begin to document
thomasp85 Apr 28, 2026
56780b0
polygon and path doesn't allow aggregation
thomasp85 Apr 28, 2026
802f1f1
Add documentation for non-range layers
thomasp85 Apr 28, 2026
c6dd4a9
rethink aggregation
thomasp85 May 4, 2026
caf0a8e
add back long-form aggregation
thomasp85 May 4, 2026
564673c
reformat
thomasp85 May 4, 2026
88a707b
fix aggregation of time-dependent layers
thomasp85 May 4, 2026
b1938d8
add additional aggregations + examples
thomasp85 May 6, 2026
c40ea31
Apply suggestions from code review
thomasp85 May 6, 2026
d76825d
apply doc changes to all layers
thomasp85 May 6, 2026
bcbedba
support first and last in ANSI, add diff
thomasp85 May 7, 2026
905063d
support tile
thomasp85 May 7, 2026
840fc6e
defer scaling of aggregated columns
thomasp85 May 7, 2026
bdcd700
update SKILL
thomasp85 May 7, 2026
65c504c
reformat
thomasp85 May 7, 2026
a2b24b9
Merge aggregate_domain_aesthetics and supports_aggregate into one
thomasp85 May 7, 2026
7014e93
avoid twice parsing
thomasp85 May 7, 2026
5ce2ddc
refactor aggregate parsing
thomasp85 May 7, 2026
4aa4159
better warning
thomasp85 May 7, 2026
95d4df1
add finer test
thomasp85 May 7, 2026
134f847
Merge commit '23c50f1a67872808838933f2a7a287871e82c446'
thomasp85 May 7, 2026
ee49998
appease our dear lord and master clippy
thomasp85 May 7, 2026
c574e45
Apply suggestions from code review
thomasp85 May 7, 2026
fe8fae0
improve docs
thomasp85 May 7, 2026
ba0bf3f
implement suggestions from review
thomasp85 May 8, 2026
17 changes: 17 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,22 @@

### Added

- New `aggregate` SETTING on Identity-stat layers (point, line, area, bar, ribbon,
range, segment, arrow, rule, text). By default it collapses each group to a
single row by replacing every numeric mapping in place with its aggregated
value. Accepts a single string or array of strings; entries are either
unprefixed defaults (`'mean'`) or per-aesthetic targets (`'y:max'`,
`'color:median'`). Up to two defaults may be supplied — the first applies to
lower-half aesthetics plus all non-range layers, the second to upper-half
(`max`/`end` suffix). Numeric mappings without a target or applicable default
are dropped with a warning. Targeting the same aesthetic more than once
(e.g. `aggregate => ('y:min', 'y:max')`) produces one row per function with
a synthetic `aggregate` column tagging each row, available for `REMAPPING` to
another aesthetic; targets with a single function and the unprefixed defaults
are reused unchanged across the exploded rows. The `aggregate` column's value
is built from the dedup-and-joined function names of all exploded targets at
each row, separated by `/` (so `('y:min', 'y:max', 'color:sum', 'color:prod')`
yields `'min/sum'` and `'max/prod'`). Mixed lengths above 1 are an error.
- Add cell delimiters and code lens actions to the Positron extension (#366)
- ODBC is now turned on for the CLI as well (#344)
- `FROM` can now come before `VISUALIZE`, mirroring the DuckDB style. This means
Expand Down Expand Up @@ -37,6 +53,7 @@ portion (#364).
- Removed polars from dependency list along with all its transient dependencies. Rewrote DataFrame struct on top of arrow (#350)
- Moved ggsql-python to its own repo (posit-dev/ggsql-python) and cleaned up any additional references to it
- Moved ggsql-r to its own repo (posit-dev/ggsql-r)
- The `orientation` setting on `ribbon` and `range` layers. With explicit `xmin`/`xmax` or `ymin`/`ymax` mappings, orientation is unambiguous and is auto-detected from the mappings; the override is no longer needed.

## [2.7.0] - 2026-04-20

Expand Down
39 changes: 39 additions & 0 deletions doc/syntax/clause/draw.qmd
Expand Up @@ -76,6 +76,45 @@ The `SETTING` clause can be used for two different things:
#### Position
A special setting is `position` which controls how overlapping objects are repositioned to avoid overlapping etc. Position adjustments have special mapping requirements so all position adjustments will not be relevant for all layer types. Different layers have different defaults as detailed in their documentation. You can read about each different position adjustment at [their own documentation sites](../index.qmd#position-adjustments).

#### Aggregate
Some layers support aggregation of their data through the `aggregate` setting; those that do say so in their documentation. `aggregate` collapses each group to a single row, replacing every numeric mapping in place with its aggregated value. Groups are defined by `PARTITION BY` together with all discrete mappings.
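
As a rough illustration of this reduction, the following Python sketch collapses each group to one row. It is not ggsql's implementation: the row layout, the `collapse` helper, and the use of `mean` are assumptions made for the example, with the `discrete` fields standing in for `PARTITION BY` columns and discrete mappings.

```python
from collections import defaultdict
from statistics import mean


def collapse(rows, discrete, numeric, func=mean):
    """Collapse each group to one row, aggregating every numeric field in place."""
    groups = defaultdict(list)
    for row in rows:
        # The group key is built from the discrete fields only.
        groups[tuple(row[k] for k in discrete)].append(row)
    out = []
    for key, members in groups.items():
        collapsed = dict(zip(discrete, key))
        for name in numeric:
            collapsed[name] = func([m[name] for m in members])
        out.append(collapsed)
    return out


rows = [
    {"species": "A", "x": 1.0, "y": 10.0},
    {"species": "A", "x": 3.0, "y": 20.0},
    {"species": "B", "x": 5.0, "y": 30.0},
]
result = collapse(rows, discrete=["species"], numeric=["x", "y"])
```

Here the two `"A"` rows become a single row holding the mean of `x` and the mean of `y`, while the lone `"B"` row is returned with its values unchanged.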

The setting takes a single string or an array of strings. Each string is one of:

* **Default** — `'<func>'` (no prefix). With one default the function applies to every untargeted numeric mapping. With two defaults the first is used for the lower side of range layers (e.g. `x`/`xmin`) plus all non-range layers, and the second is used for the upper side of range layers (e.g. `xend`/`xmax`). More than two defaults is an error.
* **Target** — `'<aes>:<func>'`. Applies `func` to the named aesthetic only (`<aes>` is a user-facing name like `x`, `y`, `xmin`, `xmax`, `xend`, `yend`, `color`, `size`, …). A target overrides any default for that aesthetic.

A numeric mapping that has neither a target nor an applicable default is dropped from the layer with a warning.
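
The rules for defaults and targets can be sketched in Python as follows. This is only a model of the behaviour described above, not the parser used by ggsql: `parse_aggregate` and `resolve` are hypothetical helpers, and recognising the upper side by a `max`/`end` suffix is an assumption taken from the changelog wording.

```python
def parse_aggregate(setting):
    """Split an aggregate setting into unprefixed defaults and per-aesthetic targets."""
    entries = [setting] if isinstance(setting, str) else list(setting)
    defaults, targets = [], {}
    for entry in entries:
        if ":" in entry:
            aes, func = entry.split(":", 1)
            targets.setdefault(aes, []).append(func)
        else:
            defaults.append(entry)
    if len(defaults) > 2:
        raise ValueError("at most two default functions may be supplied")
    return defaults, targets


def resolve(aes, defaults, targets):
    """Return the functions for one aesthetic, or None if the mapping would be dropped."""
    if aes in targets:
        return targets[aes]  # a target overrides any default
    if not defaults:
        return None  # neither target nor default: dropped with a warning
    # Assumption: upper-side aesthetics are recognised by a max/end suffix.
    upper = aes.endswith(("max", "end"))
    return [defaults[-1] if upper else defaults[0]]


defaults, targets = parse_aggregate(("mean-1.96sdev", "mean+1.96sdev", "color:median"))
lower = resolve("ymin", defaults, targets)
upper = resolve("ymax", defaults, targets)
```

With a single default, `defaults[0]` and `defaults[-1]` are the same entry, so every untargeted numeric mapping receives the same function.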

You can also target the same aesthetic more than once to produce **multiple rows per group** — one for each function. For example `aggregate => ('y:min', 'y:max')` emits a min row and a max row per group, so a single `DRAW line` produces two summary lines that connect within each group rather than across them.
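
To make the multi-row case concrete, here is a small Python sketch of what such an explosion could produce. The `explode` helper, the sample values, and the row ordering are assumptions for illustration; only the idea of one tagged row per group and function comes from the text above.

```python
from collections import defaultdict

FUNCS = {"min": min, "max": max}  # only the two functions used in this example


def explode(rows, group_key, aes, funcs):
    """Emit one row per group and function, tagged with the function name."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[aes])
    out = []
    for name in funcs:
        for key, values in groups.items():
            out.append({group_key: key, aes: FUNCS[name](values), "aggregate": name})
    return out


rows = [
    {"x": 1, "y": 67}, {"x": 1, "y": 74},
    {"x": 2, "y": 72}, {"x": 2, "y": 59},
]
result = explode(rows, "x", "y", ["min", "max"])
```

Each group contributes two rows, one tagged `'min'` and one tagged `'max'`, so connecting rows that share a tag yields one line per function.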

The stat exposes a synthetic `aggregate` column tagging each row, which you can pick up with a `REMAPPING` to drive another aesthetic — e.g. `REMAPPING aggregate AS stroke` to colour the two lines differently. The column's value is built from the per-row function names of the *exploded* targets, deduplicated, and joined with `/`:

* `aggregate => ('y:min', 'y:max')` → rows tagged `'min'`, `'max'`.
* `aggregate => ('y:min', 'y:max', 'color:sum', 'color:prod')` → rows tagged `'min/sum'`, `'max/prod'`.
* `aggregate => ('y:mean', 'y:max', 'color:mean', 'color:prod')` → rows tagged `'mean'`, `'max/prod'` (the duplicate `'mean'` collapses).
* `aggregate => ('y:min', 'y:max', 'color:median')` → rows tagged `'min'`, `'max'` (the single-function `color` target is recycled across rows and is not part of the label).

When several aesthetics are targeted with the same number of functions, they explode in lockstep (row 1 uses each aesthetic's first function, row 2 the second, and so on); aesthetics with a single function — and the unprefixed defaults — are reused unchanged across every row. Mixing different lengths above 1 is an error.
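
The labelling rule can be modelled with a few lines of Python. This is a sketch under the assumption that targets are already parsed into a mapping from aesthetic to its list of functions; `aggregate_labels` is a hypothetical name, not part of ggsql.

```python
def aggregate_labels(targets):
    """Build the per-row `aggregate` label from a dict of aesthetic -> function list."""
    # Only targets with more than one function take part in the explosion.
    exploded = [funcs for funcs in targets.values() if len(funcs) > 1]
    if not exploded:
        return []  # reduction case: no synthetic column
    lengths = {len(funcs) for funcs in exploded}
    if len(lengths) > 1:
        raise ValueError("targets with more than one function must have equal lengths")
    labels = []
    for i in range(lengths.pop()):
        names = []
        for funcs in exploded:
            if funcs[i] not in names:  # deduplicate, keeping first-seen order
                names.append(funcs[i])
        labels.append("/".join(names))
    return labels


labels = aggregate_labels({"y": ["mean", "max"], "color": ["mean", "prod"]})
```

Single-function targets such as `'color:median'` never reach the label because they are filtered out before the rows are walked in lockstep.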

The simple functions are:

* `'count'`: Non-null tally of the bound column.
* `'sum'` and `'prod'`: Sum and product.
* `'min'`, `'max'`, and `'range'`: Extremes and their difference (max - min).
* `'mean'` and `'median'`: Central tendency.
* `'geomean'`, `'harmean'`, and `'rms'`: Geometric mean, harmonic mean, and root mean square.
* `'sdev'`, `'var'`, `'iqr'`, and `'se'`: Standard deviation, variance, interquartile range, and standard error.
* `'p05'`, `'p10'`, `'p25'`, `'p50'`, `'p75'`, `'p90'`, and `'p95'`: Percentiles.
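
For readers unfamiliar with the less common statistics in this list, the Python sketch below shows one conventional definition of each, using the standard `statistics` module. Whether ggsql uses sample or population variants, and which quantile interpolation it applies, is not stated here, so those choices are assumptions.

```python
import math
import statistics


def rms(values):
    """Root mean square: square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def se(values):
    """Standard error of the mean, assuming the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


def iqr(values):
    """Interquartile range; the quantile interpolation method is an assumption."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


values = [1.0, 2.0, 4.0, 8.0]
summary = {
    "geomean": statistics.geometric_mean(values),
    "harmean": statistics.harmonic_mean(values),
    "rms": rms(values),
    "range": max(values) - min(values),
}
```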

For band functions you combine an offset with an expansion, optionally scaled by a multiplier. For example, `'mean-1.96sdev'` evaluates to the group mean minus 1.96 times the standard deviation. The general form is `<offset>±<multiplier><expansion>`, where `<multiplier>` is optional and defaults to `1`.

Allowed offsets are: `'mean'`, `'median'`, `'geomean'`, `'harmean'`, `'rms'`, `'sum'`, `'prod'`, `'min'`, `'max'`, and `'p05'`–`'p95'`

Allowed expansions are: `'sdev'`, `'se'`, `'var'`, `'iqr'`, and `'range'`
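
A minimal Python sketch of how a band function string could be split and evaluated is given below. It is an illustration only: the regular expression, the `parse_band` and `evaluate_band` helpers, and the use of sample statistics are assumptions, and only a subset of the allowed offsets and expansions is wired up.

```python
import math
import re
import statistics

OFFSETS = {"mean": statistics.mean, "median": statistics.median, "min": min, "max": max}
EXPANSIONS = {
    "sdev": statistics.stdev,  # assumption: sample standard deviation
    "var": statistics.variance,  # assumption: sample variance
    "se": lambda v: statistics.stdev(v) / math.sqrt(len(v)),
    "range": lambda v: max(v) - min(v),
}
BAND = re.compile(r"^([a-z0-9]+)([+-])(\d+(?:\.\d+)?)?([a-z]+)$")


def parse_band(spec):
    """Split '<offset>±<multiplier><expansion>' into (offset, signed multiplier, expansion)."""
    match = BAND.match(spec)
    if match is None:
        raise ValueError(f"not a band function: {spec!r}")
    offset, sign, mult, expansion = match.groups()
    factor = float(mult) if mult else 1.0  # the multiplier defaults to 1
    return offset, (-factor if sign == "-" else factor), expansion


def evaluate_band(spec, values):
    """Evaluate offset + signed multiplier * expansion for one group of values."""
    offset, factor, expansion = parse_band(spec)
    return OFFSETS[offset](values) + factor * EXPANSIONS[expansion](values)


lower = evaluate_band("mean-1.96sdev", [1, 2, 3, 4, 5])
upper = evaluate_band("mean+1.96sdev", [1, 2, 3, 4, 5])
```

Folding the sign into the multiplier keeps evaluation to a single expression, and a plain function name such as `'mean'` is rejected by `parse_band` because it carries no `+` or `-`.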
Comment on lines +105 to +107
Collaborator


For overview purposes a table could be nice, but it is not at all necessary

| function             | simple | offset | expansion | description                                          |
|----------------------|--------|--------|-----------|------------------------------------------------------|
| `'mean'`, `'median'` | v      | v      | x         | Central tendency.                                    |
| `'sdev'` , `'var'`   | v      | x      | v         | Standard deviation, variance                         |
| `'first'`,`'last'`   | v      | x      | x         | The first or last value in the group ^[**footnote**] |


In the single-row (reduction) case aggregation applies in place — no `REMAPPING` is needed and no synthetic column is added. Only the multi-row (explosion) case described above introduces the synthetic `aggregate` column.

### `FILTER`
```ggsql
FILTER <condition>
Expand Down
5 changes: 4 additions & 1 deletion doc/syntax/layer/type/area.qmd
Expand Up @@ -25,9 +25,12 @@ The following aesthetics are recognised by the area layer.
* `orientation`: The orientation of the layer, see the [Orientation section](#orientation). One of the following:
* `'aligned'` to align the layer's primary axis with the coordinate system's first axis.
* `'transposed'` to align the layer's primary axis with the coordinate system's second axis.
* `aggregate`: Aggregation functions to apply per group. Either a single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
The area layer sorts the data along its primary axis
This layer supports aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY`, all discrete mappings, and the primary axis, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

Further, the area layer sorts the data along its primary axis before returning it.

## Orientation
Area plots are sorted and connected along their primary axis. Since the primary axis cannot be deduced from the mapping it must be specified using the `orientation` setting. E.g. if you wish to create a vertical area plot you need to set `orientation => 'transposed'` to indicate that the primary layer axis follows the second axis of the coordinate system.
Expand Down
15 changes: 15 additions & 0 deletions doc/syntax/layer/type/bar.qmd
Expand Up @@ -25,10 +25,13 @@ The bar layer has no required aesthetics
## Settings
* `position`: Position adjustment. One of `'identity'`, `'stack'` (default), `'dodge'`, or `'jitter'`
* `width`: The width of the bars as a proportion of the available width (0 to 1)
* `aggregate`: Aggregation functions to apply per group if the secondary position has been mapped. Either a single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
If the secondary axis has not been mapped the layer will calculate counts for you and display these as the secondary axis.

If the secondary axis has been mapped you can apply aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY`, all discrete mappings, and the primary axis, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

### Properties

* `weight`: If mapped, the sum of the weights within each group is calculated instead of the count in each group
Expand Down Expand Up @@ -116,3 +119,15 @@ DRAW bar
MAPPING species AS fill
PROJECT TO polar
```

Use a different type of aggregation for the bars through the `aggregate` setting. The `range` layer needs both `ymin` and `ymax` mapped; with two defaults, the first is applied to the lower bound and the second to the upper bound.

```{ggsql}
VISUALISE species AS x FROM ggsql:penguins
DRAW bar
MAPPING body_mass AS y
SETTING aggregate => 'mean', fill => 'steelblue'
DRAW range
MAPPING body_mass AS ymin, body_mass AS ymax
SETTING aggregate => ('mean-1.96sdev', 'mean+1.96sdev')
```
19 changes: 17 additions & 2 deletions doc/syntax/layer/type/line.qmd
Expand Up @@ -24,11 +24,16 @@ The following aesthetics are recognised by the line layer.
* `orientation`: The orientation of the layer, see the [Orientation section](#orientation). One of the following:
* `'aligned'` to align the layer's primary axis with the coordinate system's first axis.
* `'transposed'` to align the layer's primary axis with the coordinate system's second axis.
* `aggregate`: Aggregation functions to apply per group. Either a single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
The line layer sorts the data along its primary axis.
This layer supports aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY`, all discrete mappings, and the primary axis, every numeric mapping is replaced in place by its aggregated value to produce a summary trace. Use a default like `'mean'` to summarise the secondary axis, or target other aesthetics with `'<aes>:<func>'` (e.g. `'color:median'`). To draw min/max envelope lines, target the secondary axis once per function on a single layer (e.g. `aggregate => ('y:min', 'y:max')`), or use a [`range` layer](range.qmd) for a single range mark. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

Further, the line layer sorts the data along its primary axis before returning it.

If the line has a variable `stroke` or `opacity` aesthetic within groups, the line is broken into segments.
Each segment gets the property of the preceding datapoint, so the last datapoint in a group does not transfer these properties.
This behavior is not compatible with aggregation.
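
The segment rule can be pictured with a short Python sketch. The point layout and the `to_segments` helper are hypothetical; the sketch only mirrors the statement that each segment takes its property from the preceding datapoint.

```python
def to_segments(points):
    """Split a polyline into segments, each styled by its starting point."""
    segments = []
    for start, end in zip(points, points[1:]):
        segments.append({
            "from": (start["x"], start["y"]),
            "to": (end["x"], end["y"]),
            "stroke": start["stroke"],  # the preceding datapoint supplies the property
        })
    return segments


points = [
    {"x": 1, "y": 2, "stroke": "red"},
    {"x": 2, "y": 3, "stroke": "green"},
    {"x": 3, "y": 1, "stroke": "blue"},
]
segments = to_segments(points)
```

Three points give two segments, and the `stroke` of the final point is never used because no segment starts there.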

## Orientation
Line plots are sorted and connected along their primary axis. Since the primary axis cannot be deduced from the mapping it must be specified using the `orientation` setting. If you wish to create a vertical line plot, you need to set `orientation => 'transposed'` to indicate that the primary layer axis follows the second axis of the coordinate system.
Expand Down Expand Up @@ -89,4 +94,14 @@ VISUALISE x, y FROM data
DRAW line
MAPPING z AS linewidth
SCALE linewidth TO (0, 30)
```

Use aggregation to draw min and max lines from a set of observations on a single layer. Targeting `y` twice produces one summary line per function within the same layer, with a synthetic `aggregate` column tagging each row that you can remap to colour the lines distinctly:

```{ggsql}
VISUALISE Day AS x, Temp AS y FROM ggsql:airquality
DRAW line
REMAPPING aggregate AS stroke
SETTING aggregate => ('y:min', 'y:max')
DRAW point
```
3 changes: 2 additions & 1 deletion doc/syntax/layer/type/point.qmd
Expand Up @@ -23,9 +23,10 @@ The following aesthetics are recognised by the point layer.

## Settings
* `position`: Position adjustment. One of `'identity'` (default), `'stack'`, `'dodge'`, or `'jitter'`
* `aggregate`: Aggregation functions to apply per group. Either a single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
The point layer does not transform its data but passes it through unchanged
This layer supports aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY` and all discrete mappings, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

## Orientation
The point layer has no orientation. The axes are treated symmetrically.
Expand Down
3 changes: 2 additions & 1 deletion doc/syntax/layer/type/range.qmd
Expand Up @@ -22,9 +22,10 @@ The following aesthetics are recognised by the range layer.

## Settings
* `width`: The width of the hinges in points (must be >= 0). Defaults to 10. Can be set to `null` to not display hinges.
* `aggregate`: Aggregation functions to apply per group. Either a single string or an array of strings. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and the *Data transformation* section below.

## Data transformation
The range layer does not transform its data but passes it through unchanged.
This layer supports aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY` and all discrete mappings, every numeric mapping is replaced in place by its aggregated value, producing one range per group. Range is a range layer: with two defaults the first applies to the start point (`xmin`/`ymin`) and the second applies to the end point (`xmax`/`ymax`). Use a single default like `'mean'` to apply the same function to all values, or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

## Orientation
The orientation of range layers is deduced directly from the mapping, because the interval is mapped to the secondary axis. To create a horizontal range layer, you map the independent variable to `y` instead of `x` and the interval to `xmin` and `xmax` (assuming a default Cartesian coordinate system).
Expand Down
3 changes: 2 additions & 1 deletion doc/syntax/layer/type/ribbon.qmd
Expand Up @@ -23,9 +23,10 @@ The following aesthetics are recognised by the ribbon layer.

## Settings
* `position`: Position adjustment. One of `'identity'` (default), `'stack'`, `'dodge'`, or `'jitter'`
* `aggregate`: Aggregation functions to apply per group. Either a single string or an array of strings. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and the *Data transformation* section below.

## Data transformation
The ribbon layer sorts the data along its primary axis
This layer supports aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY` and all discrete mappings, every numeric mapping is replaced in place by its aggregated value, producing one ribbon per group. Ribbon is a range layer: with two defaults the first applies to the start point (`xmin`/`ymin`) and the second applies to the end point (`xmax`/`ymax`). Use a single default like `'mean'` to apply the same function to all values, or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

## Orientation
Ribbon layers are sorted and connected along their primary axis. The orientation is deduced directly from the mapping, because the interval is mapped to the secondary axis. To create a vertical ribbon layer you map the independent variable to `y` instead of `x` and the interval to `xmin` and `xmax` (assuming a default Cartesian coordinate system).
Expand Down
14 changes: 13 additions & 1 deletion doc/syntax/layer/type/rule.qmd
Expand Up @@ -25,8 +25,10 @@ The following aesthetics are recognised by the rule layer.

## Settings
* `position`: Position adjustment. One of `'identity'` (default), `'stack'`, `'dodge'`, or `'jitter'`
* `aggregate`: Aggregation functions to apply per group. Either a single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
This layer supports aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY` and all discrete mappings, every numeric mapping is replaced in place by its aggregated value. Use a default like `'mean'` or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

For diagonal lines, the position aesthetic determines the intercept:

Expand Down Expand Up @@ -110,4 +112,14 @@ VISUALISE FROM ggsql:penguins
intercept AS y,
label AS colour
FROM lines
```

Show a max rule for a time series:

```{ggsql}
VISUALISE Temp AS y FROM ggsql:airquality
DRAW line
MAPPING Date AS x
DRAW rule
SETTING aggregate => 'max'
```
3 changes: 2 additions & 1 deletion doc/syntax/layer/type/segment.qmd
Expand Up @@ -25,9 +25,10 @@ For axis-aligned intervals where one coordinate is shared between the start and

## Settings
* `position`: Position adjustment. One of `'identity'` (default), `'stack'`, `'dodge'`, or `'jitter'`
* `aggregate`: Aggregation functions to apply per group. Either a single string or an array of strings. See an overview of the aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
The segment layer does not transform its data but passes it through unchanged.
This layer supports aggregation through the `aggregate` setting. Within each group, defined by `PARTITION BY` and all discrete mappings, every numeric mapping is replaced in place by its aggregated value, producing one segment per group. Segment is a range layer: with two defaults the first applies to the start point (`x`/`y`) and the second applies to the end point (`xend`/`yend`). Use a single default like `'mean'` to apply the same function to all four endpoints, or target individual aesthetics with `'<aes>:<func>'`. See [the `DRAW` documentation](../../clause/draw.qmd#aggregate) for the full setting shape.

## Orientation
The segment layer has no orientations. The axes are treated symmetrically.
Expand Down