
Commit 4712868

teunbrand, claude and thomasp85 authored
Spatial phase I: Layer (#370)
* grammatical accommodations
* Add Spatial geom type for choropleth/geographic visualization
  Registers GeomType::Spatial across the parser, AST, and builder. The spatial geom requires a `geometry` aesthetic and supports fill, stroke, opacity, linewidth, and linetype.
* Add spatial feature: geometry auto-detection, geozero conversion, and SpatialRenderer
  - Feature-gated `spatial` (default-on) with geozero + hex dependencies
  - Auto-detect GEOMETRY columns via DESCRIBE for `DRAW spatial` layers
  - SpatialRenderer converts WKB hex / GeoJSON strings to GeoJSON Features
  - Vega-Lite geoshape mark with proper encoding channel handling
* Add spatial tests, fix geometry encoding, remove auto-detection
  - Skip geometry aesthetic in encoding builder (structural, not visual)
  - Remove SpatialRenderer::modify_encoding stub (default suffices)
  - Remove geometry column auto-detection (revisit after Arrow migration)
  - Add end-to-end tests: GeoJSON features, WKB hex, mixed layers
  - Add execute test for explicit geometry mapping
* Allow arbitrary SQL setup statements (INSTALL, LOAD, SET, etc.)
  Relax the grammar's other_sql_statement rule to accept any non-delimiter tokens, so statements like INSTALL/LOAD/SET/ATTACH parse without error. Execute these setup statements before the main query in the pipeline. Flip DDL detection in DuckDB and SQLite readers to a returns_rows whitelist, so unknown statement types are handled gracefully.
* refactor(reader): replace ColumnBuilder with direct Arrow export via query_arrow
  DuckDB's query_arrow API exports results directly as Arrow RecordBatches, eliminating the manual row-by-row type mapping. Extension types like GEOMETRY now flow through as native Binary columns instead of hitting a lossy string fallback. Decimal128 columns are normalized to Float64 at the boundary for downstream compatibility.
* refactor(execute): use DDL instead of Arrow round-trip for temp table materialisation
  Replaces the execute_sql() → register() two-step with direct CREATE TEMP TABLE AS DDL. This keeps data inside the database engine and preserves native types (e.g. DuckDB GEOMETRY) that were lost during the Arrow materialisation round-trip.
* feat(spatial): conditional ST_AsWKB stat transform and native GEOMETRY tests
  Only apply ST_AsWKB when the geometry column is Binary (native DuckDB GEOMETRY via Arrow). String columns (GeoJSON, WKB hex) pass through to the writer unchanged. Adds tests using INSTALL spatial; LOAD spatial with ST_GeomFromText to verify the full native GEOMETRY pipeline end-to-end.
* Revert "refactor(execute): use DDL instead of Arrow round-trip for temp table materialisation"
  This reverts commit 2694724.
* refactor(spatial): remove string geometry paths, require native GEOMETRY
  Remove GeoJSON/WKB hex string handling from the writer and stat transform. Spatial data should come from native GEOMETRY columns (via ST_Read, ST_GeomFromText, etc.), not string columns. Drops hex crate dependency from the spatial feature.
* refactor(spatial): use dialect for geometry-to-WKB conversion
  Add sql_geometry_to_wkb() to SqlDialect trait with ST_AsBinary default (OGC standard). DuckDB overrides with ST_AsWKB. The spatial stat transform uses the dialect instead of hardcoding.
* feat(data): add ggsql:world built-in dataset
  Natural Earth 110m country boundaries as geoparquet. Columns: name, iso_a3, continent, subregion, income_group, population, gdp, geom.
* feat(spatial): auto-load spatial extension via dialect
  The spatial stat transform now calls dialect.sql_spatial_setup() before emitting ST_AsWKB, so users no longer need manual LOAD spatial statements.
* cargo fmt
* feat(spatial): auto-detect geometry column by name and type
  When a geom declares a geometry aesthetic and the user hasn't mapped it explicitly, scan the schema for a column with a conventional geometry name (geom, geometry, wkb_geometry, the_geom, shape) and binary type. Requires exactly one match to avoid ambiguity.
* fix(spatial): load spatial extension before registering ggsql:world
  DuckDB reads geoparquet geometry columns as BLOB when the spatial extension is not loaded, making ST_AsWKB fail later. Pre-load spatial when the world dataset is referenced so the column is read as GEOMETRY.
* override defaults chosen by claude
* add docs
* candles in pentagram shape for clippy
* beg for clippy's forgiveness
* try installing the spatial module during the test
* Apply suggestions from code review
* apply suggestions from code review
* cargo fmt
* Detect geometry columns via Reader::geometry_columns()
  Geometry columns lose their native type during Arrow materialisation (DuckDB GEOMETRY becomes plain Binary). Add Reader::geometry_columns() that queries the backend's type system (DESCRIBE for DuckDB) to find actual geometry columns, with a name+type heuristic fallback. The detection is implemented as GeomTrait::detect_aesthetics(), which spatial overrides. This runs after global mapping merge so user-declared geometry mappings take precedence. Multiple native geometry columns are treated as ambiguous (user must declare explicitly).
* Revert "Detect geometry columns via Reader::geometry_columns()"
  This reverts commit 0f42b58.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Thomas Lin Pedersen <thomasp85@gmail.com>
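The "Allow arbitrary SQL setup statements" change flips statement classification to a returns_rows whitelist, so anything not known to produce rows is simply executed as a setup step. A minimal Rust sketch of that idea follows. The keyword list, the first-keyword heuristic and the names `returns_rows` and `partition_statements` are assumptions for illustration, not the readers' actual code.

```rust
/// Leading keywords treated as row-returning. This list is an assumption;
/// the real DuckDB and SQLite readers may use a different one.
const ROW_RETURNING_KEYWORDS: &[&str] = &[
    "SELECT", "WITH", "VALUES", "FROM", "DESCRIBE", "SHOW", "EXPLAIN",
];

/// Whitelist check: a statement returns rows only if its first keyword is
/// listed above. Anything else (INSTALL, LOAD, SET, ATTACH, CREATE or an
/// unknown statement type) is treated as a side-effecting setup statement.
fn returns_rows(sql: &str) -> bool {
    let first_keyword: String = sql
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_alphabetic())
        .collect::<String>()
        .to_ascii_uppercase();
    ROW_RETURNING_KEYWORDS.contains(&first_keyword.as_str())
}

/// Splits statements into (setup statements, row-returning queries),
/// preserving the original order within each group.
fn partition_statements<'a>(statements: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    statements.iter().copied().partition(|s| !returns_rows(s))
}
```

The design point is the default. An unrecognised statement falls through to "no rows" and is executed, instead of being rejected. This sketch ignores leading comments and parenthesised queries, which a real reader would have to handle.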
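The query_arrow refactor normalizes Decimal128 columns to Float64 at the reader boundary. Arrow stores a Decimal128 value as an unscaled 128-bit integer plus a scale, so the conversion is value = unscaled * 10^(-scale). The sketch below uses plain `i128` values instead of the `arrow` crate so that it stays dependency-free. The function names are hypothetical.

```rust
/// Converts one Decimal128 value, stored Arrow-style as an unscaled integer
/// plus a scale, to f64: value = unscaled * 10^(-scale).
fn decimal128_to_f64(unscaled: i128, scale: i8) -> f64 {
    if scale >= 0 {
        unscaled as f64 / 10f64.powi(i32::from(scale))
    } else {
        // A negative scale means the value is unscaled * 10^|scale|.
        unscaled as f64 * 10f64.powi(-i32::from(scale))
    }
}

/// Normalizes a whole nullable column; nulls stay nulls.
fn normalize_decimal_column(values: &[Option<i128>], scale: i8) -> Vec<Option<f64>> {
    values
        .iter()
        .copied()
        .map(|v| v.map(|unscaled| decimal128_to_f64(unscaled, scale)))
        .collect()
}
```

The trade-off is precision. An f64 carries roughly 15 to 17 significant decimal digits, so Decimal128 values with higher precision are rounded during normalization.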
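The dialect changes add `sql_geometry_to_wkb()`, which defaults to `ST_AsBinary` and is overridden by DuckDB with `ST_AsWKB`. They also add `sql_spatial_setup()`, which lets the stat transform load the spatial extension itself. The following sketch shows how default trait methods and per-backend overrides fit together. The method signatures, the `Vec<String>` return type, `GenericDialect` and the `wkb_projection` helper are assumptions, and the real `SqlDialect` trait has more methods than these two hooks.

```rust
/// Trimmed-down, hypothetical version of the `SqlDialect` trait: only the two
/// spatial hooks named in the commit message are modelled here.
trait SqlDialect {
    /// Default: the OGC-standard ST_AsBinary function.
    fn sql_geometry_to_wkb(&self, column: &str) -> String {
        format!("ST_AsBinary({})", column)
    }

    /// Statements to run before spatial functions are used; none by default.
    fn sql_spatial_setup(&self) -> Vec<String> {
        Vec::new()
    }
}

/// A backend that relies on the defaults.
struct GenericDialect;
impl SqlDialect for GenericDialect {}

/// DuckDB overrides both hooks: it uses ST_AsWKB and loads the spatial
/// extension first (whether INSTALL is also needed depends on the setup).
struct DuckDbDialect;
impl SqlDialect for DuckDbDialect {
    fn sql_geometry_to_wkb(&self, column: &str) -> String {
        format!("ST_AsWKB({})", column)
    }

    fn sql_spatial_setup(&self) -> Vec<String> {
        vec!["LOAD spatial".to_string()]
    }
}

/// Hypothetical helper: the projection a stat transform could emit, keeping
/// the original column name so later stages still find it.
fn wkb_projection(dialect: &dyn SqlDialect, column: &str) -> String {
    format!("{} AS {}", dialect.sql_geometry_to_wkb(column), column)
}
```

Because the transform only talks to `&dyn SqlDialect`, adding a backend means overriding whichever hook differs from the OGC default, rather than editing the transform.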
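The name-and-type auto-detection can be sketched as a filter over the schema. A column qualifies only if it is binary and carries one of the conventional names. Detection gives up when more than one column qualifies. `ColumnType`, the function names and the case-insensitive comparison are assumptions for illustration.

```rust
/// Simplified column types; the real schema types are not shown in the commit.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Binary,
    Utf8,
    Float64,
}

/// Conventional geometry column names listed in the commit message.
const GEOMETRY_NAMES: &[&str] = &["geom", "geometry", "wkb_geometry", "the_geom", "shape"];

/// Returns the single binary column with a conventional geometry name, or
/// None when there is no candidate or the choice is ambiguous.
fn detect_geometry_column(schema: &[(&str, ColumnType)]) -> Option<String> {
    let mut candidates = schema.iter().filter(|entry| {
        let (name, ty) = **entry;
        // Case-insensitive matching is an assumption of this sketch.
        ty == ColumnType::Binary && GEOMETRY_NAMES.contains(&name.to_ascii_lowercase().as_str())
    });
    let first = candidates.next()?;
    if candidates.next().is_some() {
        return None; // more than one match: the user must map it explicitly
    }
    Some(first.0.to_string())
}

/// An explicit `MAPPING ... AS geometry` always wins over detection.
fn resolve_geometry(explicit: Option<&str>, schema: &[(&str, ColumnType)]) -> Option<String> {
    explicit.map(String::from).or_else(|| detect_geometry_column(schema))
}
```

This mirrors the `SELECT geom AS foo ... MAPPING foo AS geometry` example in the spatial layer documentation. A column renamed to `foo` is not detected, so it has to be mapped by hand.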
1 parent 0868fb0 commit 4712868

20 files changed

Lines changed: 630 additions & 62 deletions


Cargo.lock

Lines changed: 56 additions & 0 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 3 additions & 0 deletions
@@ -57,6 +57,9 @@ thiserror = "1.0"
 # Color interpolation
 palette = { version = "0.7", default-features = false, features = ["std", "approx"] }
 
+# Spatial
+geozero = { version = "0.14", default-features = false }
+
 # Utilities
 regex = "1.10"
 chrono = "0.4"

doc/ggsql.xml

Lines changed: 2 additions & 0 deletions
@@ -142,6 +142,7 @@
 <item>arrow</item>
 <item>rule</item>
 <item>range</item>
+<item>spatial</item>
 </list>
 
 <!-- Aesthetics -->
@@ -188,6 +189,7 @@
 <!-- Specialty aesthetics -->
 <item>slope</item>
 <item>intercept</item>
+<item>geometry</item>
 <!-- Facet aesthetics -->
 <item>panel</item>
 <item>row</item>

doc/syntax/index.qmd

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ There are many different layers to choose from when visualising your data. Some
 - [`boxplot`](layer/type/boxplot.qmd) displays continuous variables as 5-number summaries.
 - [`range`](layer/type/range.qmd) a line segment between two values along an axis, with optional hinges at the endpoints.
 - [`smooth`](layer/type/smooth.qmd) a trendline that follows the data shape.
+- [`spatial`](layer/type/spatial.qmd) draws simple features (polygons, lines and points) from a geometry column.
 
 ### Position adjustments
 - [`stack`](layer/position/stack.qmd) places objects with a shared baseline on top of each other.

doc/syntax/layer/type/spatial.qmd

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+---
+title: "Spatial"
+---
+
+> Layers are declared with the [`DRAW` clause](../../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.
+
+The spatial layer renders geographic geometries, such as polygons, lines and points, and is used to make maps like choropleths.
+It differs from other layers in that it uses a special [simple features](https://en.wikipedia.org/wiki/Simple_Features) geometry column that defines the shapes.
+
+## Aesthetics
+The following aesthetics are recognised by the spatial layer.
+
+### Required
+* `geometry`: a column of simple features.
+
+The `geometry` aesthetic is required, but an attempt is made to detect a suitable column automatically.
+In practice, this mapping rarely needs to be declared.
+
+### Optional
+* `stroke`: the colour of the lines.
+* `fill`: the colour of the inner area.
+* `colour`: shorthand for setting `stroke` and `fill` simultaneously.
+* `opacity`: the opacity of colours.
+* `linewidth`: the width of the lines.
+* `linetype`: the dash pattern of the lines.
+
+## Settings
+The spatial layer has no additional settings.
+
+## Data transformation
+The spatial layer transforms the `geometry` column to [Well-Known Binary](https://libgeos.org/specifications/wkb/).
+
+## Orientation
+The spatial layer has no orientations.
+
+## Examples
+
+Depending on your reader, you may need to activate modules for spatial analysis.
+
+```{ggsql}
+-- For example, for DuckDB, one could use:
+INSTALL spatial;
+LOAD spatial;
+```
+
+A basic map of the world using the built-in `ggsql:world` dataset.
+The geometry column is detected automatically.
+
+```{ggsql}
+VISUALISE FROM ggsql:world
+DRAW spatial
+```
+
+If the geometry column isn't detected automatically, for example because it has a non-standard name, you need to declare the mapping explicitly.
+
+```{ggsql}
+SELECT geom AS foo FROM ggsql:world
+VISUALISE
+DRAW spatial
+MAPPING foo AS geometry
+```
+
+Filtering on other columns.
+
+```{ggsql}
+VISUALISE FROM ggsql:world
+DRAW spatial
+FILTER continent == 'Asia'
+```
+
+Filtering based on spatial operations.
+
+```{ggsql}
+VISUALISE FROM ggsql:world
+DRAW spatial
+FILTER ST_Intersects(geom, ST_MakeEnvelope(-20.0, -35.0, 55.0, 38.0))
+```
+
+Make a choropleth map by mapping a variable to the `fill` aesthetic.
+
+```{ggsql}
+VISUALISE FROM ggsql:world
+DRAW spatial
+MAPPING population AS fill
+SETTING opacity => 1
+```
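The spatial layer documentation says the `geometry` column is converted to Well-Known Binary. The following is a minimal hand-written encoder and decoder for the simplest WKB case, a 2D point. The layout is one byte-order byte (1 for little endian, 0 for big endian), a 4-byte geometry type code (1 for Point) and two 8-byte IEEE doubles. This is illustration only. In the commit, the conversion itself is done in SQL (`ST_AsWKB`/`ST_AsBinary`) and by geozero on the writer side. Real map data also contains polygons and lines, which this sketch does not handle.

```rust
/// WKB geometry type code for a 2D point (OGC Simple Features).
const WKB_POINT: u32 = 1;

/// Encodes a 2D point as little-endian WKB (21 bytes in total).
fn encode_point_wkb(x: f64, y: f64) -> Vec<u8> {
    let mut buf = Vec::with_capacity(21);
    buf.push(1u8); // byte order: 1 = little endian (NDR), 0 = big endian (XDR)
    buf.extend_from_slice(&WKB_POINT.to_le_bytes()); // geometry type
    buf.extend_from_slice(&x.to_le_bytes()); // x coordinate
    buf.extend_from_slice(&y.to_le_bytes()); // y coordinate
    buf
}

/// Decodes a 2D WKB point in either byte order; returns None for any other input.
fn decode_point_wkb(bytes: &[u8]) -> Option<(f64, f64)> {
    if bytes.len() != 21 {
        return None;
    }
    let little_endian = match bytes[0] {
        0 => false,
        1 => true,
        _ => return None,
    };
    let read_u32 = |b: &[u8]| {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(b);
        if little_endian { u32::from_le_bytes(raw) } else { u32::from_be_bytes(raw) }
    };
    let read_f64 = |b: &[u8]| {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(b);
        if little_endian { f64::from_le_bytes(raw) } else { f64::from_be_bytes(raw) }
    };
    if read_u32(&bytes[1..5]) != WKB_POINT {
        return None;
    }
    Some((read_f64(&bytes[5..13]), read_f64(&bytes[13..21])))
}
```

The leading byte-order flag lets a reader decode WKB written on either kind of machine. This is why the decoder checks it before interpreting any multi-byte field.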

ggsql-vscode/syntaxes/ggsql.tmLanguage.json

Lines changed: 3 additions & 3 deletions
@@ -269,7 +269,7 @@
 {
 "comment": "Specialty and computed aesthetics",
 "name": "support.type.aesthetic.ggsql",
-"match": "\\b(weight|coef|intercept|offset|density|count|intensity)\\b"
+"match": "\\b(weight|coef|intercept|offset|density|count|intensity|geometry)\\b"
 },
 {
 "comment": "Facet aesthetics",
@@ -320,7 +320,7 @@
 {
 "comment": "Geom types from grammar.js",
 "name": "support.type.geom.ggsql",
-"match": "\\b(point|line|path|bar|col|area|tile|polygon|ribbon|histogram|density|smooth|boxplot|violin|text|label|segment|arrow|rule|range)\\b"
+"match": "\\b(point|line|path|bar|col|area|tile|polygon|ribbon|histogram|density|smooth|boxplot|violin|text|label|segment|arrow|rule|range|spatial)\\b"
 },
 { "include": "#common-clause-patterns" }
 ]
@@ -334,7 +334,7 @@
 "patterns": [
 {
 "name": "support.type.geom.ggsql",
-"match": "\\b(point|line|path|bar|col|area|tile|polygon|ribbon|histogram|density|smooth|boxplot|violin|text|label|segment|arrow|rule|range)\\b"
+"match": "\\b(point|line|path|bar|col|area|tile|polygon|ribbon|histogram|density|smooth|boxplot|violin|text|label|segment|arrow|rule|range|spatial)\\b"
 },
 { "include": "#common-clause-patterns" }
 ]

src/Cargo.toml

Lines changed: 5 additions & 1 deletion
@@ -32,6 +32,9 @@ libloading = { workspace = true, optional = true }
 parquet = { workspace = true, optional = true }
 bytes = { workspace = true }
 
+# Spatial
+geozero = { workspace = true, optional = true, features = ["with-wkb", "with-geojson"] }
+
 # Serialization
 serde.workspace = true
 serde_json.workspace = true
@@ -53,11 +56,12 @@ tempfile = "3.8"
 ureq = "3"
 
 [features]
-default = ["duckdb", "sqlite", "vegalite", "parquet", "builtin-data", "odbc"]
+default = ["duckdb", "sqlite", "vegalite", "parquet", "builtin-data", "odbc", "spatial"]
 duckdb = ["dep:duckdb"]
 parquet = ["dep:parquet"]
 sqlite = ["dep:rusqlite"]
 odbc = ["dep:toml_edit", "dep:libloading"]
+spatial = ["dep:geozero"]
 vegalite = []
 builtin-data = []
 all-readers = ["duckdb", "sqlite", "odbc"]

src/data/world.parquet

174 KB
Binary file not shown.
