Skip to content

Spatial phase I: Layer#370

Merged
teunbrand merged 34 commits into
posit-dev:mainfrom
teunbrand:spatial_start
May 7, 2026
Merged

Spatial phase I: Layer#370
teunbrand merged 34 commits into
posit-dev:mainfrom
teunbrand:spatial_start

Conversation

@teunbrand

Copy link
Copy Markdown
Collaborator

This PR advances #336.

It introduces the spatial layer with draws the simple feature geometry.
Internally, the geometry column is converted to WKB, which is handled via a dialect.
The vegalite writer converts the WKB to GeoJSON.
Like ggplot2, detection of the geometry column is automated for common cases.
This PR also adds the ggsql:world built-in dataset that contains the Natural Earth 110m dataset with selected columns.

Noteably absent in this PR is the whole projection ordeal.

teunbrand and others added 26 commits April 22, 2026 14:05
Registers GeomType::Spatial across the parser, AST, and builder.
The spatial geom requires a `geometry` aesthetic and supports
fill, stroke, opacity, linewidth, and linetype.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… SpatialRenderer

- Feature-gated `spatial` (default-on) with geozero + hex dependencies
- Auto-detect GEOMETRY columns via DESCRIBE for `DRAW spatial` layers
- SpatialRenderer converts WKB hex / GeoJSON strings to GeoJSON Features
- Vega-Lite geoshape mark with proper encoding channel handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Skip geometry aesthetic in encoding builder (structural, not visual)
- Remove SpatialRenderer::modify_encoding stub (default suffices)
- Remove geometry column auto-detection (revisit after Arrow migration)
- Add end-to-end tests: GeoJSON features, WKB hex, mixed layers
- Add execute test for explicit geometry mapping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Relax the grammar's other_sql_statement rule to accept any non-delimiter
tokens, so statements like INSTALL/LOAD/SET/ATTACH parse without error.
Execute these setup statements before the main query in the pipeline.
Flip DDL detection in DuckDB and SQLite readers to a returns_rows
whitelist, so unknown statement types are handled gracefully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…query_arrow

DuckDB's query_arrow API exports results directly as Arrow RecordBatches,
eliminating the manual row-by-row type mapping. Extension types like
GEOMETRY now flow through as native Binary columns instead of hitting a
lossy string fallback. Decimal128 columns are normalized to Float64 at
the boundary for downstream compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… materialisation

Replaces the execute_sql() → register() two-step with direct
CREATE TEMP TABLE AS DDL. This keeps data inside the database engine
and preserves native types (e.g. DuckDB GEOMETRY) that were lost
during the Arrow materialisation round-trip.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Y tests

Only apply ST_AsWKB when the geometry column is Binary (native DuckDB
GEOMETRY via Arrow). String columns (GeoJSON, WKB hex) pass through
to the writer unchanged.

Adds tests using INSTALL spatial; LOAD spatial with ST_GeomFromText
to verify the full native GEOMETRY pipeline end-to-end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove GeoJSON/WKB hex string handling from the writer and stat
transform. Spatial data should come from native GEOMETRY columns
(via ST_Read, ST_GeomFromText, etc.), not string columns.

Drops hex crate dependency from the spatial feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add sql_geometry_to_wkb() to SqlDialect trait with ST_AsBinary
default (OGC standard). DuckDB overrides with ST_AsWKB. The spatial
stat transform uses the dialect instead of hardcoding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Natural Earth 110m country boundaries as geoparquet. Columns: name,
iso_a3, continent, subregion, income_group, population, gdp, geom.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The spatial stat transform now calls dialect.sql_spatial_setup() before
emitting ST_AsWKB, so users no longer need manual LOAD spatial statements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a geom declares a geometry aesthetic and the user hasn't mapped it
explicitly, scan the schema for a column with a conventional geometry
name (geom, geometry, wkb_geometry, the_geom, shape) and binary type.
Requires exactly one match to avoid ambiguity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DuckDB reads geoparquet geometry columns as BLOB when the spatial
extension is not loaded, making ST_AsWKB fail later. Pre-load spatial
when the world dataset is referenced so the column is read as GEOMETRY.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@teunbrand teunbrand marked this pull request as ready for review May 5, 2026 12:48
@teunbrand teunbrand requested a review from thomasp85 May 5, 2026 12:49

@thomasp85 thomasp85 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in general looks good. Most comments are cosmetic.

One thing we kinda ignore here, and maybe that is for the future projection PR, but do we need a geometry scale? Or do you scale latitude and longitude (defined by the projection), and the geometry mapping just sort of follows?

Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Comment thread src/execute/mod.rs
fn looks_like_geometry(name: &str) -> bool {
matches!(
name.to_lowercase().as_str(),
"geom" | "geometry" | "wkb_geometry" | "the_geom" | "shape"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these based on use in the wild? "the_geom" sounds made up 🫠

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are 100% based on what claude decided might be reasonable. I didn't ask it for receipts, but it does seem to be out there in the wild:
https://gis.stackexchange.com/questions/49455/geometry-column-naming-convention-geom-or-the-geom

Comment thread src/execute/mod.rs Outdated
return Some(candidates[0].name.clone());
}

// Fall back to name-only match (e.g. extension types we don't recognise)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if there is a geometry type with another name? Shouldn't that win over the "standard" name?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually say that name only should never "win" since this could map to a column that doesn't support ST_asWKB() and end up with a super confusing error.

The logic should be: Any column holding valid geometry. If multiple of these chose the one with standard name, if no standard naming then the first

@teunbrand teunbrand May 6, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the logic you propose itself is sound, but the problem with this is that every DB has its own interpretation of what constitutes 'valid geometry'. WKB is the 'standard' but DuckDB has its own flavour of geometry and I imagine different databases store it in different formats yet again (i.e. the 'extension types we don't recognise'). The homogenisation of data is the main reason we cast the geometry column as WKB.

I'm happy to remove this fallback though, it'll force the user to declare a geometry column if we can't guess it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we rely on the dialect to know what is "valid geometry"?

@teunbrand teunbrand May 6, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at 0f42b58 for how this could work. I personally think all the extra plumbing is not worth the slightly better heuristics and it should be reverted. Let me know if you think otherwise.

Comment thread src/reader/data.rs Outdated

@thomasp85 thomasp85 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a comment from Claude to consider:

GeoJSON reserved-key collisions in SpatialRenderer::prepare_data (src/writer/vegalite/layer.rs:2239):

properties.insert(col_name.clone(), value.clone()); 
feature.insert(col_name.clone(), value); 

Every non-geometry column is also splatted into the feature top level. If a user has a column named type, this overwrites "Feature" and Vega-Lite stops treating it as a feature. properties, bbox, id collide too. At minimum, skip those reserved keys when copying to the top level — or just drop the top-level copy (see next point).

Worth fixing while you're here

Properties stored twice in the inline data (same lines as above). Each non-geometry column ends up both at feature.X and feature.properties.X. Vega-Lite resolves bare field: "X" references against datum.X, which is the top-level form — so the properties map is unused output. Drop the properties map (simpler, matches how the rest of the codebase emits row data), or drop the top-level copy and use from: { data: ..., fields: ["properties.X"] } style. Either way, half the geometry-row JSON goes away.

Comment thread Cargo.toml Outdated
Comment thread src/reader/duckdb.rs
}

fn sql_spatial_setup(&self) -> Vec<String> {
vec!["LOAD spatial".into()]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if spatial is not installed? IS it installed on first load?

@teunbrand teunbrand May 6, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the user should install it. We ourselves only ever try loading it.
If it isn't installed the error message returned by the execution of LOAD spatial surfaces to the user.

The one exception is that we try to install it in our own tests, but these are all contingent of the spatial feature gate and we can't really run the tests without DuckDB's spatial extension.

Comment thread src/execute/mod.rs Outdated
return Some(candidates[0].name.clone());
}

// Fall back to name-only match (e.g. extension types we don't recognise)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually say that name only should never "win" since this could map to a column that doesn't support ST_asWKB() and end up with a super confusing error.

The logic should be: Any column holding valid geometry. If multiple of these chose the one with standard name, if no standard naming then the first

Comment thread doc/syntax/layer/type/spatial.qmd Outdated
Co-authored-by: Thomas Lin Pedersen <thomasp85@gmail.com>
@teunbrand

teunbrand commented May 6, 2026

Copy link
Copy Markdown
Collaborator Author

do we need a geometry scale

At some point maybe, but we'd have to have graticules first before it becomes meaningful.
It also isn't clear to me why we wouldn't just use SCALE lon ... and SCALE lat ... for this.
Having to invent how a multidimensional scale should act is a bigger challenge.

Also, a comment from Claude to consider:

Issue 1 about the reserved names is moot, because our naming scheme prevents collisions. I'm fine with being performatively defensive though.
Issue 2 about the properties is a good catch.

teunbrand and others added 6 commits May 6, 2026 11:07
Geometry columns lose their native type during Arrow materialisation
(DuckDB GEOMETRY becomes plain Binary). Add Reader::geometry_columns()
that queries the backend's type system (DESCRIBE for DuckDB) to find
actual geometry columns, with a name+type heuristic fallback.

The detection is implemented as GeomTrait::detect_aesthetics(), which
spatial overrides. This runs after global mapping merge so user-declared
geometry mappings take precedence. Multiple native geometry columns are
treated as ambiguous (user must declare explicitly).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@teunbrand teunbrand merged commit 4712868 into posit-dev:main May 7, 2026
2 checks passed
@teunbrand teunbrand deleted the spatial_start branch May 7, 2026 07:58
@mdsumner

mdsumner commented Jun 9, 2026

Copy link
Copy Markdown

Projections aren't an ordeal. It's super simple just treat it like numbers. True that longlat plots usually look better with a cosine latitude aspect ratio hack, but it's not a physical law it's entirely arbitrary. Any projection can be just as wrong in some regions of the map (for some definition of "wrong") at 1:1. Where is the idea of an "ordeal" coming from? (I'll be exploring the work, thanks !)

@teunbrand

teunbrand commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

It's super simple just treat it like numbers

Oh how I wish this was the only requirement 😅
I don't have a GIS background, so I'm being blindsided with various spatial concepts that are/were foreign to me, accummulating into the whole I'm calling an 'ordeal'.

Examples of ordeal-contributing factors are:

  • Projections producing invalid geometry.
  • Determining 'valid area' to place panel background.
  • Dealing with the antimeridian and the Chukotka peninsula.
  • Hiding geometry at the other side of the globe if using orthographic projection
  • Cutting holes/slits into geometries when using e.g. IGH projection.

I'll be exploring the work, thanks !

If you have any feedback I look forward to it!

@mdsumner

mdsumner commented Jun 9, 2026

Copy link
Copy Markdown

I don't think you should concern with any of those things. If it can plot, plot it. No other package with simple features ever solved these things, but many partial solutions definitely get in the way of what users can simply work with - if other software hands you data that looks bad it's not your responsiblity. btw something I should have pointed to earlier, in wk it has wk::wk_transform and PROJ package will provide a crs2crs transform to use there (or does the db backend do all that?).

What is "valid area" for panel background? You mean what is a projected extent of a geometry(set) vs what kind of other spatial context it's in?

@teunbrand

Copy link
Copy Markdown
Collaborator Author

I'm sure GIS people will know how to shape their data to plot it nicely, but I'm not convinced the average user can and/or should. If partial solutions get in the way of what people want, I'd hope they report it on the issue tracker.

does the db backend do all that

Yes the db backend does the projections. We've tested with PostGIS, SpatiaLite and the DuckDB spatial extension. We don't take on additional dependencies for spatial stuff aside for the geozero crate to do a wkt -> geojson conversion that vegalite needs.

What is "valid area" for panel background?

The area where one can expect data to be. For example for lon/lat coordinates it is a rectangle spanning -180 to +180 longitude and -90 to +90 latitude. They're the gray areas in the examples at https://ggsql.org/syntax/coord/crs.html#via-named-projection.

@mdsumner

mdsumner commented Jun 9, 2026

Copy link
Copy Markdown

oh I see, we had a discussion that in tmap and tmap improve crs - certainly I'm happy to be asked to explore details on this stuff at any time, it will be a while until I dive into the code here but I love the potential. Feel free to ping me on specific issues - there's a lot more we can do these days, and a lot more we can do with claude by our side (as I'm sure you know)

@teunbrand

Copy link
Copy Markdown
Collaborator Author

Ah good to see I'm not alone in these struggles and more experienced people run into similar issues 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants