Commit d71e7e6

beta(0.4.0): lightweight-tier hardening — Serverless, geometry consistency, reader defaults (#40)
* fix(light): pin mapbox-vector-tile 2.1.x for Serverless (protobuf<6)

geobrix[light] on Databricks Serverless (env v5) failed to install: the mapbox-vector-tile<3 pin resolved to 2.2.0, which forces protobuf>=6.31.1 and upgrades the immutable base protobuf 5.29.4 -> 6.x. That conflicts with Serverless's Spark-Connect/gRPC stack (grpcio-status / googleapis-common-protos pin protobuf<6), so pip emits an ERROR conflict report and protobuf 6 would break Spark Connect. mapbox-vector-tile 2.1.0 keeps protobuf at the base 5.29.4 and still has the 2.x default_options / native-typed-attr encode API, so MVT stays on the lightweight/Serverless tier. Pin >=2.1,<2.2 in both the [light] and [test] extras.

Adds notebooks/tests/serverless_light_smoke.py: a reusable Serverless (env v5) probe that submits a one-off run to validate the [light] install + exercise the API (Spark-Connect health, versions, MVT encode, pyrx/pyvx register); supports --diagnose / --probe-mvt / --func-validate / --env-deps / --isolate-register modes.

Co-authored-by: Isaac

* docs: quote the PEP 508 'geobrix[light] @ file://' install everywhere

The install snippets showed the path-with-extra form ('/Volumes/.../...whl[light]') which fails on Serverless — %pip keeps the surrounding quotes so pip reads [light] as part of the filename. Switch installation/quick-start/README to the named, quoted PEP 508 form (one argument), which installs cleanly on Serverless/standard/ARM, and add a warning admonition explaining the gotcha. Also drop the misleading '%pip install geobrix' (not on PyPI) from the VectorX install table.

Co-authored-by: Isaac

* fix(bench): derive serverless smoke notebook path from current user

Drop the hardcoded personal /Users/<email> notebook path (flagged by the internals-leak check); derive it from w.current_user.me().user_name and read HOST from DATABRICKS_HOST when set.

Co-authored-by: Isaac

* fix(bench): read serverless smoke config from gitignored env file

Source host/profile/Volume coordinates from databricks_cluster_config.env (GBX_BUNDLE_VOLUME_*, DATABRICKS_CONFIG_PROFILE) and derive host from the profile, so no workspace URL, Volume path, or profile name is hardcoded in this committed file.

Co-authored-by: Isaac

* fix(bench): serverless smoke auth via profile + pyrx exec probe

Authenticate with WorkspaceClient(profile=...) (the configured CLI profile) instead of minting/injecting a bearer token; keep DATABRICKS_CONFIG_PROFILE OUT of os.environ (when present the CLI auth takes a broken refresh path). Replace the catalog.listFunctions() count (which fails on Serverless/UC with a DataType.fromDDL parse error, unrelated to GeoBrix) with a real pyrx execution: build a tiny GeoTIFF and read its width through the Column API; characterize listFunctions separately.

Co-authored-by: Isaac

* fix(bench): serverless smoke mints CLI token once (no SDK refresh churn)

The SDK profile path refreshes+rotates the single-use OAuth refresh token on every client creation, which breaks across repeated runs. Use the CLI's cached access token (databricks auth token) + host from cfg, env kept clean of DATABRICKS_CONFIG_PROFILE.

Co-authored-by: Isaac

* fix(bench): serverless smoke passes driver to rst_fromcontent

rst_fromcontent(content, driver) requires the GDAL driver name; the probe omitted it and would always fail the pyrx-exec check. Pass lit('GTiff').

Co-authored-by: Isaac

* fix(light): cap idna<3.8 to keep Serverless base unchanged

idna is transitive (requests/anyio/httpx) with no upper bound, so pip pulls the latest (3.18) and shadows Serverless v5's base idna 3.7, firing the 'a core Python package changed: idna' notebook notice. Nothing in the stack needs >3.7, so cap <3.8 to keep the base in place.

Co-authored-by: Isaac

* docs(readers): fix filterRegex example escaping (raw-loader showed \\.)

The raster_gbx 'read with options' snippet wrote the regex as a raw string r".*\\.tif$" inside a non-raw triple-quote, so the raw-loader rendered TWO backslashes; a user copying it got r".*\\.tif$" (literal backslash) which matches nothing (FileNotFoundError: no files matched). Make the constant a raw triple-quote and use r".*\.tif$" so the rendered example is the correct single-backslash escaped-dot regex.

Co-authored-by: Isaac

* fix(ds): light raster reader source column dbfs:-qualified

The light raster readers emitted source as a bare /Volumes/... path (os.path.abspath), but Spark binaryFile and the heavy gdal reader emit dbfs:/Volumes/... So a light-produced DataFrame failed to join (0 rows) against a binaryFile/heavy path column. Add to_spark_uri() (mirrors the Hadoop convention: /Volumes -> dbfs:/Volumes, /dbfs -> dbfs:, other schemes + local paths unchanged) and apply it to the OUTPUT source column only; rasterio still reads the bare FUSE path.

Co-authored-by: Isaac

* fix(light): strip dbfs:/file: scheme before native file ops

Columns store dbfs:-qualified paths (to_spark_uri); every light place that opens/writes a path via rasterio/pyogrio/os/GDAL now strips the scheme back to the bare FUSE path via to_local_path (rst_fromfile, color-relief table, raster reader listing + writer, vector reader/writer, pmtiles writer). Keeps object-store schemes (s3/abfss/gs/http/vsi) untouched. Mirrors the heavy convention (Hadoop-qualified columns, cleanPath for native opens).

Co-authored-by: Isaac

* docs(xview): port example to the lightweight API

Use the light tier end-to-end: pip install geobrix[light] from the Volume, import pyrx + register readers/writers, read rasters via the gtiff_gbx DataSource (which yields the tile directly, no rst_fromfile), comment out the Serverless-unsupported spark.conf.set lines, replace binaryFile thumbnail .display() with .limit(1).show(vertical=True), and add a FORCE_REBUILD flag so the tableExists guards don't skip steps. Join labels to rasters on a normalized key so the clip count is non-zero (the source column is now dbfs:-qualified, matching binaryFile/heavy).

Co-authored-by: Isaac

* docs(xview): write clipped rasters via gtiff_gbx writer + nameCol

Replace the manual foreachPartition file-write with the lightweight gtiff_gbx DataSource writer. Deterministic names via nameCol: select the exact (source, tile) schema with source = index_right_type-id_feature-id, so the writer emits <source>.tif (ext defaults to tif).

Co-authored-by: Isaac

* fix(pyrx): rst_clip/rst_sample accept WKB/EWKB/WKT/EWKT (parity)

Light rst_clip/rst_sample assumed WKB bytes (bytes(geom_wkb)) and threw TypeError on a WKT/EWKT string, but heavy accepts all four encodings (the xView example passes EWKT). Add a shapely-only gbx._geom.geom_to_wkb that decodes WKB/EWKB bytes or WKT/EWKT str to WKB bytes; use it in both UDFs. Centralize parse_geom in gbx._geom (pyvx re-exports) so pyrx needs no pyvx import (no MVT-dep leak into a pyrx-only install).

Co-authored-by: Isaac

* fix(light): all geom-accepting functions handle WKB/EWKB/WKT/EWKT

Route every remaining user-geometry input through the shared gbx._geom decoder so encodings are consistent tier-wide (no per-function surprises): viewshed observer_geom, gridfrompoints(+agg), dtmfromgeoms(+agg) and the TIN point/breakline decoders. pygx._geom now re-exports the shared decoder (BNG/quadbin/custom inherit). Output/encode paths and already-decoded core paths unchanged.

Co-authored-by: Isaac

* docs(installation): add 'Docs updated for v0.4.0 (coming soon)' badge

Floated red/white badge top-right of the Installation title.

Co-authored-by: Isaac

* docs(xview): default FORCE_REBUILD=False (skip already-built tables)

Re-run convenience: skip tables that already exist instead of always rebuilding; set True to force a full rebuild.

Co-authored-by: Isaac

* docs(xview): force the clip step (cmd 33) to rebuild on re-run

xview_object_clip exists from prior runs, so FORCE_REBUILD=False would skip it and never exercise the rst_clip EWKT fix. Force just that cell to always rebuild (overwrite); the upstream raster/object tables still skip.

Co-authored-by: Isaac

* fix(pyrx): rst_clip reprojects cutline to raster CRS (heavy parity)

Light clip did a bare rasterio.mask with no reprojection, so an EWKT/EWKB cutline in a different CRS than the raster raised 'Input shapes do not overlap raster'. Mirror heavy RST_Clip: read the cutline SRID and reproject to the raster CRS (rasterio.warp.transform_geom) before masking; fall back to as-is when SRID is 0/unknown or the raster has no CRS. _clip_udf now passes the SRID-bearing parsed geom (geom_to_wkb dropped the SRID).

Co-authored-by: Isaac

* fix(pyrx): geom x raster ops align CRS + handle non-overlap gracefully

Audit + fix the 'geometry not aligned to raster' class: every function combining a geom with a raster now (1) reprojects the geom from its SRID to the raster CRS, and (2) returns null/empty instead of hard-crashing when the geom does not overlap the raster (matching heavy GDAL). Covers clip (graceful non-overlap), sample (reproject + graceful), viewshed, and the rasterize/dtmfromgeoms/gridfrompoints constructors as applicable.

Co-authored-by: Isaac

* docs(xview): read whole-image tiles (sizeInMB) so clips align to labels

gtiff_gbx split larger images into 4 window-tiles at the default 16MB, so the image_file join paired each label with all 4 windows and rst_clip hit tiles that don't contain the label. Read one whole-image tile per .tif (large sizeInMB), matching the heavy rst_fromfile one-tile-per-image flow, so each label clips against the tile that contains it.

Co-authored-by: Isaac

* docs: move v0.4.0 badge from installation to intro page

The 'Docs updated for v0.4.0' badge reads better on the intro landing page than on installation; move it there.

Co-authored-by: Isaac

* docs(execution-tiers): warn heavyweight needs JAR+init, not the wheel

The one-line import swap is symmetric but the install is not: clarify that the heavyweight tier needs the JAR + GDAL init script, not just the wheel.

Co-authored-by: Isaac

* feat(readers): raster sizeInMB default -1 (no split; one tile per file)

Both tiers default the raster reader to no-split (one whole-image tile per file) instead of the 16MB auto-split, which silently multi-tiled larger rasters and broke path-keyed joins. sizeInMB<=0 = whole image; set a positive MB value to opt into tiling. A single tile that would exceed the ~2GB Spark cell limit fails with an actionable 'set sizeInMB' message. tileSize option left unchanged.

Co-authored-by: Isaac

* docs: Intro sidebar label + execution-tiers wording

Capitalize the intro sidebar entry (sidebar_label: Intro) and include the execution-tiers tier-overview edits.

Co-authored-by: Isaac

* docs(xview): refresh last-modified + link to Execution Tiers

Refresh the example's Last Modified date and add a note linking to the Execution Tiers page from the Setup section.

Co-authored-by: Isaac

* docs(api): set title frontmatter so browser tab isn't the logo JSX

The function-reference H1 is an <img> logo; Docusaurus derived the page <title> from it, so the browser tab showed the raw '<img src={...}' markup. Add a title frontmatter (RasterX/GridX/VectorX Function Reference) which takes precedence for <title> while the logo H1 still renders.

Co-authored-by: Isaac

* docs(xview): FORCE_REBUILD=True default + restore clip-cell guard

Default to a full rebuild every run (validated: 450 Yacht clips, one whole-image tile per raster). Restore cmd 33's guard to the standard FORCE_REBUILD-or-not-exists form (the temporary 'if True' is no longer needed). The clip write already uses the gtiff_gbx writer with nameCol.

Co-authored-by: Isaac

* docs(release-notes): v0.4.0 lightweight hardening notes

Fold the post-merge lightweight-tier hardening into the 0.4.0 notes: Serverless install support (quoted PEP 508 + protobuf<6 pin), geometry inputs accept WKB/EWKB/WKT/EWKT everywhere, geom x raster ops reproject to the raster CRS + handle non-overlap gracefully, and the raster reader now defaults to no-split (sizeInMB=-1). Plus gtiff_gbx nameCol + dbfs path column where relevant. Also correct the limitations page so the Serverless/Classic/ARM compute requirements read as heavyweight-only.

Co-authored-by: Isaac

---------

Co-authored-by: Michael Johns <user.name>
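The path convention described in the fix(ds) and fix(light) entries above (dbfs:-qualified `source` column, bare FUSE path for native opens) can be illustrated with a minimal sketch. It reuses the names `to_spark_uri` and `to_local_path` from the commit message, but the bodies are an assumption written from the rules stated there, not the GeoBrix implementation. In particular, mapping a non-Volume `dbfs:/` URI back under the `/dbfs` mount is a guess.

```python
def to_spark_uri(path: str) -> str:
    """Sketch: qualify a FUSE path for the output `source` column."""
    if path.startswith("/Volumes/"):
        return "dbfs:" + path                 # /Volumes/... -> dbfs:/Volumes/...
    if path.startswith("/dbfs/"):
        return "dbfs:" + path[len("/dbfs"):]  # /dbfs/x -> dbfs:/x
    return path                               # other schemes and local paths unchanged


def to_local_path(uri: str) -> str:
    """Sketch: strip a dbfs: or file: scheme before a native (rasterio/pyogrio/os) open."""
    if uri.startswith("dbfs:/Volumes/"):
        return uri[len("dbfs:"):]             # back to the bare /Volumes FUSE path
    if uri.startswith("dbfs:/"):
        # Assumption: non-Volume DBFS paths are reachable under the /dbfs mount.
        return "/dbfs" + uri[len("dbfs:"):]
    if uri.startswith("file:"):
        # Handles file:/x and file:///x; host-qualified file URIs are out of scope here.
        return "/" + uri[len("file:"):].lstrip("/")
    return uri                                # s3/abfss/gs/http/vsi and bare paths untouched


volume_path = "/Volumes/c/s/v/a.tif"          # hypothetical catalog/schema/volume names
column_value = to_spark_uri(volume_path)      # what a join against binaryFile would see
native_path = to_local_path(column_value)     # what rasterio would be handed
```

The round trip matters for the join fix: the column value carries the scheme so it matches `binaryFile`, while file opens always go through the stripped form.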
2 parents 3430653 + 35f70db commit d71e7e6
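The commit message says geometry inputs may arrive as WKB or EWKB bytes, or as WKT or EWKT strings, and that the SRID (when present) drives reprojection to the raster CRS. The real decoder is described as a shapely-based helper in `gbx._geom`. The sketch below is only a stdlib illustration of how the four encodings can be told apart and where the SRID sits. The function name `describe_geom_input` is hypothetical, and the EWKB flag is the PostGIS convention.

```python
import struct

EWKB_SRID_FLAG = 0x20000000  # PostGIS EWKB: set in the type word when an SRID follows


def describe_geom_input(geom):
    """Return (encoding, srid); srid is None when the input carries none."""
    if isinstance(geom, (bytes, bytearray, memoryview)):
        data = bytes(geom)
        if len(data) < 5:
            raise ValueError("too short to be WKB")
        endian = "<" if data[0] == 1 else ">"  # byte 0: 1 = little-endian, 0 = big-endian
        (type_word,) = struct.unpack_from(endian + "I", data, 1)
        if type_word & EWKB_SRID_FLAG:
            (srid,) = struct.unpack_from(endian + "I", data, 5)
            return "EWKB", srid
        return "WKB", None
    if isinstance(geom, str):
        text = geom.strip()
        if text.upper().startswith("SRID="):
            head, _, _ = text.partition(";")   # "SRID=4326" ; "POINT (1 2)"
            return "EWKT", int(head[5:])
        return "WKT", None
    raise TypeError(f"unsupported geometry input: {type(geom).__name__}")


# Hand-built point (1, 2) in plain WKB and in EWKB with SRID 4326.
wkb_point = struct.pack("<BIdd", 1, 1, 1.0, 2.0)
ewkb_point = struct.pack("<BIIdd", 1, 1 | EWKB_SRID_FLAG, 4326, 1.0, 2.0)

for sample in (wkb_point, ewkb_point, "POINT (1 2)", "SRID=4326;POINT (1 2)"):
    print(describe_geom_input(sample))
```

A decoder that collapses everything to plain WKB loses the SRID, which is the situation the rst_clip reprojection entry describes; keeping the SRID alongside the geometry avoids that.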

45 files changed

Lines changed: 2100 additions & 257 deletions

README.md

Lines changed: 3 additions & 1 deletion
@@ -57,9 +57,11 @@ A **single wheel + single JAR** runs on both: Scala 2.13.16 matches both runtime
 Stage the wheel (a [Releases](https://github.com/databrickslabs/geobrix/releases) artifact, not on PyPI) in a Unity Catalog Volume, then install the `[light]` extra:
 
 ```python
-%pip install '/Volumes/<catalog>/<schema>/<volume>/geobrix-<version>-py3-none-any.whl[light]'
+%pip install "geobrix[light] @ file:///Volumes/<catalog>/<schema>/<volume>/geobrix-<version>-py3-none-any.whl"
 ```
 
+> **Use the quoted `geobrix[light] @ file://…` form** (PEP 508, one argument). Don't put the extra on the path (`'/Volumes/…/…whl[light]'`) — on Serverless, `%pip` keeps the surrounding quotes and pip reads `[light]` as part of the filename, failing with *"Expected package name at the start of dependency specifier."* The named form installs cleanly on Serverless, standard/shared, and ARM.
+
 ```python
 from databricks.labs.gbx.ds.register import register # *_gbx readers/writers
 from databricks.labs.gbx.pyrx import functions as rx # gbx_rst_* functions

docs/docs/api/execution-tiers.mdx

Lines changed: 4 additions & 0 deletions
@@ -20,6 +20,10 @@ df.select(rx.rst_slope("tile", unit="degrees"))
 
 After an explicit `rx.register(spark)`, the SQL names are identical too (`gbx_rst_*`), so SQL is portable across tiers.
 
+:::warning Heavyweight needs more than the wheel
+The one-line *import* swap is symmetric, but the *install* is not. The **lightweight** tier is just the `[light]` wheel (`%pip`, no JAR, no init script). The **heavyweight** tier additionally requires the **GeoBrix JAR as a cluster library and the GDAL init script** on a **classic x86 cluster** — the wheel alone will not resolve the import or the JVM expressions. See [Installation](../installation) for the heavyweight setup.
+:::
+
 ## Tradeoffs
 
 | Aspect | Heavyweight (rasterx) | Lightweight (pyrx) |

docs/docs/api/gridx-functions.mdx

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 ---
 sidebar_position: 6
+title: GridX Function Reference
 ---
 
 import CodeFromTest from '@site/src/components/CodeFromTest';

docs/docs/api/raster-functions.mdx

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 ---
 sidebar_position: 2
+title: RasterX Function Reference
 ---
 
 import Tabs from '@theme/Tabs';

docs/docs/api/vectorx-functions.mdx

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 ---
 sidebar_position: 7
+title: VectorX Function Reference
 ---
 
 import Tabs from '@theme/Tabs';
@@ -78,7 +79,7 @@ vx.register(spark)
 
 | Aspect | Lightweight (pyvx) | Heavyweight (vectorx) |
 |---|---|---|
-| Install | `%pip install geobrix` (wheel) | Init script + JAR |
+| Install | Volume-staged `[light]` wheel ([install](../installation)) | Init script + JAR |
 | Serverless / shared / ARM | Supported | Not supported |
 | Lakeflow declarative pipelines | Supported | Not supported |
 | Execution model | Python UDTF / pandas UDF | JVM (Scala + Spark columnar) |

docs/docs/beta-release-notes.mdx

Lines changed: 5 additions & 0 deletions
@@ -18,6 +18,11 @@ This page tracks **API and naming changes** since the GeoBrix project started. A
 In-flight beta release. Per-version highlights; full migration tables are in the per-component sections below.
 
 - **Lightweight execution tier (pyrx, pygx, pyvx).** A pure-Python implementation of the GeoBrix API that needs no JAR and no init script, and runs on serverless compute, standard (shared) clusters, Lakeflow declarative pipelines, and ARM. It keeps the same function names and the same `gbx_*` SQL after `register`, so switching tiers is a one-line import change. **RasterX** (`pyrx`, on [rasterio](https://rasterio.readthedocs.io/)) implements every `rst_*` function; **GridX** (`pygx`) covers quadbin, BNG, and custom grids; **VectorX** (`pyvx`) covers MVT, TIN surface modeling, and legacy-geometry migration. With this release GridX and VectorX are fully both-tier — the lightweight tier reaches 1:1 parity with the heavyweight one across all three packages. See [Choosing an Execution Tier](./api/execution-tiers).
+- **Serverless support is verified and documented.** `geobrix[light]` installs and runs on Databricks Serverless (environment v5), standard (shared) clusters, and ARM. Install with the quoted PEP 508 named form — `%pip install "geobrix[light] @ file:///Volumes/.../geobrix-0.4.0-py3-none-any.whl"` — not the path-with-extra form (`'…whl[light]'`), which fails on Serverless because `%pip` writes the surrounding quotes into the requirement and pip reads `[light]` as part of the filename. `mapbox-vector-tile` is pinned to 2.1.x so its `protobuf` dependency stays `<6` (Spark Connect compatibility on Serverless), and `idna` is pinned `<3.8` to avoid a core-package-change notice. See [Installation](./installation?tier=lightweight).
+- **Geometry inputs accept WKB, EWKB, WKT, and EWKT consistently.** Every geometry-accepting function in both tiers now decodes all four encodings through a single shared decoder. Previously some lightweight functions accepted only WKB.
+- **Geometry×raster operations align to the raster CRS and handle non-overlap gracefully.** `gbx_rst_clip`, `gbx_rst_sample`, and `gbx_rst_viewshed` reproject the input geometry from its SRID to the raster's CRS (matching the heavyweight GDAL behavior), so a geometry in a different CRS clips/samples the correct region. A geometry that does not overlap the raster now returns null / empty instead of raising an error.
+- **Raster reader default changed to no-split (`sizeInMB = -1`, behavior change since v0.3.0).** The `gdal` / `gtiff_gdal` (heavyweight) and `raster_gbx` / `gtiff_gbx` (lightweight) readers now default `sizeInMB` to `-1` — one whole-image tile per file — instead of auto-splitting large rasters at 16 MB. Set a positive `sizeInMB` to opt back into tiling for parallel processing of large files. See [Raster Readers](./readers/raster).
+- **Lightweight raster writer and source-column parity.** The lightweight `gtiff_gbx` writer accepts the `nameCol` option for deterministic output filenames, matching the heavyweight GDAL writer. The lightweight raster reader's `source` column is now `dbfs:`-scheme-qualified to match `binaryFile` and the heavyweight reader, so DataFrames join cleanly across tiers; lightweight file operations strip the scheme internally.
 - **Vector tile encoding (`gbx_st_asmvt`).** First VectorX expression-level function — aggregates features into MVT protobuf bytes for slippy-map publishing. See [VectorX § Vector tile output](./api/vectorx-functions#vector-tile-output).
 - **Vector tile pyramid (`gbx_st_asmvt_pyramid`).** Generator function: emits one row per `(z, x, y)` tile that input geometries intersect, encoded as MVT bytes. Composes with `gbx_pmtiles_agg` for end-to-end vector publishing pipelines. Builds on `gbx_st_asmvt` and shares the same web-mercator tile math as `gbx_rst_xyzpyramid`. See [VectorX § Vector tile output](./api/vectorx-functions#vector-tile-output).
 - **Quadbin grid math (10 functions).** New `gridx/quadbin` subpackage adds CARTO quadbin v0 support — `gbx_quadbin_pointascell`, `gbx_quadbin_aswkb`, `gbx_quadbin_centroid`, `gbx_quadbin_resolution`, `gbx_quadbin_polyfill`, `gbx_quadbin_kring`, `gbx_quadbin_tessellate`, `gbx_quadbin_cellunion`, `gbx_quadbin_cellunion_agg`, `gbx_quadbin_distance`. Cell IDs are 64-bit Long; coordinates are EPSG:4326 lon/lat; output geometry is EWKB SRID=4326. Cell encoding matches the [CARTO quadbin-py](https://github.com/CartoDB/quadbin-py) reference implementation (cross-checked at 5 reference points). See [GridX § Quadbin](./api/gridx-functions#quadbin-carto-v0).

docs/docs/installation.mdx

Lines changed: 12 additions & 1 deletion
@@ -39,9 +39,20 @@ The wheel ships as a **GitHub release artifact** for GeoBrix 0.4.0+ — it is **
 3. **Install** it — either notebook-scoped with the `%pip` magic (installs across the whole cluster for the notebook session; plain `pip` installs only on the driver), or as a [cluster-scoped library](https://docs.databricks.com/aws/en/libraries/notebooks-python-libraries#manage-libraries-with-pip-commands) pointing at the same Volume path (works on Serverless and Classic):
 
 ```python
-%pip install '/Volumes/<catalog>/<schema>/<volume>/geobrix/geobrix-<version>-py3-none-any.whl[light]'
+%pip install "geobrix[light] @ file:///Volumes/<catalog>/<schema>/<volume>/geobrix/geobrix-<version>-py3-none-any.whl"
 ```
 
+:::warning Use the quoted `geobrix[light] @ file://…` form
+Install with the **PEP 508 named form above**, wrapped in quotes as a single
+argument. Do **not** put the extra on the path —
+`%pip install '/Volumes/…/geobrix-<version>-py3-none-any.whl[light]'`. On
+**Serverless**, `%pip` writes the requirement to a file *including the
+surrounding quotes*, so pip reads `[light]` as part of the filename and fails
+with *"Expected package name at the start of dependency specifier."* The named
+`geobrix[light] @ file:///…` form installs cleanly on **Serverless**,
+standard/shared, and ARM.
+:::
+
 Then import the package(s) you need and (optionally) register their SQL functions. Each light package exposes the same `functions` / `register(spark)` pattern:
 
 ```python

docs/docs/intro.mdx

Lines changed: 15 additions & 0 deletions
@@ -1,10 +1,25 @@
 ---
 sidebar_position: 1
+sidebar_label: Intro
 ---
 
 import CodeFromTest from '@site/src/components/CodeFromTest';
 import GeoBrixLogo from '../../resources/images/GeoBriX.png';
 
+<div style={{
+float: 'right',
+background: '#c62828',
+color: '#ffffff',
+padding: '0.5rem 0.8rem',
+borderRadius: '6px',
+fontWeight: 700,
+fontSize: '0.8rem',
+lineHeight: 1.4,
+textAlign: 'center',
+marginLeft: '1rem',
+boxShadow: '0 1px 4px rgba(0,0,0,0.2)',
+}}>Docs updated for<br /><strong>v0.4.0</strong><br />(coming soon)</div>
+
 # Introduction to GeoBrix
 
 GeoBrix is a high-performance spatial processing library for Databricks. It ships two **interchangeable execution tiers** — a **lightweight** pure-Python/PySpark tier (no JAR, no init script, no native GDAL; runs on Serverless, standard/shared, Lakeflow, and ARM compute) and a **heavyweight** Scala/GDAL tier (GDAL on Apache Spark, for distributed processing on classic x86 clusters). Both register the **same function names**, so moving between them is a one-line import swap — and GeoBrix is progressively bringing the lightweight tier to full parity with the heavyweight one. Today the lightweight tier covers all of **RasterX**, all of **VectorX** (MVT, TIN surfaces, legacy-geometry migration), and all of **GridX** (CARTO quadbin, British National Grid (BNG), and custom grids). See [Choosing an Execution Tier](./api/execution-tiers).

docs/docs/limitations.mdx

Lines changed: 4 additions & 2 deletions
@@ -76,8 +76,10 @@ PROJ in a future release.
 
 ## Compute Requirements
 
-GeoBrix requires Databricks Classic Clusters:
-- **Not** currently compatible with Serverless compute
+The compute requirements below apply to the **heavyweight** tier (Scala JAR + native GDAL). The **lightweight** tier (`geobrix[light]`, pure-Python on rasterio's bundled GDAL) has none of them — it runs on Serverless compute (environment v5), standard (shared) clusters, Lakeflow declarative pipelines, and ARM. See [Choosing an Execution Tier](./api/execution-tiers).
+
+The heavyweight tier requires Databricks Classic Clusters:
+- **Not** compatible with Serverless compute (use the lightweight tier there)
 - Requires GDAL native libraries via init script, which are currently only supported on classic clusters
 - **Non-ARM instance types only (Intel or AMD x86_64).** The GDAL bundle ships `amd64` `.deb`s from the UbuntuGIS PPA — `amd64` and `x86_64` are the same architecture, and Intel and AMD CPUs are interchangeable. ARM-based instance types — AWS Graviton, Ampere, Apple Silicon — are not supported. The init script fails fast on `aarch64`.

docs/docs/quick-start.mdx

Lines changed: 12 additions & 1 deletion
@@ -21,12 +21,23 @@ GeoBrix ships in two tiers that share the same function names — pick one with
 Pure-Python (rasterio / pyogrio / NumPy), no JAR and no GDAL or OGR to install. A single wheel via `%pip` or as a cluster library, and it runs on serverless, standard/shared, ARM, and Lakeflow declarative pipelines. Covers **RasterX** (every `rst_*` function), **VectorX** (`gbx_st_*`), and the **GridX quadbin** grid (`gbx_quadbin_*`), plus the lightweight readers.
 
 ```python
-%pip install geobrix # single wheel — no JAR, no GDAL/OGR
+# Stage the wheel in a Unity Catalog Volume, then install the [light] extra.
+# Use the quoted PEP 508 "name[extra] @ file://" form (Serverless-safe):
+%pip install "geobrix[light] @ file:///Volumes/<catalog>/<schema>/<volume>/geobrix-<version>-py3-none-any.whl"
 
 from databricks.labs.gbx.pyrx import functions as rx
 rx.register(spark) # installs the gbx_rst_* SQL names, pyspark-backed
 ```
 
+:::warning Quote the `geobrix[light] @ file://…` requirement
+Install with the **named, quoted** form above. Do **not** put the extra on the
+path (`'/Volumes/…/…whl[light]'`): on **Serverless**, `%pip` writes the
+requirement to a file *including the quotes*, so pip reads `[light]` as part of
+the filename and fails with *"Expected package name at the start of dependency
+specifier."* The named form installs cleanly on Serverless, standard/shared,
+and ARM.
+:::
+
 </TabItem>
 <TabItem value="heavyweight" label="Heavyweight">

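The named PEP 508 form that the warnings in this commit recommend is a package name, an extra in brackets, and a `file://` URL joined by `@`. As a small illustration, the helper below builds that string from an absolute wheel path using only the standard library. The helper name `light_requirement` and the `main/geo/wheels` Volume coordinates are hypothetical placeholders, not part of GeoBrix.

```python
from urllib.parse import quote


def light_requirement(wheel_path: str, name: str = "geobrix", extra: str = "light") -> str:
    """Build a 'name[extra] @ file:///...' requirement from an absolute wheel path."""
    if not wheel_path.startswith("/"):
        raise ValueError("wheel_path must be absolute, e.g. /Volumes/<catalog>/<schema>/<volume>/...")
    # 'file://' + '/Volumes/...' yields the three-slash form used in the docs.
    return f"{name}[{extra}] @ file://{quote(wheel_path)}"


req = light_requirement("/Volumes/main/geo/wheels/geobrix-0.4.0-py3-none-any.whl")

# The whole requirement goes inside one pair of double quotes, as a single argument.
print(f'%pip install "{req}"')
# prints: %pip install "geobrix[light] @ file:///Volumes/main/geo/wheels/geobrix-0.4.0-py3-none-any.whl"
```

Because the extra is attached to the package name rather than to the file path, pip never has to parse `[light]` out of a filename, which is the failure the warnings describe.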