Duckdb migration #2

Open
maxmalynowsky wants to merge 42 commits into main from duckdb-migration

@maxmalynowsky
Member

No description provided.

maxmalynowsky and others added 12 commits April 30, 2026 22:19
The _05 cell-to-fid join now asks "where does this cell live?" against
both _01 (primary) and _04 (fallback) using the cell's own interior
point, instead of the prior "does any _01 interior point fall inside this
cell?" with a QUALIFY tiebreak. This fixes routing for cells that sit
inside _01 but don't happen to contain _01's interior point — concave
shapes, multipart polygons, sub-cells from polygonization. On COL ADM3
the missing-area check goes from 17 bad fids to 1 (residual case is a
polygonization fusion, symmetric in both join logics).
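
The difference between the two join directions can be sketched in plain Python. This is a toy illustration, not the SQL in merge.py: the ray-casting `contains`, the rectangle helpers, the fid values, and the chosen interior points are all assumptions made for the example, and the old QUALIFY tiebreak is left out.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # one exterior ring, no holes (toy model)


def contains(ring: Ring, point: Point) -> bool:
    """Ray-casting point-in-polygon test; boundary points are not treated specially."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rect(x0: float, y0: float, x1: float, y1: float) -> Ring:
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def center(cell: Ring) -> Point:
    """Interior point of a rectangular cell (hypothetical stand-in for a real interior point)."""
    xs = [p[0] for p in cell]
    ys = [p[1] for p in cell]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


def old_route(cell: Ring, interior_points: Dict[int, Point]) -> Optional[int]:
    """Old direction: does any feature's interior point fall inside this cell?"""
    for fid, pt in interior_points.items():
        if contains(cell, pt):
            return fid
    return None


def new_route(cell: Ring, primary: Dict[int, Ring], fallback: Dict[int, Ring]) -> Optional[int]:
    """New direction: where does the cell's own interior point live? Primary first, then fallback."""
    pt = center(cell)
    for layer in (primary, fallback):
        for fid, ring in layer.items():
            if contains(ring, pt):
                return fid
    return None


# fid 1 is a concave L shape; fid 2 fills the notch and sits in the fallback layer.
primary = {1: [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]}
fallback = {2: rect(2, 2, 4, 4)}
interior_points = {1: (1.0, 1.0), 2: (3.0, 3.0)}
cells = {"A": rect(0, 0, 2, 2), "B": rect(2, 0, 4, 2), "C": rect(0, 2, 2, 4), "D": rect(2, 2, 4, 4)}

old = {name: old_route(cell, interior_points) for name, cell in cells.items()}
new = {name: new_route(cell, primary, fallback) for name, cell in cells.items()}
```

In this toy setup, cells B and C lie inside the L shape but do not contain its interior point, so only the cell-centric direction assigns them a fid.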

Precomputed bbox columns on _05_tmp1 (per-part _01) and _04 let
downstream joins use plain numeric comparisons instead of recomputing
ST_XMin/etc per candidate pair. Direct measurement on COL ADM3: _05
query 146.7s → 10.9s (13×) on the same machine and settings.
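
The idea behind the precomputed bbox columns can be sketched as follows. This is a minimal Python illustration under assumptions, not the DuckDB query itself: `bbox`, `bboxes_overlap`, and `candidate_pairs` are hypothetical names, and the sample geometries are made up.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox(ring: Ring) -> BBox:
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    # Plain numeric comparisons, no geometry calls per candidate pair.
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


def candidate_pairs(cells: Dict[str, Ring], parts: Dict[int, Ring]) -> List[Tuple[str, int]]:
    # Each bounding box is computed once up front, like a stored column,
    # rather than once per (cell, part) pair inside the join.
    cell_boxes = {name: bbox(ring) for name, ring in cells.items()}
    part_boxes = {fid: bbox(ring) for fid, ring in parts.items()}
    return [
        (name, fid)
        for name, cb in cell_boxes.items()
        for fid, pb in part_boxes.items()
        if bboxes_overlap(cb, pb)
    ]


parts = {10: [(0, 0), (2, 0), (2, 2), (0, 2)], 11: [(5, 5), (7, 5), (7, 7), (5, 7)]}
cells = {
    "a": [(1, 1), (3, 1), (3, 3), (1, 3)],
    "b": [(6, 6), (8, 6), (8, 8), (6, 8)],
    "c": [(10, 10), (11, 10), (11, 11), (10, 11)],
}
pairs = candidate_pairs(cells, parts)
```

Only the pairs that survive this cheap filter would need an exact geometric predicate afterwards.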

Tables renamed to reflect creation order in merge.py:
  _05_tmp4 (parts) → _05_tmp1
  _05_tmp1 (extension lines) → _05_tmp2
  _05_tmp2 (snapped lines) → _05_tmp3
  _05_tmp3 (polygonized cells) → _05_tmp4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace _check_missing_area with _check_input_preserved: ST_Difference
on each input vs same-fid output catches the case where outward extension
masks a feature losing area to a neighbour at an internal boundary.
Re-polygonize fids whose interior boundary was reassigned during merge,
injecting their _02a exterior rings into the line union. Surgical pass
first (flagged fids only), global if any remain. Tighten reassignment
threshold from 0.1% to 0.01%; defer drops of _02a/_02b/_04/_05_tmp1/
_05_tmp3/_05_tmp4 to outputs.py so repair has the tables it needs.
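
Why a per-fid difference catches what an area comparison can miss is easiest to see on a toy grid. The sketch below models geometries as sets of unit cells, so set difference stands in for ST_Difference. The function names and data are hypothetical, and the area-only check is an assumption about what the earlier check amounted to.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


def block(x0: int, y0: int, x1: int, y1: int) -> Set[Cell]:
    """All unit cells in the half-open rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def area_only_check(inputs: Dict[int, Set[Cell]], outputs: Dict[int, Set[Cell]]) -> List[int]:
    """Flag fids whose output is smaller than their input."""
    return sorted(fid for fid in inputs if len(outputs[fid]) < len(inputs[fid]))


def input_preserved_check(inputs: Dict[int, Set[Cell]], outputs: Dict[int, Set[Cell]]) -> List[int]:
    """Flag fids where some of the input is not covered by the same-fid output."""
    return sorted(fid for fid in inputs if inputs[fid] - outputs[fid])


inputs = {1: block(0, 0, 4, 4), 2: block(4, 0, 8, 4)}
outputs = {
    # fid 1 gains 8 cells by outward extension but loses column x=3 to fid 2.
    1: block(-2, 0, 3, 4),
    2: block(3, 0, 8, 4),
}

flagged_by_area = area_only_check(inputs, outputs)
flagged_by_difference = input_preserved_check(inputs, outputs)
```

Here fid 1 ends up larger than it started, so an area comparison passes, while the difference against its own input still exposes the column handed to fid 2.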

Also tee per-run logs to tmp/{name}.log via log_file() context manager.
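
One way such a context manager might look, as a sketch only: the real log_file() is not shown here, so the `_Tee` class, the `directory` parameter, and the choice to tee stdout (rather than, say, attach a logging handler) are assumptions.

```python
import contextlib
import sys
from pathlib import Path
from typing import Iterator


class _Tee:
    """Minimal write-through stream that forwards to several targets."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text: str) -> int:
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self) -> None:
        for stream in self.streams:
            stream.flush()


@contextlib.contextmanager
def log_file(name: str, directory: str = "tmp") -> Iterator[Path]:
    """Duplicate everything printed inside the block into {directory}/{name}.log."""
    path = Path(directory) / f"{name}.log"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        # sys.stdout is captured here, before redirect_stdout swaps it out.
        with contextlib.redirect_stdout(_Tee(sys.stdout, handle)):
            yield path
```

Usage would be along the lines of `with log_file("COL_ADM3"): run_stage()`, where `run_stage` is a placeholder for whatever the run executes.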

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

merge.py was getting long; the polygonize/repair half is a self-
contained unit that lives better in its own file. While here, normalize
the rest of the stage modules to public-first ordering for consistency
with merge.py and polygonize.py.

attempt.py stays unnumbered since it's an orchestrator wrapping
points + voronoi with retry, not a stage of its own. clean is _01a
because it's an optional sub-step that only runs on coverage violations.