DuckDB migration #2
Open
maxmalynowsky wants to merge 42 commits into main
5d24fc3 to 746bf10
The _05 cell-to-fid join now asks "where does this cell live?" against both _01 (primary) and _04 (fallback) using the cell's own interior point, instead of the prior "does any _01 interior point fall inside this cell?" with a QUALIFY tiebreak. This fixes routing for cells that sit inside _01 but don't happen to contain _01's interior point: concave shapes, multipart polygons, sub-cells from polygonization. On COL ADM3 the missing-area check goes from 17 bad fids to 1 (the residual case is a polygonization fusion, symmetric in both join logics).

Precomputed bbox columns on _05_tmp1 (per-part _01) and _04 let downstream joins use plain numeric comparisons instead of recomputing ST_XMin/etc. per candidate pair. Direct measurement on COL ADM3: _05 query 146.7s → 10.9s (13×) on the same machine and settings.

Tables renamed to reflect creation order in merge.py:
  _05_tmp4 (parts) → _05_tmp1
  _05_tmp1 (extension lines) → _05_tmp2
  _05_tmp2 (snapped lines) → _05_tmp3
  _05_tmp3 (polygonized cells) → _05_tmp4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
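For readers who want the join change in miniature, here is a pure-Python sketch of the idea, not the DuckDB SQL used in merge.py. The names `Part`, `point_in_ring`, and `locate` are hypothetical, and the ray-casting test stands in for the spatial predicate. Each part carries a precomputed bbox so the cheap numeric comparison runs before the polygon test, and the lookup uses the cell's own interior point against the primary set first, then the fallback.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]


def point_in_ring(pt: Point, ring: list[Point]) -> bool:
    """Ray-casting point-in-polygon test for a simple ring without holes."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class Part:
    """One polygon part with its bbox computed once, up front (hypothetical type)."""
    fid: int
    ring: list[Point]
    bbox: tuple[float, float, float, float] = field(init=False)

    def __post_init__(self) -> None:
        xs = [x for x, _ in self.ring]
        ys = [y for _, y in self.ring]
        self.bbox = (min(xs), min(ys), max(xs), max(ys))


def locate(cell_pt: Point, primary: list[Part], fallback: list[Part]):
    """Return the fid whose part contains the cell's interior point, or None."""
    x, y = cell_pt
    for parts in (primary, fallback):  # primary wins over fallback
        for part in parts:
            xmin, ymin, xmax, ymax = part.bbox
            # Plain numeric bbox filter first, exact test only on survivors.
            if xmin <= x <= xmax and ymin <= y <= ymax and point_in_ring(cell_pt, part.ring):
                return part.fid
    return None


# Concave (L-shaped) primary part and a square fallback part.
l_shape = Part(1, [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)])
square = Part(2, [(4, 0), (8, 0), (8, 4), (4, 4)])

print(locate((3, 0.5), [l_shape], [square]))  # 1: cell in the arm of the L
print(locate((6, 2), [l_shape], [square]))    # 2: only the fallback contains it
print(locate((3, 3), [l_shape], [square]))    # None: inside the bbox, outside the L
```

The first lookup is the case the old direction missed: a sub-cell covering the arm of the L, say the rectangle from (2, 0) to (4, 1), does not contain an interior point such as (0.5, 0.5) chosen for the whole L, so "does any _01 interior point fall inside this cell?" finds nothing for it.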
Replace _check_missing_area with _check_input_preserved: ST_Difference on each input vs same-fid output catches the case where outward extension masks a feature losing area to a neighbour at an internal boundary.
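A toy illustration of why the per-fid difference is the stronger check. This is a sketch under loud assumptions: polygons are modelled as sets of unit grid cells rather than geometries, set difference stands in for ST_Difference, and the earlier check is assumed to have compared per-fid areas, which the commit message does not spell out. The function names are hypothetical.

```python
Cell = tuple[int, int]
Layer = dict[int, set[Cell]]  # fid -> set of unit grid cells it covers


def area_only_check(inputs: Layer, outputs: Layer) -> list[int]:
    """Assumed shape of the old check: flag fids whose output area shrank."""
    return sorted(
        fid for fid, cells in inputs.items()
        if len(outputs.get(fid, set())) < len(cells)
    )


def input_preserved_check(inputs: Layer, outputs: Layer) -> dict[int, set[Cell]]:
    """Flag fids where input minus same-fid output is non-empty."""
    lost = {fid: cells - outputs.get(fid, set()) for fid, cells in inputs.items()}
    return {fid: cells for fid, cells in lost.items() if cells}


inputs = {1: {(0, 0), (1, 0)}, 2: {(2, 0), (3, 0)}}
# fid 1 is extended outward to (-1, 0) but loses (1, 0) to fid 2
# at their shared internal boundary.
outputs = {1: {(-1, 0), (0, 0)}, 2: {(1, 0), (2, 0), (3, 0)}}

print(area_only_check(inputs, outputs))        # []: the outward gain masks the loss
print(input_preserved_check(inputs, outputs))  # {1: {(1, 0)}}
```

In this toy case fid 1 keeps the same total area, so an area comparison sees nothing wrong, while the difference against the same-fid output surfaces the cell that moved to the neighbour.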
Re-polygonize fids whose interior boundary was reassigned during merge,
injecting their _02a exterior rings into the line union. Surgical pass
first (flagged fids only), global if any remain. Tighten reassignment
threshold from 0.1% to 0.01%; defer drops of _02a/_02b/_04/_05_tmp1/
_05_tmp3/_05_tmp4 to outputs.py so repair has the tables it needs.
Also tee per-run logs to tmp/{name}.log via log_file() context manager.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
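For the logging part, one way a tee-style log_file() context manager could look. Only the name log_file() and the tmp/{name}.log destination come from the commit message; the `_Tee` helper, the `tmp_dir` parameter, and the choice to tee stdout (rather than use the logging module) are assumptions made for this sketch.

```python
import sys
from contextlib import contextmanager, redirect_stdout
from pathlib import Path


class _Tee:
    """Minimal file-like object that forwards writes to several streams."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text: str) -> int:
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self) -> None:
        for stream in self.streams:
            stream.flush()


@contextmanager
def log_file(name: str, tmp_dir: str = "tmp"):
    """Tee stdout to {tmp_dir}/{name}.log for the duration of the block."""
    path = Path(tmp_dir) / f"{name}.log"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        # redirect_stdout restores the previous sys.stdout on exit,
        # including when the body raises.
        with redirect_stdout(_Tee(sys.stdout, fh)):
            yield path
```

Usage would be along the lines of `with log_file("col_adm3"): run_stage()`, where `run_stage` is a placeholder for whatever the run calls; everything printed inside the block still reaches the console and is also written to the log.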
merge.py was getting long; the polygonize/repair half is a self-contained unit that lives better in its own file. While here, normalize the rest of the stage modules to public-first ordering for consistency with merge.py and polygonize.py.
attempt.py stays unnumbered since it's an orchestrator wrapping points + voronoi with retry, not a stage of its own. clean is _01a because it's an optional sub-step that only runs on coverage violations.
No description provided.