perf(seaport_v4): bound OrdersMatched scan to incremental window#9781
Conversation
CI note: structural 90-min full-build timeout (not a regression)
Why it timed out: slim-CI selects the 16
The other 12 models each built in 1-8 min; the three heavy chains serialize at the tail and push the Why this change cannot be the cause: the edit only bounds the Precedent: same structural 90-min full-build timeout as #9774 and #9670 — needs human |
|
Correctness / validation risk worth double-checking: The CI note should be corrected: this patch also changes full-refresh SQL by adding the Also, since |
|
@tomfutago thanks — you're right on both counts, correcting the record. 1. Full-refresh SQL is NOT byte-identical to main (my note overstated it)The patch does add Full-refresh parity of the added bound (verified on prod / spellbook-hourly). A matched OrdersMatched/OrderFulfilled pair is emitted in the same transaction → same Measured on the order contracts:
2. Merge key
|
opensea_v4_base.base_trades |
value |
|---|---|
| rows | 27,336,768 |
distinct (block_number, tx_hash, sub_tx_trade_id) |
27,336,768 |
| key collisions | 0 |
On why I excluded sub_tx_trade_id from the EXCEPT: that column carries pre-existing nondeterminism — when two output rows in a tx share the same evt_index, the row_number() tie-break orders them arbitrarily, so an EXCEPT that includes it shows a handful of spurious diffs that flip run-to-run (16/24 → 19/19) on unchanged code. That behavior is on main today; this PR neither introduces nor affects it (it touches neither row_number nor evt_index). The key set — what the MERGE dedups on — is provably identical, per above. Happy to file the row_number tie-break idempotency as its own follow-up if useful; it's orthogonal to this scan-bound change.
|
@tomfutago — gentle nudge: both your 06-18 points were addressed in my 06-20 reply, and the change is fully proven, so I think this is ready for a re-review + merge-through whenever you have a moment. Recap of the resolution:
The red |
PR SummaryLow Risk Overview That change applies everywhere the macro is used (OpenSea v4 and related NFT trade models across chains). Join logic is unchanged: Reviewed by Cursor Bugbot for commit c3ecd28. Configure here. |
0xRobin
left a comment
There was a problem hiding this comment.
While we're in here, let's adopt the incremental_predicate(..) macro.
This model does not need 7 days worth of lookback.
Replace all hardcoded is_incremental() bounds (interval '7' day) with incremental_predicate() macro on all 4 sites: block_time in source_ethereum_transactions, and evt_block_time in both OrderFulfilled scans and the OrdersMatched scan. On hourly_spellbook the macro resolves to a 3-day window, shrinking the per-run lookback from 7d to 3d as requested by 0xRobin. OrderFulfilled and OrdersMatched are kept on the same window so joined pairs are never split across the boundary.
|
Adopted
On OrderFulfilled and OrdersMatched are kept on the same CI note: the |

The
seaport_v4_tradesmacro bounds theSeaport_evt_OrderFulfilledscans to the incremental 7-day window, but the siblingSeaport_evt_OrdersMatchedscan (theiv_orders_matchedCTE) has no time bound at all. So every hourly run scansOrdersMatchedover all history while the rest of the model only looks at the last 7 days. On base today that scan reads the full 805,482-row OrdersMatched history (forcing a ~34.4M-rowlogs_decodedread, 98% rejected) when only 779 rows fall in the incremental window — and it grows unbounded as history accumulates.This adds the same
is_incremental()7-day bound thatOrderFulfilledalready uses, so all three Seaport event scans prune to the same window.The fix lives in the shared macro, so it applies to all 16 models that call
seaport_v4_trades(opensea_v4 across ~12 chains plus magic_eden / spaace / mooar / nova).Equivalence (provable no-op)
iv_orders_matchedonly affects output throughleft join ... on b.om_tx_hash = a.tx_hash and b.om_order_hash = a.order_hash, wherea(OrderFulfilled) is already bounded to 7 days. A matched order emits itsOrdersMatchedandOrderFulfilledevents in the same transaction (sameblock_time), so any OrdersMatched row older than the cutoff has atx_hashthat is absent from the bounded OrderFulfilled set and can never join.Verified on prod (
spellbook-hourly, base, incremental window):sub_tx_trade_id(row_number() over (partition by tx_hash order by evt_index)), which is pre-existing tie-break nondeterminism — it varies run-to-run on the current code too (16/24 then 19/19 on identical inputs), independent of this change.A/B (base, 3 warm interleaved runs, medians, dtrino UTC)
This is an IO win (CPU is flat — the dominant CPU stage is the JSON-parse in
iv_base_pairs, untouched here; tracked separately). The IO saving grows over time because the eliminated scan is full-history.Fixes CUR2-2814
Towards CUR2-2700