Skip to content

perf(seaport_v4): bound OrdersMatched scan to incremental window#9781

Merged
0xRobin merged 2 commits into
mainfrom
andre/seaport-v4-bound-orders-matched
Jun 29, 2026
Merged

perf(seaport_v4): bound OrdersMatched scan to incremental window#9781
0xRobin merged 2 commits into
mainfrom
andre/seaport-v4-bound-orders-matched

Conversation

@a-monteiro

@a-monteiro a-monteiro commented Jun 13, 2026

Copy link
Copy Markdown
Member

The seaport_v4_trades macro bounds the Seaport_evt_OrderFulfilled scans to the incremental 7-day window, but the sibling Seaport_evt_OrdersMatched scan (the iv_orders_matched CTE) has no time bound at all. So every hourly run scans OrdersMatched over all history while the rest of the model only looks at the last 7 days. On base today that scan reads the full 805,482-row OrdersMatched history (forcing a ~34.4M-row logs_decoded read, 98% rejected) when only 779 rows fall in the incremental window — and it grows unbounded as history accumulates.

This adds the same is_incremental() 7-day bound that OrderFulfilled already uses, so all three Seaport event scans prune to the same window.

The fix lives in the shared macro, so it applies to all 16 models that call seaport_v4_trades (opensea_v4 across ~12 chains plus magic_eden / spaace / mooar / nova).

Equivalence (provable no-op)

iv_orders_matched only affects output through left join ... on b.om_tx_hash = a.tx_hash and b.om_order_hash = a.order_hash, where a (OrderFulfilled) is already bounded to 7 days. A matched order emits its OrdersMatched and OrderFulfilled events in the same transaction (same block_time), so any OrdersMatched row older than the cutoff has a tx_hash that is absent from the bounded OrderFulfilled set and can never join.

Verified on prod (spellbook-hourly, base, incremental window):

  • Join lemma: of the OrdersMatched rows that join the 7-day-bounded OrderFulfilled set, 779 join and 0 fall outside the window — the bound removes only non-joining rows.
  • Full-output EXCEPT both ways = 0 on all 28 meaningful output columns (359,280 rows), current macro vs. bounded macro. The only column that differs is sub_tx_trade_id (row_number() over (partition by tx_hash order by evt_index)), which is pre-existing tie-break nondeterminism — it varies run-to-run on the current code too (16/24 then 19/19 on identical inputs), independent of this change.

A/B (base, 3 warm interleaved runs, medians, dtrino UTC)

current (OM unbounded) this PR (OM bounded)
physical_input_bytes 12.86 GB 10.88 GB (-15.4%)
cpu_ms 548.4 s 568.3 s (flat, within noise)
wall 29.1 s 21.4 s (noisy)
peak mem ~3.0 GB ~2.3 GB
spill 0 0

This is an IO win (CPU is flat — the dominant CPU stage is the JSON-parse in iv_base_pairs, untouched here; tracked separately). The IO saving grows over time because the eliminated scan is full-history.


Fixes CUR2-2814
Towards CUR2-2700

@github-actions github-actions Bot added the WIP work in progress label Jun 13, 2026

Copy link
Copy Markdown
Member Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions github-actions Bot added the dbt: hourly covers the hourly dbt subproject label Jun 13, 2026
@a-monteiro

a-monteiro commented Jun 13, 2026

Copy link
Copy Markdown
Member Author

Correction (2026-06-20): the "byte-identical to main" wording below was wrong — the patch also
adds a not is_incremental() start_date bound to iv_orders_matched, so the full-refresh SQL does
change (it only ever removes OrdersMatched rows, so it can't have caused the timeout). Full-refresh
output parity and merge-key stability are proven with prod data in
this reply. The
timeout conclusion below stands.

CI note: structural 90-min full-build timeout (not a regression)

dbt-run (hourly_spellbook) / dbt-test was cancelled at the 90-minute job timeout
(conclusion: cancelled, failing step dbt run initial model(s)) — this is not a test
failure or a regression from this change.

Why it timed out: slim-CI selects the 16 seaport_v4_trades macro consumers as modified and
full-builds each from scratch. On a fresh CI schema is_incremental() is false, so every model
runs as a full CREATE TABLE over all history. The high-volume chains dominate the wall clock:

model CI build time
opensea_v4_optimism_base_trades 4290 s (71.5 min)
opensea_v4_avalanche_c_base_trades 1419 s
opensea_v4_ethereum_base_trades still running at cancel (>75 min)
opensea_v4_polygon_base_trades still running at cancel (>75 min)

The other 12 models each built in 1-8 min; the three heavy chains serialize at the tail and push the
job past 90 min.

Why this change cannot be the cause: the edit only bounds the iv_orders_matched (Seaport
OrdersMatched) scan — a 7-day window in the incremental branch and a start_date floor in the
full-refresh branch. Both can only remove rows from that one build-side scan, so the CI full-refresh
does strictly-less-or-equal work than main; it cannot have caused or worsened the timeout. The timeout
is the heavy chains' full-history OrderFulfilled + transactions build, which this change doesn't touch.

Precedent: same structural 90-min full-build timeout as #9774 and #9670 — needs human
merge-through
, not a CI fix. Keeping this PR as draft. Correctness is already proven in the
description: EXCEPT = 0 both ways on the 28 meaningful output columns (359,280 rows; the only
diff is the sub_tx_trade_id = row_number() tie-break, excluded), plus the A/B (IO 12.86 -> 10.88 GB
per run, -15.4%; stops unbounded OrdersMatched full-history scan growth).

@tomfutago

Copy link
Copy Markdown
Contributor

Correctness / validation risk worth double-checking:

The CI note should be corrected: this patch also changes full-refresh SQL by adding the not is_incremental() start_date bound to iv_orders_matched. The change still looks safe if OrdersMatched and OrderFulfilled are always emitted in the same transaction/block time, but CI’s full-build timeout can’t be dismissed as “compiled SQL byte-identical to main” as written. Could you update the note and/or include a full-refresh parity check for the changed bound?

Also, since sub_tx_trade_id is part of the model unique_key, can you add one sentence/query result clarifying whether the (block_number, tx_hash, sub_tx_trade_id) key set is unchanged by this filter? I understand the row_number() tie-break is pre-existing nondeterminism, but excluding the merge key from the EXCEPT leaves a small validation gap for incremental merge safety.

@a-monteiro

Copy link
Copy Markdown
Member Author

@tomfutago thanks — you're right on both counts, correcting the record.

1. Full-refresh SQL is NOT byte-identical to main (my note overstated it)

The patch does add and evt_block_time >= TIMESTAMP '{{start_date}}' to iv_orders_matched in the not is_incremental() branch, so the full-refresh SQL changes. What I should have written: the added bound can only remove OrdersMatched rows, so it's a strictly-smaller-or-equal scan and cannot have caused or worsened the 90-min full-build timeout — that timeout is the heavy chains' OrderFulfilled + transactions full-history build, which this change doesn't touch.

Full-refresh parity of the added bound (verified on prod / spellbook-hourly). A matched OrdersMatched/OrderFulfilled pair is emitted in the same transaction → same block_time. Structurally: iv_orders_matched is a LEFT JOIN that only annotates OF-derived rows (om_tx_hash = a.tx_hash and om_order_hash = a.order_hash, macro L301-302) — it never creates trade rows; and iv_trades INNER JOINs source_ethereum_transactions, which is itself start_date-bounded on main (L21-23), as is iv_offer_consideration (L118-120). So an OM row with block_time < start_date could only ever join an OF row in that same (pre-start_date, already-filtered-on-main) transaction.

Measured on the order contracts:

chain OM rows (orderhash-unnested) OM before start_date pre-start_date OM joining a surviving OF row matched pairs with differing block_time
base (start_date 2023-07-19) 1,650,405 0 0 0
ethereum (start_date 2023-02-01) 2,567,409 0 0 0

matched_pairs_with_differing_block_time = 0 is the chain-independent invariant; and on both the oldest/biggest chains the bound removes 0 rows anyway, so full-refresh output is unchanged.

2. Merge key (block_number, tx_hash, sub_tx_trade_id) is unchanged by the filter

The set of output trade rows is identical — the recorded EXCEPT = 0 both ways was on the 28 stable columns including block_number and tx_hash (359,280 rows/side), i.e. an identical multiset — so per-(block_number, tx_hash) row counts N are identical, and row_number() over (partition by tx_hash order by evt_index) therefore emits the identical {1..N} set of ids per tx. The row_number input is preserved too: evt_index = coalesce(om_evt_index, 0) + evt_index (L199), and by the same-tx invariant above no surviving OF row loses its OM match, so om_evt_index / om_order_id are byte-identical for every output row — which is exactly why the 28-column EXCEPT was 0.

Live prod check on the full table — the key is collision-free:

opensea_v4_base.base_trades value
rows 27,336,768
distinct (block_number, tx_hash, sub_tx_trade_id) 27,336,768
key collisions 0

On why I excluded sub_tx_trade_id from the EXCEPT: that column carries pre-existing nondeterminism — when two output rows in a tx share the same evt_index, the row_number() tie-break orders them arbitrarily, so an EXCEPT that includes it shows a handful of spurious diffs that flip run-to-run (16/24 → 19/19) on unchanged code. That behavior is on main today; this PR neither introduces nor affects it (it touches neither row_number nor evt_index). The key set — what the MERGE dedups on — is provably identical, per above. Happy to file the row_number tie-break idempotency as its own follow-up if useful; it's orthogonal to this scan-bound change.

@a-monteiro

Copy link
Copy Markdown
Member Author

@tomfutago — gentle nudge: both your 06-18 points were addressed in my 06-20 reply, and the change is fully proven, so I think this is ready for a re-review + merge-through whenever you have a moment.

Recap of the resolution:

  • Full-refresh SQL note corrected. The patch does add the not is_incremental() start_date bound to iv_orders_matched — so it's not byte-identical to main — but the bound can only remove OrdersMatched rows, so it's strictly ≤ work and can't have caused the timeout. Full-refresh parity proven on prod: matched OM/OF pairs share the tx → same block_time (base 1,650,405 pairs / ethereum 2,567,409 pairs, 0 with differing block_time; 0 OM rows before start_date on either).
  • Merge-key stability. EXCEPT=0 both ways on all 28 cols incl block_number+tx_hash (359,280 rows) → identical per-tx counts → identical row_number() {1..N}; live prod opensea_v4_base.base_trades 27,336,768 rows == 27,336,768 distinct (block_number, tx_hash, sub_tx_trade_id) keys (0 collisions). The sub_tx_trade_id tie nondeterminism is pre-existing on main, not introduced here.

The red dbt-test is the structural 90-min full-build timeout (slim-CI full-refreshes all 16 macro consumers from scratch; this change only ever removes rows from one build-side scan) — same class as #9774/#9670, which were merged through. Happy to mark it ready for merge on your go.

@a-monteiro a-monteiro requested a review from tomfutago June 23, 2026 19:45
@a-monteiro a-monteiro marked this pull request as ready for review June 23, 2026 19:54
@cursor

cursor Bot commented Jun 23, 2026

Copy link
Copy Markdown

PR Summary

Low Risk
Single macro predicate change with documented join equivalence and full-output verification; performance-only scope across NFT trade models, no auth or data-schema changes.

Overview
Aligns Seaport_evt_OrdersMatched with the existing OrderFulfilled time pruning in the shared seaport_v4_trades macro. The iv_orders_matched CTE now applies evt_block_time >= start_date on full refreshes and a 7-day date_trunc('day', now() - interval '7' day) filter on incremental runs, instead of scanning all OrdersMatched history every hourly job.

That change applies everywhere the macro is used (OpenSea v4 and related NFT trade models across chains). Join logic is unchanged: OrdersMatched still keys off tx_hash and order_hash into the already window-bounded OrderFulfilled rows, so older OrdersMatched rows that never join are the only rows dropped—intended as a read/IO optimization, not a logic change.

Reviewed by Cursor Bugbot for commit c3ecd28. Configure here.

@github-actions github-actions Bot added ready-for-review this PR development is complete, please review and removed WIP work in progress labels Jun 23, 2026

@0xRobin 0xRobin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're in here, let's adopt the incremental_predicate(..) macro.
This model does not need 7 days worth of lookback.

Replace all hardcoded is_incremental() bounds (interval '7' day) with incremental_predicate() macro on all 4 sites: block_time in source_ethereum_transactions, and evt_block_time in both OrderFulfilled scans and the OrdersMatched scan. On hourly_spellbook the macro resolves to a 3-day window, shrinking the per-run lookback from 7d to 3d as requested by 0xRobin. OrderFulfilled and OrdersMatched are kept on the same window so joined pairs are never split across the boundary.
@a-monteiro

Copy link
Copy Markdown
Member Author

Adopted incremental_predicate() on all 4 is_incremental() bounds (commit 2cf4aaa):

  • block_time in source_ethereum_transactions (line 25)
  • evt_block_time in iv_orders_matched / OrdersMatched scan (line 49)
  • evt_block_time in both iv_offer_consideration subqueries / OrderFulfilled offer + consideration scans (lines 122 and 190)

On hourly_spellbook the macro resolves to a 3-day window (DBT_ENV_INCREMENTAL_TIME=3), so the lookback shrinks 7d → 3d as requested.

OrderFulfilled and OrdersMatched are kept on the same incremental_predicate('evt_block_time') window: a Seaport trade emits both events in the same transaction (same block_time), so bounding both scans to the same 3-day window guarantees every OM row in the window finds its matching OF row — no matched pair is split across the boundary.

CI note: the seaport_v4_trades full-build CI check is known to OOM/spill/time out independent of this change (pre-existing issue, same behaviour as PRs #9670 and #9774). A red full-build CI on this PR is expected and is not a regression — it needs a human merge-through.

@0xRobin 0xRobin added ready-for-merging and removed ready-for-review this PR development is complete, please review labels Jun 29, 2026
@0xRobin 0xRobin merged commit 2eb41eb into main Jun 29, 2026
9 of 11 checks passed
@0xRobin 0xRobin deleted the andre/seaport-v4-bound-orders-matched branch June 29, 2026 13:44
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 29, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

dbt: hourly covers the hourly dbt subproject ready-for-merging

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants