Skip to content

repeat_row: Document refine_single_time_operator_selection's assumptions#36184

Draft
ggevay wants to merge 7 commits intoMaterializeInc:mainfrom
ggevay:repeat_row-3-refine_single_time_operator_selection
Draft

repeat_row: Document refine_single_time_operator_selection's assumptions#36184
ggevay wants to merge 7 commits intoMaterializeInc:mainfrom
ggevay:repeat_row-3-refine_single_time_operator_selection

Conversation

@ggevay
Copy link
Copy Markdown
Contributor

@ggevay ggevay commented Apr 21, 2026

This is the 3. PR in the repeat_row productionization work stream. This is on top of #36183. Resolves SQL-163.

Only the last 2 commits are new:

  1. slightly refactor some relevant tests;
  2. slightly refactor refine_single_time_operator_selection and refine_single_time_consolidation.

This is now a pure refactoring and comment clarification PR, because I dropped its original last commit.


The original last commit wanted to bail out from refine_single_time_operator_selection when it sees repeat_row. However, I realized that that would not really solve anything, because the dataflow's inputs might also have negative accumulations (consolidated to a single time), coming from repeat_row in other dataflows. So, in this longish slack thread, we decided to not worry about monotonic operators getting negatively accumulating input, but instead just add the errors from monotonic operators to the list of errors that we now have to promote from "definitely internal error" to "maybe you used repeat_row wrong, or internal error" (which I will do in a follow-up PR). Note that we are not totally sure whether all monotonic operators will reliably error out when presented with non-monotonic data. However, in the slack thread we also decided to not aim for always erroring when there is negatively accumulating data, because this seems to be ~impossible to achieve without adding further resource usage to various operators (not just monotonic operators).

(Another possibility would be to bail out from refine_single_time_operator_selection when there is repeat_row in either the current dataflow or any of its inputs, recursively, but that would lose too much performance.)

@ggevay ggevay requested review from a team as code owners April 21, 2026 14:07
@ggevay ggevay requested a review from mtabebe April 21, 2026 14:07
@ggevay ggevay added A-optimization Area: query optimization and transformation A-CLUSTER Topics related to the CLUSTER layer T-correctness Theme: relates to consistency and correctness of results. labels Apr 21, 2026
@ggevay
Copy link
Copy Markdown
Contributor Author

ggevay commented Apr 22, 2026

Bringing this back to draft, because I realized there is an issue: With repeat_row, Gets are no longer logically monotonic in a single-time context, because somebody might have written negatively accumulating stuff into an MV or index.

Edit: Being discussed on slack.

@ggevay ggevay marked this pull request as draft April 22, 2026 15:01
@ggevay ggevay force-pushed the repeat_row-3-refine_single_time_operator_selection branch from 2f40b20 to 10b4ddb Compare April 23, 2026 10:47
…ck inside the functions

Also, clarify the assumptions of `refine_single_time_operator_selection`.
@ggevay ggevay force-pushed the repeat_row-3-refine_single_time_operator_selection branch from 10b4ddb to 8952f54 Compare April 23, 2026 11:13
@ggevay
Copy link
Copy Markdown
Contributor Author

ggevay commented Apr 23, 2026

Adjusted the PR, see the new PR description. Ready for review.

@ggevay ggevay marked this pull request as ready for review April 23, 2026 11:13
@ggevay ggevay changed the title repeat_row: Fix refine_single_time_operator_selection to bail out for repeat_row repeat_row: Document refine_single_time_operator_selection's assumptions Apr 23, 2026
@ggevay ggevay removed request for a team and mtabebe April 23, 2026 11:36
@ggevay
Copy link
Copy Markdown
Contributor Author

ggevay commented Apr 23, 2026

Bringing this back to draft, because we'd like to think through the exact invariants around monotonic operators more thoroughly, as discussed on zoom.

@ggevay ggevay marked this pull request as draft April 23, 2026 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-CLUSTER Topics related to the CLUSTER layer A-optimization Area: query optimization and transformation T-correctness Theme: relates to consistency and correctness of results.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant