fix: support generators in is_in (#3195)#3689
Open
LukeTheoJohnson wants to merge 1 commit into
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3195.
is_insays it accepts any iterable, but in practice only lists were really tested. It turns out lists, tuples, sets, and ranges all work fine everywhere — but passing a generator raises on polars (both eager and lazy) and dask, while quietly working on the others.The reason is that a generator can only be read once. Most backends happen to copy the values into a list internally, so they're fine — but polars and dask receive the raw generator and either reject it or try to read it more than once.
The fix is to spot a one-shot iterator and turn it into a list before handing it off, at the two public entry points (
Expr.is_inandSeries.is_in). The check (iter(x) is x) is only ever true for generators and similar single-use iterators — lists, tuples, sets, ranges, and native Series/arrays are all left exactly as they are today, so nothing that already works changes. It also avoids a subtle case where a generator could get used up partway through on lazy or partitioned backends.Added regression tests covering tuple, set, range, and generator inputs for both
ExprandSeries.One design call worth flagging: the issue suggested doing
list(other)per-backend (like duckdb already does). I went with one shared check at the public layer instead — partly because polars is a passthrough with no backend file to patch, and partly to avoid repeating the same line across backends. Happy to switch to the per-backend approach if you'd prefer.