-
Notifications
You must be signed in to change notification settings - Fork 2.1k
refactor: add try_to_proto to HashTableLookupExpr
#22451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
c5c78b2
67bebc0
d24abe8
74f61fe
63d14ff
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -289,6 +289,38 @@ impl PartialEq for HashTableLookupExpr { | |
|
|
||
| impl Eq for HashTableLookupExpr {} | ||
|
|
||
| impl HashTableLookupExpr { | ||
| /// Serialize this expression to protobuf. | ||
| /// | ||
| /// `HashTableLookupExpr` holds an `Arc<Map>` (a runtime hash table built | ||
| /// on the build side) which cannot be serialized. We replace it with | ||
| /// `lit(true)`, which is safe because: | ||
| /// | ||
| /// - The filter is a performance optimisation, not a correctness requirement. | ||
| /// - `lit(true)` passes all rows so no valid rows are lost. | ||
| /// - In distributed execution the remote worker has no access to the | ||
| /// build-side hash table anyway. | ||
| pub fn try_to_proto( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this must be moved to the The way to test this code is being used is to delete the old paths in Since this is a small PR, you may want to considering adding
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Building on Jay's comment, I think this needs to live in The serializer calls Could you move this into the
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am also seeing a separate build issue here. This new signature references
Could you gate this consistently with the existing |
||
| &self, | ||
| _ctx: &datafusion_physical_expr_common::physical_expr::proto_encode::PhysicalExprEncodeCtx<'_>, | ||
| ) -> datafusion_common::Result< | ||
| Option<datafusion_proto_models::protobuf::PhysicalExprNode>, | ||
| > { | ||
| use datafusion_proto_common::ScalarValue; | ||
| use datafusion_proto_common::scalar_value::Value; | ||
| use datafusion_proto_models::protobuf; | ||
| use datafusion_proto_models::protobuf::physical_expr_node::ExprType; | ||
|
|
||
| let value = ScalarValue { | ||
| value: Some(Value::BoolValue(true)), | ||
| }; | ||
| Ok(Some(protobuf::PhysicalExprNode { | ||
| expr_id: None, | ||
| expr_type: Some(ExprType::Literal(value)), | ||
| })) | ||
| } | ||
| } | ||
|
|
||
| impl Display for HashTableLookupExpr { | ||
| fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
| write!(f, "{}", self.description) | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment comes from
roundtrip_hash_table_lookup_expr_to_lit()in/datafusion/datafusion/proto/tests/cases/roundtrip_physical_plan.rsbut I don't think it's very true.IIUC, hash joins might build a
HashTableLookupExprduring execution after the build side is done. These expressions get placed in the dynamic filter expr.If you serialize before executing the plan, then there's no code path where there would be a
HashTableLookupExprin the plan today. If you deserialize and execute that plan, then theHashJoinExecmay create a freshHashTableLookupExprfor the dynamic filter. In this case, all the row pruning is preserved.If you serialize after executing, then any potential
HashTableLookupExprwould be replaced with lit(true). I don't think this has any impact. One must callreset_stateto re-execute plans, in which case I would expect theHashTableLookupExprto disappear. In this case, all the row pruning is preserved as well.Maybe we can change the comment to explain these two cases? ^
Since
HashTableLookupExpris public it might be good to warn users that it does not get serialized. We could (a) file a ticket to track thatHashTableLookupExprare not serialized and (b) add a comment directly onHashTableLookupExpr.