Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -289,6 +289,38 @@ impl PartialEq for HashTableLookupExpr {

impl Eq for HashTableLookupExpr {}

impl HashTableLookupExpr {
/// Serialize this expression to protobuf.
///
/// `HashTableLookupExpr` holds an `Arc<Map>` (a runtime hash table built
/// on the build side) which cannot be serialized. We replace it with
/// `lit(true)`, which is safe because:
///
/// - The filter is a performance optimisation, not a correctness requirement.
/// - `lit(true)` passes all rows so no valid rows are lost.
/// - In distributed execution the remote worker has no access to the
/// build-side hash table anyway.
Copy link
Copy Markdown
Contributor

@jayshrivastava jayshrivastava May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment comes from roundtrip_hash_table_lookup_expr_to_lit() in /datafusion/datafusion/proto/tests/cases/roundtrip_physical_plan.rs but I don't think it's very true.

IIUC, hash joins might build a HashTableLookupExpr during execution after the build side is done. These expressions get placed in the dynamic filter expr.

If you serialize before executing the plan, then there's no code path where there would be a HashTableLookupExpr in the plan today. If you deserialize and execute that plan, then the HashJoinExec may create a fresh HashTableLookupExpr for the dynamic filter. In this case, all the row pruning is preserved.

If you serialize after executing, then any potential HashTableLookupExpr would be replaced with lit(true). I don't think this has any impact. One must call reset_state to re-execute plans, in which case I would expect the HashTableLookupExpr to disappear. In this case, all the row pruning is preserved as well.

Maybe we can change the comment to explain these two cases? ^

Since HashTableLookupExpr is public it might be good to warn users that it does not get serialized. We could (a) file a ticket to track that HashTableLookupExpr are not serialized and (b) add a comment directly on HashTableLookupExpr.

pub fn try_to_proto(
Copy link
Copy Markdown
Contributor

@jayshrivastava jayshrivastava May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this must be moved to the impl PhysicalExpr for HashTableLookupExpr block to actually be used.

The way to test this code is being used is to delete the old paths in datafusion/datafusion/proto/src/physical_plan/to_proto.rs and datafusion/datafusion/proto/src/physical_plan/from_proto.rs.

Since this is a small PR, you may want to considering adding try_from_proto as well :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building on Jay's comment, I think this needs to live in impl PhysicalExpr for HashTableLookupExpr rather than as an inherent HashTableLookupExpr::try_to_proto method.

The serializer calls expr.try_to_proto(&ctx) through &dyn PhysicalExpr, so this implementation is not reached. Serialization still depends on the centralized downcast fallback in datafusion/proto/src/physical_plan/to_proto.rs.

Could you move this into the PhysicalExpr impl with the same #[cfg(feature = "proto")] signature? That should make the expression-local serialization path actually take effect.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also seeing a separate build issue here. This new signature references physical_expr::proto_encode, datafusion_proto_common, and datafusion_proto_models unconditionally, but datafusion-physical-plan does not expose those dependencies in a normal build.

cargo check -p datafusion-physical-plan currently fails with unresolved imports for proto_encode, datafusion_proto_common, and datafusion_proto_models.

Could you gate this consistently with the existing PhysicalExpr::try_to_proto API and wire the dependencies through the appropriate proto feature?

&self,
_ctx: &datafusion_physical_expr_common::physical_expr::proto_encode::PhysicalExprEncodeCtx<'_>,
) -> datafusion_common::Result<
Option<datafusion_proto_models::protobuf::PhysicalExprNode>,
> {
use datafusion_proto_common::ScalarValue;
use datafusion_proto_common::scalar_value::Value;
use datafusion_proto_models::protobuf;
use datafusion_proto_models::protobuf::physical_expr_node::ExprType;

let value = ScalarValue {
value: Some(Value::BoolValue(true)),
};
Ok(Some(protobuf::PhysicalExprNode {
expr_id: None,
expr_type: Some(ExprType::Literal(value)),
}))
}
}

impl Display for HashTableLookupExpr {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", self.description)
Expand Down