Skip to content

Commit fcc9cc4

Browse files
adriangbclaude
andauthored
refactor(physical-expr): add proto ctx expr helpers and adopt in InList/Like (#22513)
## Which issue does this PR close? - Part of #22418 (decentralizing `datafusion-proto` serialization onto the expressions themselves; follows #21835, #22471, #22503). ## Rationale for this change Expressions migrating to the `try_to_proto` / `try_from_proto` hooks keep hand-rolling the same boilerplate that the central `datafusion-proto` match already factors out via `parse_physical_exprs`, `parse_required_physical_expr`, and `serialize_physical_exprs`. Those free functions can't be reused by expression authors: they take `PhysicalProtoConverterExtension` / `PhysicalPlanDecodeContext`, which the `PhysicalExprEncodeCtx` / `PhysicalExprDecodeCtx` surfaces deliberately hide. This was raised in review on #22503 — rather than re-implement the list maps and "missing required field" checks in every migrated expression, expose the same shapes on the ctx structs. ## What changes are included in this PR? - **`datafusion-physical-expr-common`**: three thin convenience methods, built on the existing `encode_child` / `decode` primitives: - `PhysicalExprEncodeCtx::encode_children_expressions` - `PhysicalExprDecodeCtx::decode_required_expression` (also standardizes the `Missing required field "<name>"` error so each expression no longer spells its own) - `PhysicalExprDecodeCtx::decode_children_expressions` - **`datafusion-physical-expr`**: adopt them in `InListExpr` and `LikeExpr`, removing the hand-rolled list maps and per-field `ok_or_else` checks. ### Behavior note `decode_required_expression` couples the presence check with the decode, so `LikeExpr` now decodes children left-to-right rather than validating both required fields up front. The end result is unchanged (a missing required field still errors), but a present sibling is decoded before a later missing field is reported. The `try_from_proto_rejects_missing_pattern` unit test is updated to reflect this. ## Are these changes tested? Yes — covered by existing tests, no new ones needed: - The isolated `proto_tests` modules in `in_list.rs` and `like.rs` already exercise all three helpers (list encode/decode, required decode, and the missing-field + child-error paths) through the migrated `try_to_proto` / `try_from_proto`. - The `datafusion-proto` round-trip integration tests (`roundtrip_inlist`, `roundtrip_like`, `roundtrip_filter_with_not_and_in_list`, `test_tpch_part_in_list_query_with_real_parquet_data`, etc.) continue to pass. - `cargo clippy --all-targets --features proto -D warnings` is clean on the touched crates; `cargo fmt --all` applied. ## Are there any user-facing changes? No. The per-expression `missing required field` error wording is preserved: `LikeExpr` is byte-for-byte identical, and `InList` only changes its label from `InList` to `InListExpr` (the actual type name). The format now lives in one place (`decode_required_expression`) instead of being hand-spelled per expression. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 1b8451c commit fcc9cc4

3 files changed

Lines changed: 73 additions & 31 deletions

File tree

datafusion/physical-expr-common/src/physical_expr.rs

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -556,6 +556,24 @@ pub mod proto_encode {
556556
) -> Result<PhysicalExprNode> {
557557
self.encoder.encode(expr)
558558
}
559+
560+
/// Encode a sequence of child expressions, preserving order.
561+
///
562+
/// Convenience wrapper over [`Self::encode_child`] for expressions
563+
/// holding a `repeated` proto field (e.g. the `list` of an `InList`).
564+
/// The first encode error short-circuits.
565+
pub fn encode_children_expressions<'b, I>(
566+
&self,
567+
exprs: I,
568+
) -> Result<Vec<PhysicalExprNode>>
569+
where
570+
I: IntoIterator<Item = &'b Arc<dyn PhysicalExpr>>,
571+
{
572+
exprs
573+
.into_iter()
574+
.map(|expr| self.encode_child(expr))
575+
.collect()
576+
}
559577
}
560578

561579
/// Internal dispatch trait. Implementors live in `datafusion-proto` and
@@ -636,6 +654,43 @@ pub mod proto_decode {
636654
pub fn decode(&self, node: &PhysicalExprNode) -> Result<Arc<dyn PhysicalExpr>> {
637655
self.decoder.decode(node, self.schema)
638656
}
657+
658+
/// Decode a required child node, erroring if it is absent.
659+
///
660+
/// Proto child expressions are encoded as `Option<Box<PhysicalExprNode>>`;
661+
/// pass the field directly (e.g. `node.expr.as_deref()`). `expr_name`
662+
/// is the expression being decoded (e.g. `"InListExpr"`) and `field`
663+
/// the proto field (e.g. `"expr"`); both are woven into the error so
664+
/// it names *where* the missing field is, without each author
665+
/// hand-rolling the string.
666+
pub fn decode_required_expression(
667+
&self,
668+
node: Option<&PhysicalExprNode>,
669+
expr_name: &str,
670+
field: &str,
671+
) -> Result<Arc<dyn PhysicalExpr>> {
672+
let node = node.ok_or_else(|| {
673+
datafusion_common::internal_datafusion_err!(
674+
"{expr_name} is missing required field '{field}'"
675+
)
676+
})?;
677+
self.decode(node)
678+
}
679+
680+
/// Decode a sequence of child nodes, preserving order.
681+
///
682+
/// Convenience wrapper over [`Self::decode`] for expressions holding a
683+
/// `repeated` proto field (e.g. the `list` of an `InList`). The first
684+
/// decode error short-circuits.
685+
pub fn decode_children_expressions<'b, I>(
686+
&self,
687+
nodes: I,
688+
) -> Result<Vec<Arc<dyn PhysicalExpr>>>
689+
where
690+
I: IntoIterator<Item = &'b PhysicalExprNode>,
691+
{
692+
nodes.into_iter().map(|node| self.decode(node)).collect()
693+
}
639694
}
640695

641696
/// Internal dispatch trait. Implementors live in `datafusion-proto`.

datafusion/physical-expr/src/expressions/in_list.rs

Lines changed: 5 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -263,17 +263,9 @@ impl InListExpr {
263263
}
264264
};
265265

266-
let expr = ctx.decode(node.expr.as_deref().ok_or_else(|| {
267-
datafusion_common::DataFusionError::Internal(
268-
"InList is missing required field 'expr'".to_string(),
269-
)
270-
})?)?;
271-
272-
let list = node
273-
.list
274-
.iter()
275-
.map(|e| ctx.decode(e))
276-
.collect::<Result<Vec<_>>>()?;
266+
let expr =
267+
ctx.decode_required_expression(node.expr.as_deref(), "InListExpr", "expr")?;
268+
let list = ctx.decode_children_expressions(&node.list)?;
277269

278270
Ok(Arc::new(InListExpr::try_new(
279271
expr,
@@ -491,11 +483,7 @@ impl PhysicalExpr for InListExpr {
491483
expr_type: Some(protobuf::physical_expr_node::ExprType::InList(Box::new(
492484
protobuf::PhysicalInListNode {
493485
expr: Some(Box::new(ctx.encode_child(&self.expr)?)),
494-
list: self
495-
.list
496-
.iter()
497-
.map(|e| ctx.encode_child(e))
498-
.collect::<Result<Vec<_>>>()?,
486+
list: ctx.encode_children_expressions(&self.list)?,
499487
negated: self.negated,
500488
},
501489
))),
@@ -4007,7 +3995,7 @@ mod proto_tests {
40073995
let err = InListExpr::try_from_proto(&node, &ctx).unwrap_err();
40083996
assert!(matches!(
40093997
err,
4010-
DataFusionError::Internal(msg) if msg.contains("InList is missing required field 'expr'")
3998+
DataFusionError::Internal(msg) if msg.contains("InListExpr is missing required field 'expr'")
40113999
));
40124000
}
40134001

datafusion/physical-expr/src/expressions/like.rs

Lines changed: 13 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -190,22 +190,19 @@ impl LikeExpr {
190190
_ => return internal_err!("PhysicalExprNode is not a LikeExpr"),
191191
};
192192

193-
let expr = like_expr.expr.as_deref().ok_or_else(|| {
194-
datafusion_common::DataFusionError::Internal(
195-
"LikeExpr is missing required field 'expr'".to_string(),
196-
)
197-
})?;
198-
let pattern = like_expr.pattern.as_deref().ok_or_else(|| {
199-
datafusion_common::DataFusionError::Internal(
200-
"LikeExpr is missing required field 'pattern'".to_string(),
201-
)
202-
})?;
203-
204193
Ok(Arc::new(LikeExpr::new(
205194
like_expr.negated,
206195
like_expr.case_insensitive,
207-
ctx.decode(expr)?,
208-
ctx.decode(pattern)?,
196+
ctx.decode_required_expression(
197+
like_expr.expr.as_deref(),
198+
"LikeExpr",
199+
"expr",
200+
)?,
201+
ctx.decode_required_expression(
202+
like_expr.pattern.as_deref(),
203+
"LikeExpr",
204+
"pattern",
205+
)?,
209206
)))
210207
}
211208
}
@@ -491,7 +488,9 @@ mod proto_tests {
491488
fn try_from_proto_rejects_missing_pattern() {
492489
let node = like_node(false, false, Some(Box::new(column_node("a"))), None);
493490
let schema = Schema::empty();
494-
let decoder = UnreachableDecoder;
491+
// `expr` is present, so it is decoded before the missing-`pattern`
492+
// check fires; use a decoder that succeeds for that first child.
493+
let decoder = StubDecoder::ok();
495494
let ctx = PhysicalExprDecodeCtx::new(&schema, &decoder);
496495
let err = LikeExpr::try_from_proto(&node, &ctx).unwrap_err();
497496
assert!(matches!(

0 commit comments

Comments
 (0)