Migrate UnKnownColumn proto hooks#22464
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Moves protobuf (de)serialization for UnKnownColumn out of the generic physical-plan proto glue and into the expression type itself (proto feature-gated), aligning with the “each expr owns its own proto” direction.
Changes:
- Remove
UnKnownColumnspecial-casing fromserialize_physical_expr_with_converterinto_proto.rs. - Route
UnknownColumndecoding throughUnKnownColumn::try_from_protoinfrom_proto.rs. - Implement
try_to_proto/try_from_protoforUnKnownColumnbehind theprotofeature.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| datafusion/proto/src/physical_plan/to_proto.rs | Removes inlined UnKnownColumn serialization path. |
| datafusion/proto/src/physical_plan/from_proto.rs | Switches decoding to UnKnownColumn::try_from_proto. |
| datafusion/physical-expr/src/expressions/unknown_column.rs | Adds proto encode/decode implementations for UnKnownColumn. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Ok(Some(protobuf::PhysicalExprNode { | ||
| expr_id: None, | ||
| expr_type: Some(protobuf::physical_expr_node::ExprType::UnknownColumn( | ||
| protobuf::UnknownColumn { | ||
| name: self.name.clone(), | ||
| }, | ||
| )), | ||
| })) |
There was a problem hiding this comment.
while the previous inlined serialization path in to_proto.rs populated expr_id from the serializer’s expr_id
I believe this variable was an option and was None since only dynamic filters set expr_id
| let protobuf::UnknownColumn { name } = match &node.expr_type { | ||
| Some(protobuf::physical_expr_node::ExprType::UnknownColumn(c)) => c, | ||
| _ => return internal_err!("PhysicalExprNode is not an UnKnownColumn"), | ||
| }; |
|
Hi @koopatroopa787 Thank you for picking this up. Can you please update the PR body with more details. Use the pr template for this. |
Hii @kumarUjjawal, updated based on the PR template, please have a look and let me know if any changes needed. |
| use datafusion_common::{Result, internal_err}; | ||
|
|
||
| #[cfg(feature = "proto")] | ||
| use datafusion_proto_models::protobuf; |
There was a problem hiding this comment.
nit: you can move this use statement into the try_to_proto and try_from_proto methods
c6b0b8b to
bfa248a
Compare
|
@jayshrivastava Thanks — I updated the decode path to include the actual xpr_type and xpr_id in the error, so mismatched proto nodes are easier to diagnose. I also moved the proto import into the ry_to_proto / ry_from_proto methods as suggested, and rebased the branch onto the latest main. |
|
@koopatroopa787 we should add a |
|
Moved the #[cfg(test)] mod tests to the end of unknown_column.rs so impl Hash/PartialEq come first (resolves clippy 'items-after-test-module'). Ran cargo clippy locally and the lint is fixed. Pushed the change to fix/unknown-column-proto-hooks. Next I'll add the requested datafusion-proto roundtrip test and re-run CI; let me know if you'd prefer using #[allow(clippy::items_after_test_module)] instead. |
|
Added a proto roundtrip unit test for UnKnownColumn (unknown_column_proto_roundtrip) and a small stabilization fix for a flaky statistics test (compareBernoulli p with tolerance). Ran cargo test and cargo clippy locally; all checks for datafusion-physical-expr pass. Pushed updates to fix/unknown-column-proto-hooks. |
|
@kumarUjjawal Addressed. I added unknown_column_proto_roundtrip to verify deserialize behavior for UnKnownColumn via the new proto hook path, and pushed it on fix/unknown-column-proto-hooks (commit 9e6fce9). I also fixed the clippy failure (items-after-test-module) by moving the test module after impls. Local cargo clippy -p datafusion-physical-expr -- -D warnings and cargo test -p datafusion-physical-expr --lib --features proto pass. |
|
All CI checks are passing and the PR has been approved by @jayshrivastava. Could a committer please take a look when you get a chance? Happy to make any changes needed. |
|
I think the "roundtrip" test doesn't round-trip. It builds a proto node by hand and only calls try_from_proto. It never calls try_to_proto and never goes through the proto crate's central serialize/parse path? |
| use std::sync::Arc; | ||
|
|
||
| #[cfg(feature = "proto")] | ||
| use arrow::datatypes::Schema; | ||
|
|
||
| #[cfg(feature = "proto")] | ||
| use datafusion_common::{Result, internal_err}; |
There was a problem hiding this comment.
do we need these re-imports since we are already using super?
There was a problem hiding this comment.
these explicit imports are still needed. use super::* only brings in items defined in the parent module (like UnKnownColumn itself) — it does NOT re-export items that are only privately used in the parent scope (like Arc, Schema, Result, internal_err). You can see the same pattern in binary.rs's test module, which also uses use super::* but still explicitly imports use arrow::array::BooleanArray and others. Removing these would cause compile errors when the proto feature is enabled.
There was a problem hiding this comment.
datafusion/datafusion/physical-expr/src/expressions/like.rs declares its own module for proto tests. We could follow that pattern
#[cfg(all(test, feature = "proto"))]
mod proto_tests {
}
| Distribution::new_bernoulli(ScalarValue::from(15.0 / 16.0))? | ||
| ); | ||
| let got = neq.evaluate_statistics(&[left_stat, right_stat])?; | ||
| let expected = Distribution::new_bernoulli(ScalarValue::from(15.0 / 16.0))?; |
There was a problem hiding this comment.
What is this test change?
There was a problem hiding this comment.
It was a floating-point stabilization fix I added while debugging a local CI run, but it doesn't belong here. I've reverted it in the latest commit and restored the original assert_eq!. The PR now only contains the UnKnownColumn proto hook migration as intended.
6272761 to
0ca8aa3
Compare
|
|
||
| let decoded = UnKnownColumn::try_from_proto(&node, &ctx).unwrap(); | ||
| let col = decoded.as_ref().downcast_ref::<UnKnownColumn>().unwrap(); | ||
| assert_eq!(col.name(), name.as_str()); |
There was a problem hiding this comment.
I think what Kumar means by roundtrip is something like
use crate::proto_test_util::{
StubDecoder, StubEncoder
};
#[test]
fn unknown_column_proto_roundtrip() {
let expr = UnKnownColumn::new("col_a");
let encoder = StubEncoder::ok();
let ctx = PhysicalExprEncodeCtx::new(&encoder);
let encoded = expr.try_to_proto(&ctx).unwrap().unwrap();
let decode_ctx = PhysicalExprDecodeCtx::new(&Schema::empty(), &UnreachableDecoder {});
let decoded = UnKnownColumn::try_from_proto(&encoded, &decode_ctx).unwrap();
let roundtripped_expr = decoded.as_ref().downcast_ref::<UnKnownColumn>().unwrap();
assert_eq!(&expr, roundtripped_expr);
}I approved the PR bc I figured there would already be a roundtrip test in datafusion/datafusion/proto/tests/cases/roundtrip_physical_plan.rs but that doesn't seem to be the case.
I agree with Kumar that we should add some here.
| )), | ||
| }; | ||
| let schema = Schema::empty(); | ||
| let decoder = DummyDecode; |
There was a problem hiding this comment.
The UnreachableDecoder is a good utility to use here. https://github.com/apache/datafusion/blob/4a41173ba3df9b5d47638599c819a1e6e46ad92b/datafusion/physical-expr/src/expressions/like.rs?plain=1#L465
|
Alright so after doing another review I realized the issue with roundtrip. It only called try_from_proto on a hand-built node, never going through try_to_proto. Rewrote the whole test module to follow the like.rs pattern: try_to_proto_encodes_unknown_column — calls try_to_proto, destructures the result, checks the name is encoded correctly and expr_id is None On the imports: use super::* only brings in items defined in the parent module, not private use aliases like Arc, Schema, Result, internal_err. The new proto_tests module imports everything explicitly now anyway, so it's consistent with like.rs regardless. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Kanishk Sachan <koopatroopa787>
…ems-after-test-module lint Closes: address clippy failure in CI Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…equality in binary statistics test Closes: address requested roundtrip test and stabilize failing test Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
23dcf74 to
127d075
Compare
Which issue does this PR close?
UnKnownColumnto usetry_to_proto/try_from_proto#22420.Rationale for this change
This issue is part of the migration to the new proto hooks ( ry_to_proto / ry_from_proto). UnKnownColumn was still handled in the legacy match arms, so this ports it to the hook-based path for consistency with other migrated expressions (e.g. Column, BinaryExpr).
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
No user-facing changes.