feat(PartitionedOutput): Add outputChannels support by xin-zhang2 · Pull Request #1972 · IBM/velox

xin-zhang2 · 2026-04-28T18:34:14Z

Add outputChannels support in OptimizedPartitionedOutput.

yingsu00 · 2026-05-11T23:55:06Z

      });
 }

+RowVectorPtr OptimizedPartitionedOutput::prepareOutput(


This prepares the input, not the output. Rename it to prepareInput.

xin-zhang2 · 2026-05-15

Renamed to prepareSerializerInput.

yingsu00 · 2026-05-11T23:55:56Z

+    return input;
+  }
+
+  std::vector<VectorPtr> outputColumns;


outputColumns -> reorderedInputColumns

xin-zhang2 · 2026-05-15

Renamed to serializerInputColumns, since it is passed to the serializer's append() and contains only the unique columns referenced by the output.

yingsu00 · 2026-05-12T00:26:58Z

    PartitionBuildContext& ctx) {
  auto* rowVector = vector_->as<RowVector>();
  partitionedChildren_.reserve(rowVector->childrenSize());
+  std::unordered_map<const BaseVector*, PartitionedVectorPtr>


Actually, I think the input-to-output mapping should be done in PrestoIterativePartitioningSerializer::flushRowChildren(), not at the PartitionedVector level. PartitionedVector is NOT supposed to handle or even know about the remapping, which should happen at a higher level. Also, the change made here is hard to understand.

yingsu00 · 2026-05-12T00:28:59Z

      });
 }

+RowVectorPtr OptimizedPartitionedOutput::prepareOutput(


Actually, let's do the mapping at flush time rather than at addInput time. The place it should happen is PrestoIterativePartitioningSerializer::flushRowChildren().

xin-zhang2 · 2026-05-15T14:28:56Z

@yingsu00 I've moved the mapping to flush time. The serializer now has a member, outputToInputChannels_, that holds this mapping, and the input passed to append() is prepared in OptimizedPartitionedOutput so that it includes only the unique columns referenced by the output.
Could you please take a look? Thanks.
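For readers following the thread, here is a minimal standalone sketch of the layout described above. It is not the PR's code: `SerializerLayout`, `buildSerializerLayout`, and `numInputColumns` are hypothetical names, and plain `std::vector` indices stand in for Velox column channels. It only illustrates how a deduplicated column list and an output-to-input mapping could be derived from the output channels.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical layout (not the PR's actual types): the input columns handed
// to the serializer, plus, for each output column, the index of its source in
// that deduplicated list.
struct SerializerLayout {
  std::vector<uint32_t> uniqueInputChannels;
  std::vector<uint32_t> outputToInputChannels;
};

// 'outputChannels[i]' is the input channel that feeds output column i.
// Channels may repeat (duplicated columns) or be absent (pruned columns).
inline SerializerLayout buildSerializerLayout(
    const std::vector<uint32_t>& outputChannels,
    uint32_t numInputColumns) {
  SerializerLayout layout;
  // Maps an input channel to its position in 'uniqueInputChannels'.
  std::unordered_map<uint32_t, uint32_t> positions;
  layout.outputToInputChannels.reserve(outputChannels.size());
  for (uint32_t channel : outputChannels) {
    assert(channel < numInputColumns);
    auto result = positions.emplace(
        channel, static_cast<uint32_t>(layout.uniqueInputChannels.size()));
    if (result.second) {
      // First reference to this input channel: keep it exactly once.
      layout.uniqueInputChannels.push_back(channel);
    }
    layout.outputToInputChannels.push_back(result.first->second);
  }
  return layout;
}
```

With output channels {2, 0, 2} over four input columns, this yields unique input channels {2, 0} and the mapping {0, 1, 0}. Input columns 1 and 3 are never referenced, so under this scheme they would not need to be partitioned.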

yingsu00 · 2026-05-16T01:14:09Z


 namespace facebook::velox::exec {

+void OptimizedPartitionedOutput::initializeSerializerLayout() {


Why is the Serializer initialized in the operator? Move the mapping to the Serializer constructor.

yingsu00 · 2026-05-16T01:25:32Z

      });
 }

+RowVectorPtr OptimizedPartitionedOutput::prepareSerializerInput(


Why do we still need this? The remapping should be done at flush time, but you are still mapping the columns at addInput time. You only need to set up the map once, when the Serializer is created; flushRowChildren should then flush the serialized columns in the new order.

Suppose outputType duplicates some input columns: constructing the input RowVector would then make the Serializer repeat the same work. Your previous change in PartitionedVector removed that repeated work, but it was in the wrong place and at the wrong level. Reconstructing the input vector is NOT supposed to happen at all.

xin-zhang2 · 2026-05-16

We could set up the mapping in the serializer constructor and pass the input vector directly to the serializer's append(), as you suggested, and then apply the column mapping at flush time.

My concern is that this would require all input columns to be partitioned before the flush. While outputType may duplicate or reorder input columns, it may also prune some. Reconstructing the input vector to include only the unique columns referenced by the output columns would let us avoid partitioning input columns that are never used. Does this seem reasonable to you?
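To make the flush-time half of this discussion concrete, here is a small standalone sketch. It is an illustration under stated assumptions, not the serializer's real interface: `SerializedColumn` and `flushInOutputOrder` are hypothetical, and a `std::string` stands in for one column's serialized bytes. It shows a column that was serialized once being emitted at several output positions through the mapping.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for one column's serialized bytes (an assumption for this sketch).
using SerializedColumn = std::string;

// Returns the columns in output order. 'serializedColumns' holds one entry per
// unique input column that was partitioned and serialized;
// 'outputToInputChannels[i]' names the entry that output column i reads from.
// Duplicated output columns point at the same entry, so nothing is serialized
// twice.
inline std::vector<const SerializedColumn*> flushInOutputOrder(
    const std::vector<SerializedColumn>& serializedColumns,
    const std::vector<uint32_t>& outputToInputChannels) {
  std::vector<const SerializedColumn*> flushed;
  flushed.reserve(outputToInputChannels.size());
  for (uint32_t index : outputToInputChannels) {
    assert(index < serializedColumns.size());
    flushed.push_back(&serializedColumns[index]);
  }
  return flushed;
}
```

Whether the serialized list contains every input column or only the referenced ones is exactly the open question in this thread; the remapping step itself looks the same either way.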

xin-zhang2 requested a review from yingsu00 April 28, 2026 18:34

xin-zhang2 added the OptimizedPartitioning label Apr 28, 2026

xin-zhang2 force-pushed the PartitionedOutput-output branch 3 times, most recently from a884aa0 to 960998f Compare April 28, 2026 22:09

xin-zhang2 force-pushed the PartitionedOutput-output branch 2 times, most recently from e1eb062 to e335c23 Compare May 7, 2026 15:20

yingsu00 assigned xin-zhang2 May 10, 2026

yingsu00 reviewed May 12, 2026


feat(PartitionedOutput): Add outputChannels support

f66413c

xin-zhang2 force-pushed the PartitionedOutput-output branch from e335c23 to f66413c Compare May 15, 2026 14:18

xin-zhang2 requested a review from yingsu00 May 15, 2026 14:32

yingsu00 requested changes May 16, 2026

xin-zhang2 wants to merge 1 commit into IBM:optimized_partitionedoutput from xin-zhang2:PartitionedOutput-output
