You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Lazily set the producer head at execution time (#478)
This is one PR from the following stack of PRs:
- #477
- #463
- #464
- #478
<- you are here
- #479
- #486
- #432
---
Introduces the `ProducerHead` type:
```rust
pub enum ProducerHead {
/// No specific head node is necessary.
None,
/// The head node should be a [BroadcastExec].
BroadcastExec { output_partitions: usize },
/// The head node should be a [RepartitionExec].
RepartitionExec { partitioning: Partitioning },
}
```
Which is passed over the network while remotely executing tasks in order
to set the appropriate node at the head of a stage.
Today, this is a noop because the right head node in stages is ensured
statically at planning time, but in follow up PRs, network boundaries
can get swamped and reorganized arbitrarily.
One example that happens in AQE:
1. A JOIN is planned as a CollectLeft
```js
HashJoinExec: mode=CollectLeft
CoalescePartitionsExec:
[Stage 1] => NetworkBroadcastExec
BroadcastExec
DistributedLeafExec: unknown size
DistributedLeafExec: unknown size
```
2. While collecting runtime statistics, it happens that `Stage 1` is
huge, and during AQE the JOINs are swapped
```js
HashJoinExec: mode=CollectLeft
DistributedLeafExec: small size
CoalescePartitionsExec:
[Stage 1] => NetworkBroadcastExec
BroadcastExec
DistributedLeafExec: big size
```
3. The `Stage 1` is now on the probe side, so it needs to be rewritten
to a `NetworkShuffleExec`, otherwise duplicate data will be returned:
```js
HashJoinExec: mode=CollectLeft
DistributedLeafExec: small size
CoalescePartitionsExec:
[Stage 1] => NetworkShuffleExec
RepartitionExec // <- dynamically swapped at runtime based on the passed `ProducerHead`
DistributedLeafExec: big size
```
Passing a `ProducerHead` at execution time unlocks two things:
1. dynamically set the fanout width accounting for a dynamically scaled
upper stage
2. dynamically set the correct operator `BroadcastExec` or
`RepartitionExec` based on the network boundary above, which might have
changed because of AQE
0 commit comments