[StreamingDataLoader, 4/N] feat: Introduce sample pre-allocation for dynamic streaming (#16)
## Background
In PR #9, we introduced initial support for the `StreamingDataLoader`
interface. Currently, the system assumes prompts are pre-loaded into the
`TransferQueue`. However, a critical use case involves generation workers
putting both prompts and responses into the `TransferQueue` on the fly
(e.g., the `rollout_buffer` mechanism in
[Slime](https://github.com/THUDM/slime/blob/main/slime_plugins/rollout_buffer/README.md)).
Because the `TransferQueue` supports dynamic expansion, it appears empty
until the producer has pushed its first data. Consequently, the
consumer's `check_consumption_status` API incorrectly assumes no data is
available and prematurely terminates the data retrieval iteration.
## Solution
This PR introduces a new environment variable,
`TQ_PRE_ALLOC_SAMPLE_NUM`, to handle sample pre-allocation in
TransferQueue.
- **Mechanism**: When set (typically to `global_batch_size`), the
controller pre-allocates a fixed number of global indexes before data
production begins.
- **Effect**: The `check_consumption_status` API now accounts for these
pre-allocated slots. This ensures the `StreamingDataLoader` waits for
the pending data instead of exiting immediately when the TransferQueue
is temporarily empty.
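
The effect can be illustrated with a toy model. This is only a sketch under stated assumptions, not the TransferQueue implementation: `ToyController`, `put`, `get_ready`, `all_consumed`, and `pre_alloc_from_env` are hypothetical names, and reading `TQ_PRE_ALLOC_SAMPLE_NUM` as a plain integer with a default of 0 is assumed here.

```python
from __future__ import annotations

import os


def pre_alloc_from_env() -> int:
    # Assumption: the variable holds a plain integer; 0 means no pre-allocation.
    return int(os.environ.get("TQ_PRE_ALLOC_SAMPLE_NUM", "0"))


class ToyController:
    """Hypothetical stand-in for the controller's index bookkeeping."""

    def __init__(self, pre_alloc_sample_num: int = 0) -> None:
        # Global indexes reserved up front, before any producer writes.
        self.allocated = set(range(pre_alloc_sample_num))
        self.produced: set[int] = set()
        self.consumed: set[int] = set()

    def put(self, index: int) -> None:
        # Dynamic expansion: an unknown index is allocated on first write.
        self.allocated.add(index)
        self.produced.add(index)

    def get_ready(self) -> list[int]:
        # Hand out every produced sample that has not been consumed yet.
        ready = sorted(self.produced - self.consumed)
        self.consumed.update(ready)
        return ready

    def all_consumed(self) -> bool:
        # Toy analogue of check_consumption_status: finished only when
        # every allocated index has been consumed.
        return self.allocated <= self.consumed


# Without pre-allocation, an empty queue looks finished immediately.
no_pre_alloc = ToyController(0)
print(no_pre_alloc.all_consumed())

# With pre-allocated slots, the consumer keeps waiting for pending data.
with_pre_alloc = ToyController(4)
print(with_pre_alloc.all_consumed())
```

In this model the first check reports completion before any data exists, which mirrors the premature exit described in the Background section, while the pre-allocated controller reports completion only after all four reserved indexes have been produced and consumed.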
## Other Changes
Deprecate `TQ_INIT_SAMPLE_NUM`, `TQ_INIT_FIELD_NUM`, and
`TQ_SAMPLE_MIN_EXPANSION_SIZE` for simplicity.
---
CC: @NINGBENZHE
---------
Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>