TransferQueue is a high-performance data storage and transfer module with panoramic data visibility and streaming scheduling capabilities, optimized for efficient dataflow in post-training workflows.
@@ -33,9 +38,9 @@ TransferQueue offers **fine-grained, sample-level** data management and **load-b
<h2 id="updates">🔄 Updates</h2>
-
+- **Oct 21, 2025**: Official integration into verl is ready [verl/pulls/3649](https://github.com/volcengine/verl/pull/3649). Following PRs will optimize the single controller architecture by fully decoupling data & control flows.
- **July 22, 2025**: We present a series of Chinese blogs on <a href="https://zhuanlan.zhihu.com/p/1930244241625449814">Zhihu 1</a>, <a href="https://zhuanlan.zhihu.com/p/1933259599953232589">2</a>.
-- **July 21, 2025**: We start an RFC on verl community [RFC#2662](https://github.com/volcengine/verl/discussions/2662).
+- **July 21, 2025**: We started an RFC on verl community [verl/discussions/2662](https://github.com/volcengine/verl/discussions/2662).
- **July 2, 2025**: We publish the paper [AsyncFlow](https://arxiv.org/abs/2507.01663).
@@ -48,15 +53,15 @@ TransferQueue offers **fine-grained, sample-level** data management and **load-b
In the control plane, `TransferQueueController` tracks the **production status** and **consumption status** of each training sample as metadata. When all the required data fields are ready (i.e., written to the `TransferQueueStorage`), we know that this data sample can be consumed by downstream tasks.
-For consumption status, we record the consumption records for each computational task (e.g., `generate_sequences`, `compute_log_prob`, etc.). Therefore, even different computation tasks require the same data field, they can consume the data independently without interfering with each other.
+For consumption status, we record the consumption records for each computational task (e.g., `generate_sequences`, `compute_log_prob`, etc.). Therefore, even when different computation tasks require the same data field, they can consume the data independently without interfering with each other.
-> In the future, we plan to support **load-balancing** and **dynamic batching** capabilities in the control plane. Besides, we will support data management for disaggregated frameworks where each rank manages the data retrieval by itself, rather than coordinated by a single controller.
+> In the future, we plan to support **load-balancing** and **dynamic batching** capabilities in the control plane. Additionally, we will support data management for disaggregated frameworks where each rank manages the data retrieval by itself, rather than coordinated by a single controller.
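
As a rough illustration of the control-plane bookkeeping described in this hunk, the sketch below tracks production status per sample and consumption status per task. It is not the TransferQueue implementation: `ToyController`, its methods, and the field names `prompt` and `response` are hypothetical, and only the task names `generate_sequences` and `compute_log_prob` come from the README.

```python
from collections import defaultdict


class ToyController:
    """Hypothetical stand-in for TransferQueueController (not the real API)."""

    def __init__(self, required_fields):
        # task name -> set of data fields that task needs (assumed layout)
        self.required_fields = required_fields
        # production status: sample index -> fields already written to storage
        self.produced = defaultdict(set)
        # consumption status: task name -> sample indices already consumed
        self.consumed = defaultdict(set)

    def mark_produced(self, index, field):
        self.produced[index].add(field)

    def ready_for(self, task):
        """Samples whose required fields are all produced and not yet consumed by `task`."""
        needed = self.required_fields[task]
        return sorted(
            idx
            for idx, fields in self.produced.items()
            if needed <= fields and idx not in self.consumed[task]
        )

    def mark_consumed(self, task, indices):
        self.consumed[task].update(indices)


controller = ToyController(
    {
        "generate_sequences": {"prompt"},
        "compute_log_prob": {"prompt", "response"},
    }
)
controller.mark_produced(0, "prompt")
controller.mark_produced(1, "prompt")
controller.mark_produced(0, "response")

gen_ready = controller.ready_for("generate_sequences")
logp_ready = controller.ready_for("compute_log_prob")

# Consuming for one task leaves the other task's view untouched.
controller.mark_consumed("generate_sequences", gen_ready)
gen_after = controller.ready_for("generate_sequences")
logp_after = controller.ready_for("compute_log_prob")
```

Because consumption is recorded per task, `compute_log_prob` still sees sample 0 after `generate_sequences` has consumed it, even though both tasks read the same `prompt` field.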
### Data Plane: Distributed Data Storage
@@ -86,13 +91,13 @@ The interaction workflow of TransferQueue system is as follows:
2. `TransferQueueController` scans the production and consumption metadata for each sample (row), and dynamically assembles a micro-batch metadata according to the load-balancing policy. This mechanism enables sample-level data scheduling.
3. The process retrieves the actual data from distributed storage units using the metadata provided by the controller.
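
The last two steps can be sketched as follows. This is a toy model under stated assumptions, not TransferQueue's API: `SampleMeta`, `assemble_micro_batch`, `fetch`, the round-robin placement of rows across storage units, and the `prompt` field are all hypothetical, and a simple first-ready-first-served rule stands in for the load-balancing policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleMeta:
    index: int         # global sample (row) index
    storage_unit: int  # which storage unit holds the row


def assemble_micro_batch(produced, consumed, required, batch_size, num_units):
    """Scan per-sample metadata and pick up to `batch_size` ready, unconsumed rows."""
    ready = [
        i for i in sorted(produced)
        if required <= produced[i] and i not in consumed
    ]
    picked = ready[:batch_size]  # stand-in for a real load-balancing policy
    consumed.update(picked)
    # Assumed placement rule: rows are sharded round-robin across storage units.
    return [SampleMeta(i, i % num_units) for i in picked]


def fetch(metas, storage_units, fields):
    """Retrieve the actual data using only the metadata handed out by the controller."""
    return [
        {f: storage_units[m.storage_unit][m.index][f] for f in fields}
        for m in metas
    ]


# Two toy storage units, each mapping sample index -> {field: value}.
storage_units = [
    {0: {"prompt": "p0"}, 2: {"prompt": "p2"}},
    {1: {"prompt": "p1"}},
]
produced = {0: {"prompt"}, 1: {"prompt"}, 2: {"prompt"}}
consumed = set()

metas = assemble_micro_batch(produced, consumed, {"prompt"}, batch_size=2, num_units=2)
batch = fetch(metas, storage_units, ["prompt"])
```

The point of the split is that the controller only ever handles metadata, while the requesting process pulls the payload directly from the storage units.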
-To simplify the usage of TransferQueue, we have encapsulated this process into `AsyncTransferQueueClient` and `TransferQueueClient`. These clients provide both asynchronous and synchronous interfaces for data transfer, allowing users to easily integrate TransferQueue to their framework.
+To simplify the usage of TransferQueue, we have encapsulated this process into `AsyncTransferQueueClient` and `TransferQueueClient`. These clients provide both asynchronous and synchronous interfaces for data transfer, allowing users to easily integrate TransferQueue into their framework.
> In the future, we will provide a `StreamingDataLoader` interface for disaggregated frameworks as discussed in [RFC#2662](https://github.com/volcengine/verl/discussions/2662). Leveraging this abstraction, each rank can automatically get its own data like `DataLoader` in PyTorch. The TransferQueue system will handle the underlying data scheduling and transfer logic caused by different parallelism strategies, significantly simplifying the design of disaggregated frameworks.
-<h2 id="show-cases">🔥 Show Cases</h2>
+<h2 id="show-cases">🔥 Showcases</h2>
### General Usage
@@ -146,7 +151,7 @@ We will soon release the Python package on PyPI.
### Build wheel package from source code
-The building and installation steps are the following:
@@ -167,11 +172,15 @@ The building and installation steps are the following:
<h2 id="milestones"> 🛣️ RoadMap</h2>
+- [ ] Support data rewrite for partial rollout & agentic post-training
+- [x] Provide a general storage abstraction layer `TransferQueueStorageManager` to manage distributed storage units, which simplifies `Client` design and makes it possible to introduce different storage backends ([PR66](https://github.com/TransferQueue/TransferQueue/pull/66))
+- [ ] Provide a `KVStorageManager` to cover all the KV based storage backends
+- [ ] Support topic-based data partitioning to maintain train/val/test data simultaneously
- [ ] Release the first stable version through PyPI
- [ ] Support disaggregated framework (each rank retrieves its own data without going through a centralized node)
- [ ] Provide a `StreamingDataLoader` interface for disaggregated framework
- [ ] Support load-balancing and dynamic batching
-- [ ] Provide a general storage abstraction layer for different backends (e.g., [MoonCakeStore](https://github.com/kvcache-ai/Mooncake))
+- [ ] Support high-performance storage backends for RDMA transmission (e.g., [MoonCakeStore](https://github.com/kvcache-ai/Mooncake), [Ray Direct Transport](https://docs.ray.io/en/master/ray-core/direct-transport.html)...)
- [ ] High-performance serialization and deserialization