Commit b155bf3

[chore] Update README and bump version to 0.1.6 (Ascend#67)
As title

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
1 parent f0047b9 commit b155bf3

4 files changed

Lines changed: 69 additions & 26 deletions

README.md

Lines changed: 10 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -76,7 +76,7 @@ Currently, we support the following storage backends:
7676

7777
- SimpleStorage: A basic CPU memory storage with minimal data format constraints and easy usability.
7878
- [Yuanrong](https://gitee.com/openeuler/yuanrong-datasystem) (beta, [#PR107](https://github.com/TransferQueue/TransferQueue/pull/107), [#PR96](https://github.com/TransferQueue/TransferQueue/pull/96)): An Ascend native data system that provides hierarchical storage interfaces including HBM/DRAM/SSD.
79-
- [MooncakeStore](https://github.com/kvcache-ai/Mooncake) (alpha, [#PR162](https://github.com/TransferQueue/TransferQueue/pull/162)): A high-performance, KV-based hierarchical storage that supports RDMA transport between GPU and DRAM.
79+
- [MooncakeStore](https://github.com/kvcache-ai/Mooncake) (beta, [#PR162](https://github.com/TransferQueue/TransferQueue/pull/162)): A high-performance, KV-based hierarchical storage that supports RDMA transport between GPU and DRAM.
8080
- [RayRDT](https://docs.ray.io/en/master/ray-core/direct-transport.html) (alpha, [#PR167](https://github.com/TransferQueue/TransferQueue/pull/167)): Ray's new feature that allows Ray to store and pass objects directly between Ray actors.
8181

8282
Among them, `SimpleStorageUnit` serves as our default storage backend, coordinated by the `AsyncSimpleStorageManager` class. Each storage unit can be deployed on a separate node, allowing for distributed data management.
@@ -121,6 +121,8 @@ To simplify the usage of TransferQueue, we have provided a Redis-style high-leve
121121
- **Metadata Tags**: Lightweight metadata for status tracking
122122
- **Pluggable Backends**: Supports multiple backends
123123

124+
Refer to [tutorials/basic.ipynb](https://github.com/Ascend/TransferQueue/blob/main/tutorial/basic.ipynb) and [tutorials/02_kv_interface.py](https://github.com/Ascend/TransferQueue/blob/main/tutorial/02_kv_interface.py) for detailed usage examples.
125+
124126
#### StreamingDataLoader API
125127

126128
Designed as a drop-in replacement for the standard PyTorch `DataLoader`, this API allows each rank to automatically consume data without single-controller intervention.
@@ -147,17 +149,12 @@ Developers can leverage `TransferQueueClient` directly to implement advanced fea
147149
#### verl
148150
The primary motivation for integrating TransferQueue to verl now is to **alleviate the data transfer bottleneck of the single controller `RayPPOTrainer`**. Currently, all `DataProto` objects must be routed through `RayPPOTrainer`, resulting in a single point bottleneck of the whole post-training system.
149151

150-
![verl_dataflow_DataProto](https://github.com/TransferQueue/community_doc/blob/main/docs/verl_workflow.jpeg?raw=true)
151-
152-
Leveraging TransferQueue, we separate experience data transfer from metadata dispatch by
153-
154-
- Replacing `DataProto` with `BatchMeta` (metadata) and `TensorDict` (actual data) structures
155-
- Preserving verl's original Dispatch/Collect logic via BatchMeta (maintaining single-controller debuggability)
156-
- Accelerating data transfer by TransferQueue's distributed storage units
152+
<p align="center">
153+
<img src="https://raw.githubusercontent.com/wuxibin89/verl/refs/heads/wuxibin/doc_images/docs/_static/transfer_queue.png" width="100%">
154+
</p>
157155

158-
![verl_dataflow_TransferQueue](https://github.com/TransferQueue/community_doc/blob/main/docs/verl_workflow_with_tq.jpeg?raw=true)
156+
Official integration to verl is available at [verl/pulls/5401](https://github.com/verl-project/verl/pull/5401), with design doc at [[RFC] PPOTrainer with TransferQueue Integration](https://github.com/verl-project/verl/issues/5400). You may also refer to our [recipe](https://github.com/Ascend/TransferQueue/blob/main/recipe/simple_use_case/single_controller_demo.py), where we mimic the verl usage in a high-level manner.
159157

160-
You may refer to the [recipe](https://github.com/Ascend/TransferQueue/tree/dev/recipe/simple_use_case), where we mimic the verl usage in both async & sync scenarios. Official integration to verl is also available now at [verl/pulls/3649](https://github.com/volcengine/verl/pull/3649) (with subsequent PRs to further optimize the integration).
161158

162159
### Disaggregated Example
163160

@@ -216,11 +213,11 @@ pip install TransferQueue
216213
<img src="https://github.com/TransferQueue/community_doc/blob/main/docs/performance_0.1.1.dev2.png?raw=true" width="100%">
217214
</p>
218215

219-
> Note: The above benchmark for TransferQueue is based on our naive `SimpleStorage` backend. By introducing high-performance storage backends and optimizing serialization/deserialization, we expect to achieve even better performance. Warmly welcome contributions from the community!
216+
> Note: Optimization for MooncakeStore and other backends are still in process. Warmly welcome contributions from the community!
220217
221-
For detailed performance benchmarks, please refer to [this blog](https://www.yuque.com/haomingzi-lfse7/hlx5g0/tml8ke0zkgn6roey?singleDoc#).
218+
For detailed performance benchmarks, please refer to [this blog](https://www.yuque.com/haomingzi-lfse7/lhp4el/tml8ke0zkgn6roey?singleDoc#).
222219

223-
We also provide a [stress test report](https://www.yuque.com/haomingzi-lfse7/hlx5g0/ydbwgo5k2umaag78?singleDoc#) that demonstrates **768 concurrent clients writing 1.4 TB of data** into TransferQueue across 4 nodes. The system remains stable without any crashes or data loss, achieving 80% bandwidth.
220+
We also provide a [stress test report](https://www.yuque.com/haomingzi-lfse7/lhp4el/mt0vedqy7c337pgg?singleDoc#) that demonstrates more than **8192 concurrent clients writing 2 TB of data** into TransferQueue across 4 nodes. The system remains stable without any crashes or data loss.
224221

225222
<h2 id="customize"> 🛠️ Customize TransferQueue</h2>
226223

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -118,7 +118,7 @@ yuanrong = [
118118
"openyuanrong-datasystem"
119119
]
120120
mooncake = [
121-
"mooncake-transfer-engine"
121+
"mooncake-transfer-engine==0.3.10.post1"
122122
]
123123

124124
# If you need to mimic `package_dir={'': '.'}`:

scripts/performance_test/README_PERFTEST.md

Lines changed: 57 additions & 11 deletions
@@ -42,19 +42,63 @@ python perftest.py \
4242
| `--head_node_ip` | Head node IP address | - | Yes |
4343
| `--worker_node_ip` | Worker node IP address (required for Yuanrong) | None | No |
4444
| `--output_csv` | Path to output CSV file | None | No |
45+
| `--use_complex_case` | Use complex test case with nested tensors and NonTensorStack fields | False | No |
4546

4647
## Backend Configuration
4748

4849
The script reads the backend configuration directly from the provided `--backend_config` YAML file. The backend type is determined by `backend.storage_backend` in the config file. When `--backend` is specified, it overrides the value in the config.
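The override rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic in `perftest.py`: the helper name `resolve_backend` is hypothetical, and a plain dict stands in for the parsed `--backend_config` YAML.

```python
from typing import Optional


def resolve_backend(config: dict, cli_backend: Optional[str] = None) -> str:
    """Return the backend name, letting a CLI value override the config file.

    `config` is a hypothetical stand-in for the parsed YAML; only the
    `backend.storage_backend` key is taken from the README text.
    """
    if cli_backend is not None:
        # A value passed via --backend wins over the config file.
        return cli_backend
    return config["backend"]["storage_backend"]


# Hypothetical parsed config mirroring the `backend.storage_backend` key.
config = {"backend": {"storage_backend": "SimpleStorage"}}

from_config = resolve_backend(config)
overridden = resolve_backend(config, cli_backend="MooncakeStore")
```

With no CLI value the config entry is used; passing a backend name replaces it without modifying the config dict.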
4950

50-
For device support of each backend:
51-
- `SimpleStorage`: `cpu`
52-
- `Yuanrong`: `cpu`, `npu`
53-
- `MooncakeStore`: `cpu`, `gpu`
51+
### SimpleStorage Configuration
5452

55-
## Test Data Format
53+
```yaml
54+
backend:
55+
storage_backend: SimpleStorage
56+
SimpleStorage:
57+
total_storage_size: 100000
58+
num_data_storage_units: 16
59+
```
60+
61+
### Yuanrong Configuration
62+
63+
```yaml
64+
backend:
65+
storage_backend: Yuanrong
66+
Yuanrong:
67+
port: 31501
68+
enable_yr_npu_transport: true
69+
```
70+
71+
For Yuanrong backend, writer runs on the head node and reader runs on the worker node. `--worker_node_ip` is required.
72+
73+
### MooncakeStore Configuration
74+
75+
```yaml
76+
backend:
77+
storage_backend: MooncakeStore
78+
MooncakeStore:
79+
auto_init: true
80+
metadata_server: localhost:50050
81+
master_server_address: localhost:50051
82+
local_hostname: ""
83+
protocol: rdma
84+
global_segment_size: 86294967296
85+
local_buffer_size: 86294967296
86+
device_name: ""
87+
```
88+
89+
## Test Scenarios
90+
91+
### Simple Test Case (Default)
92+
93+
When `--use_complex_case` is **not** specified (default), the test creates a `TensorDict` with only regular tensors:
5694

57-
The test case creates a `TensorDict` with three types of fields to simulate real training batches:
95+
- **Regular tensors**: Shape `(batch_size, seq_length)`, float32.
96+
97+
Each regular tensor field size = `batch_size × seq_length × 4` bytes.
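As a worked example of the size formula above, the sketch below computes the payload of one regular float32 field. The function name and the sample `batch_size` and `seq_length` values are illustrative assumptions, not defaults of the test script.

```python
def regular_field_bytes(batch_size: int, seq_length: int) -> int:
    """Size in bytes of one float32 field of shape (batch_size, seq_length)."""
    bytes_per_float32 = 4
    return batch_size * seq_length * bytes_per_float32


# Illustrative values only (not taken from the script).
size_bytes = regular_field_bytes(batch_size=1024, seq_length=8192)
size_mib = size_bytes / 2**20  # convert bytes to MiB
```

For these sample values a single field is 33,554,432 bytes, that is 32 MiB; the total payload scales linearly with the number of fields.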
98+
99+
### Complex Test Case
100+
101+
When `--use_complex_case` is specified, the test creates a `TensorDict` with three types of fields to simulate real training batches:
58102

59103
1. **Regular tensors**: Shape `(batch_size, seq_length)`, float32.
60104
2. **Nested tensors** (non-NPU devices): Variable-length ragged sequences with lengths forming an arithmetic progression from 1 to `seq_length`. Average length ≈ `seq_length / 2`, so each nested field is roughly half the size of a regular field.
@@ -73,10 +117,6 @@ Each iteration performs a PUT → LIST → GET → DELETE cycle via TransferQueu
73117

74118
The test runs `--num_test_iterations` iterations. Data creation only happens in the first iteration; subsequent iterations reuse the same TensorDict to isolate transfer overhead.
75119

76-
## Yuanrong Backend
77-
78-
For Yuanrong backend, writer runs on the head node and reader runs on the worker node. `--worker_node_ip` is required.
79-
80120
## Running Full Test Suite
81121

82122
The `run_perf_test.sh` script automates the full test suite across all backends and data sizes, then generates a comparison chart:
@@ -130,12 +170,18 @@ After running the tests, `draw_figure.py` reads all CSV files from `results/` an
130170

131171
## Examples
132172

133-
### SimpleStorage backend
173+
### SimpleStorage backend (simple case)
134174
```bash
135175
python perftest.py --backend_config=perftest_config.yaml --backend=SimpleStorage \
136176
--head_node_ip=192.168.0.1
137177
```
138178

179+
### SimpleStorage backend (complex case)
180+
```bash
181+
python perftest.py --backend_config=perftest_config.yaml --backend=SimpleStorage \
182+
--head_node_ip=192.168.0.1 --use_complex_case
183+
```
184+
139185
### Yuanrong backend (inter-node)
140186
```bash
141187
python perftest.py --backend_config=perftest_config.yaml --backend=Yuanrong \

transfer_queue/version/version

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
0.1.6.dev0
1+
0.1.6
