OpenYuanrong-datasystem relies on etcd for cluster coordination.
-Download and install etcd from the official releases: [ETCD GitHub Releases](https://github.com/etcd-io/etcd/releases)
-
-```bash
-# Example for Linux ARM64 (adjust for your architecture)
-# Unpack and install etcd
-ETCD_VERSION="v3.6.5"  # Replace with the desired version
-tar -xvf etcd-${ETCD_VERSION}-linux-arm64.tar.gz
-cd etcd-${ETCD_VERSION}-linux-arm64
-
-# Copy the executable file to the system path
-sudo cp etcd etcdctl /usr/local/bin/
-
-# Verify installation
-etcd --version
-etcdctl version
-```
-
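The etcd snippet above hard-codes `linux-arm64`. As a minimal sketch of the "adjust for your architecture" step, the archive name can be derived from `uname -m`. The helper name `etcd_arch` is hypothetical, and only the two most common architectures are mapped; other platforms would need their own entries.

```bash
# Hypothetical helper: map a `uname -m` machine name to the architecture
# suffix used in etcd release archive names.
etcd_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

ETCD_VERSION="v3.6.5"  # example version from the snippet above; replace as needed

# Print the archive name for this machine; nothing is downloaded or unpacked here.
if ARCH="$(etcd_arch "$(uname -m)")"; then
  echo "etcd-${ETCD_VERSION}-linux-${ARCH}.tar.gz"
fi
```

The printed name can then replace the hard-coded archive name in the `tar` and `cd` commands.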
-#### 4. (Optional) Install CANN and torch-npu
+#### 3. (Optional) Install CANN and torch-npu

 If you have NPU devices and want to accelerate the transmission of NPU tensors,
 you can install **Ascend-cann-toolkit** and **torch-npu**.
@@ -106,19 +86,36 @@ pip install torch-npu==2.8.0
 Next, we will provide deployment and code examples for single-node scenarios.
 For multi-node scenarios, please refer to [Appendix B](#B-deploy-multi-node-datasystem-for-multi-node-training-and-inference-scenarios).

-Unlike using TransferQueue with its default backend, integrating OpenYuanrong-Datasystem requires **pre-launching** the datasystem services before running your Python application.
TransferQueue automatically initializes Yuanrong datasystem workers across all Ray cluster nodes. Just set `auto_init: True` in the configuration and TransferQueue will handle the multi-node deployment.

+Let's take two nodes (for instance, 192.168.0.1 and 192.168.0.2) as an example.

-#### Deploy multi-nodes datasystem
-On each node, you need to connect to the etcd service on the head node using your local node's IP address.
Now you can use the datasystem on both the head node and the worker node.

 > For more detailed deployment instructions, please refer to the [yuanrong documents](https://gitcode.com/openeuler/yuanrong-datasystem/blob/master/README.md#%E9%83%A8%E7%BD%B2-openyuanrong-datasystem).
 > The configuration parameters for deploying the data system are described in [dscli config](https://gitcode.com/openeuler/yuanrong-datasystem/blob/master/docs/source_zh_cn/deployment/dscli.md#%E9%85%8D%E7%BD%AE%E9%A1%B9%E8%AF%B4%E6%98%8E).

 The following demo covers a multi-node scenario.

-#### Deploy ray
-```bash
-# on head node
-ray start --head --resources='{"node:10.170.27.24": 1}'
-
-# on worker node (assume ray port of head_node is 6379)
-ray start --address="10.170.27.24:6379" --resources='{"node:10.170.27.33": 1}'
-```
-
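The two `ray start` commands above differ only in the node IP embedded in the custom resource name. As a rough sketch, the resource JSON can be built from a variable so the same script works on either node. The helper name `node_resource` is hypothetical; the IPs and port are the example values from the commands above.

```bash
# Hypothetical helper: build the JSON passed to `ray start --resources`
# for a given node IP.
node_resource() {
  printf '{"node:%s": 1}' "$1"
}

HEAD_IP="10.170.27.24"    # example head-node IP from the commands above
WORKER_IP="10.170.27.33"  # example worker-node IP from the commands above
RAY_PORT=6379             # assumed Ray port on the head node

# Print the commands instead of running them, so they can be reviewed first.
echo "ray start --head --resources='$(node_resource "$HEAD_IP")'"
echo "ray start --address=\"${HEAD_IP}:${RAY_PORT}\" --resources='$(node_resource "$WORKER_IP")'"
```

Printing rather than executing keeps the sketch free of side effects; run the first printed command on the head node and the second on the worker node.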
 #### Run demo
-In the demo below, we use ray actors to implement distributed deployment of processes.
+In the demo below, we use ray actors to implement distributed deployment of processes.
 The actor writer writes data to the head node, and the actor reader reads data from the worker nodes.