Commit 62ad05f

Merge pull request #39 from DSFans2014/docs/readme
docs: update readme
2 parents 9b50dbe + e512bbc commit 62ad05f

4 files changed

Lines changed: 77 additions & 143 deletions

README.md

Lines changed: 36 additions & 32 deletions
@@ -2,13 +2,13 @@
22

33
## Introduction
44

5-
This Ascend device plugin is implemented for [HAMi](https://github.com/Project-HAMi/HAMi) scheduling.
5+
This Ascend device plugin is implemented for [HAMi](https://github.com/Project-HAMi/HAMi) and [volcano](https://github.com/volcano-sh/volcano) scheduling.
66

7-
Memory slicing is supported based on virtualization template, lease available template is automatically used. For detailed information, check [templeate](./config.yaml)
7+
Memory slicing is based on virtualization templates; the smallest template that satisfies the memory request is used automatically. For details, see the [template configuration](./ascend-device-configmap.yaml).
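
As a rough illustration of the template selection mentioned above, here is a minimal sketch. It assumes that "least available template" means the smallest template whose memory is at least the requested amount, which is how the Chinese README describes it. The `Template` class, the `pick_template` function, and the template names and sizes are hypothetical and are not taken from the plugin's code; the real template definitions live in `ascend-device-configmap.yaml`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Template:
    """A virtualization template (hypothetical shape, for illustration only)."""
    name: str
    memory_mib: int  # memory provided by the template, in MiB


def pick_template(templates: Sequence[Template], requested_mib: int) -> Optional[Template]:
    """Return the smallest template with at least `requested_mib`, or None if none fits."""
    candidates = [t for t in templates if t.memory_mib >= requested_mib]
    return min(candidates, key=lambda t: t.memory_mib, default=None)


# Illustrative values only; they are assumptions, not the plugin's real templates.
TEMPLATES = [
    Template("small", 4096),
    Template("medium", 8192),
    Template("large", 16384),
]

if __name__ == "__main__":
    for request in (4096, 5000, 32768):
        chosen = pick_template(TEMPLATES, request)
        print(request, chosen.name if chosen else None)
```

Under these assumed values, a request that falls between two template sizes is rounded up to the next template, so the container may receive more memory than it asked for.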
88

99
## Prerequisites
1010

11-
[ascend-docker-runtime](https://gitee.com/ascend/ascend-docker-runtime)
11+
[ascend-docker-runtime](https://gitcode.com/Ascend/mind-cluster/tree/master/component/ascend-docker-runtime)
1212

1313
## Compile
1414

@@ -24,51 +24,32 @@ docker buildx build -t $IMAGE_NAME .
2424

2525
## Deployment
2626

27-
Due to dependencies with HAMi, you need to set
27+
### Label the Node with `ascend=on`
2828

29-
```
30-
devices.ascend.enabled=true
31-
```
32-
33-
during HAMi installation. For more details, see 'devices' section in values.yaml.
3429

35-
```yaml
36-
devices:
37-
ascend:
38-
enabled: true
39-
image: "ascend-device-plugin:master"
40-
imagePullPolicy: IfNotPresent
41-
extraArgs: []
42-
nodeSelector:
43-
ascend: "on"
44-
tolerations: []
45-
resources:
46-
- huawei.com/Ascend910A
47-
- huawei.com/Ascend910A-memory
48-
- huawei.com/Ascend910B
49-
- huawei.com/Ascend910B-memory
50-
- huawei.com/Ascend310P
51-
- huawei.com/Ascend310P-memory
5230
```
31+
kubectl label node {ascend-node} ascend=on
32+
```
5333

54-
Note that resources here(hawei.com/Ascend910A,huawei.com/Ascend910B,...) is managed in hami-scheduler-device configMap. It defines three different templates(910A,910B,310P).
55-
56-
label your NPU nodes with 'ascend=on'
34+
### Deploy ConfigMap
5735

5836
```
59-
kubectl label node {ascend-node} ascend=on
37+
kubectl apply -f ascend-device-configmap.yaml
6038
```
6139

62-
Deploy ascend-device-plugin by running
40+
### Deploy `ascend-device-plugin`
6341

6442
```bash
6543
kubectl apply -f ascend-device-plugin.yaml
6644
```
6745

46+
If you schedule Ascend devices with HAMi, set `devices.ascend.enabled` to `true` when deploying HAMi; the ConfigMap and `ascend-device-plugin` are then deployed automatically. For details, see https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/README.md#huawei-ascend
6847

6948
## Usage
7049

71-
You can allocate a slice of NPU by specifying both resource number and resource memory. For more examples, see [examples](./examples/)
50+
To use an entire card exclusively, or to request multiple cards, set only the corresponding resource name (for example `huawei.com/Ascend910B`). To let multiple tasks share the same NPU, set that resource to 1 and also set the matching memory resource (for example `huawei.com/Ascend910B-memory`).
51+
52+
### Usage in HAMi
7253

7354
```yaml
7455
...
@@ -81,3 +62,26 @@ You can allocate a slice of NPU by specifying both resource number and resource
8162
# if you don't specify Ascend910B-memory, it will use a whole NPU.
8263
huawei.com/Ascend910B-memory: "4096"
8364
```
65+
For more examples, see [examples](./examples/)
66+
67+
### Usage in volcano
68+
69+
Volcano must be installed before use. For more information, see [how_to_use_vnpu.md](https://github.com/volcano-sh/volcano/tree/master/docs/user-guide/how_to_use_vnpu.md)
70+
71+
```yaml
72+
apiVersion: v1
73+
kind: Pod
74+
metadata:
75+
  name: ascend-pod
76+
spec:
77+
  schedulerName: volcano
78+
  containers:
79+
    - name: ubuntu-container
80+
      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
81+
      command: ["sleep"]
82+
      args: ["100000"]
83+
      resources:
84+
        limits:
85+
          huawei.com/Ascend310P: "1"
86+
          huawei.com/Ascend310P-memory: "4096"
87+
```

README_cn.md

Lines changed: 41 additions & 32 deletions
@@ -2,15 +2,15 @@
22

33
## Introduction
44

5-
An Ascend device plugin based on the [HAMi](https://github.com/Project-HAMi/HAMi) scheduling mechanism.
5+
The Ascend device plugin supports scheduling Ascend NPU devices in [HAMi](https://github.com/Project-HAMi/HAMi) and [volcano](https://github.com/volcano-sh/volcano).
66

7-
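Memory-based scheduling is supported. Memory is sliced according to Ascend virtualization templates, and the smallest template that satisfies the memory request is used as the container's memory. For template details, see the [configuration template](./config.yaml)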
7+
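Ascend NPU virtualization slicing is configured through templates. During scheduling, the smallest template that satisfies the memory request is used as the container's memory. For the template configuration of each chip, see [here](./ascend-device-configmap.yaml)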
88

9-
Starting containers depends on [ascend-docker-runtime](https://gitee.com/ascend/ascend-docker-runtime)
9+
## Prerequisites
1010

11-
## Compile
11+
Deploy [ascend-docker-runtime](https://gitcode.com/Ascend/mind-cluster/tree/master/component/ascend-docker-runtime)
1212

13-
### Compile the binary
13+
## Compile
1414

1515
```bash
1616
make all
@@ -24,47 +24,33 @@ docker buildx build -t $IMAGE_NAME .
2424

2525
## Deployment
2626

27-
Because of dependencies on HAMi, deployment is integrated into the HAMi deployment. Set the following field:
28-
29-
```
30-
devices.ascend.enabled=true
31-
```
27+
### Label the Node with `ascend`
3228

33-
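For the resource name of each NPU device type, see the following fields in values.yaml. This component currently supports slicing for three NPU models (310p, 910A, 910B). If no changes are needed, the default configuration below can be used as is: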
3429

35-
```yaml
36-
devices:
37-
ascend:
38-
enabled: true
39-
image: "ascend-device-plugin:master"
40-
imagePullPolicy: IfNotPresent
41-
extraArgs: []
42-
nodeSelector:
43-
ascend: "on"
44-
tolerations: []
45-
resources:
46-
- huawei.com/Ascend910A
47-
- huawei.com/Ascend910A-memory
48-
- huawei.com/Ascend910B
49-
- huawei.com/Ascend910B-memory
50-
- huawei.com/Ascend310P
51-
- huawei.com/Ascend310P-memory
30+
```
31+
kubectl label node {ascend-node} ascend=on
5232
```
5333

54-
Label the NPU nodes in the cluster as follows:
34+
### Deploy ConfigMap
5535

5636
```
57-
kubectl label node {ascend-node} ascend=on
37+
kubectl apply -f ascend-device-configmap.yaml
5838
```
5939

60-
Finally, deploy ascend-device-plugin with the following command
40+
### Deploy `ascend-device-plugin`
6141

6242
```bash
6343
kubectl apply -f ascend-device-plugin.yaml
6444
```
6545

46+
To use Ascend NPUs in HAMi, set `devices.ascend.enabled` to true when deploying HAMi; the ConfigMap and `ascend-device-plugin` are then deployed automatically. See https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/README.md#huawei-ascend
47+
6648
## Usage
6749

50+
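To use an entire card exclusively, or to request multiple cards, set only the corresponding resourceName. To let multiple tasks share the same card, set resourceName to 1 and also set the corresponding ResourceMemoryName.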
51+
52+
### Usage in HAMi
53+
6854
```yaml
6955
...
7056
containers:
@@ -73,6 +59,29 @@ kubectl apply -f ascend-device-plugin.yaml
7359
resources:
7460
limits:
7561
huawei.com/Ascend910B: "1"
76-
# if memory is not specified, the whole card is used by default
62+
# if you don't specify the memory size, the whole card is used
7763
huawei.com/Ascend910B-memory: "4096"
7864
```
65+
For more examples, see [examples](./examples/)
66+
67+
### Usage in volcano
68+
69+
Volcano must be deployed before use. For more information, see [here](https://github.com/volcano-sh/volcano/tree/master/docs/user-guide/how_to_use_vnpu.md)
70+
71+
```yaml
72+
apiVersion: v1
73+
kind: Pod
74+
metadata:
75+
  name: ascend-pod
76+
spec:
77+
  schedulerName: volcano
78+
  containers:
79+
    - name: ubuntu-container
80+
      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
81+
      command: ["sleep"]
82+
      args: ["100000"]
83+
      resources:
84+
        limits:
85+
          huawei.com/Ascend310P: "1"
86+
          huawei.com/Ascend310P-memory: "4096"
87+
```

config.yaml

Lines changed: 0 additions & 79 deletions
This file was deleted.
