# Ascend Device Plugin
[FOSSA Status](https://app.fossa.com/projects/git%2Bgithub.com%2FProject-HAMi%2Fascend-device-plugin?ref=badge_shield)

## Introduction

This Ascend device plugin is implemented for [HAMi](https://github.com/Project-HAMi/HAMi) and [volcano](https://github.com/volcano-sh/volcano) scheduling.

#### 1. Template-based Hard Slicing (vNPU)

Memory slicing is supported based on virtualization templates; the smallest template that fits the request is selected automatically. For detailed information, check the [template](./ascend-device-configmap.yaml).

#### 2. Soft Slicing with Runtime Interception (hami-vnpu-core)

This project implements a soft slicing mechanism based on `libvnpu.so` interception and `limiter` token scheduling, enabling fine-grained resource sharing. For detailed information, check [hami-vnpu-core](https://github.com/Project-HAMi/hami-vnpu-core).

## Prerequisites

[ascend-docker-runtime](https://gitcode.com/Ascend/mind-cluster/tree/master/component/ascend-docker-runtime)

Update the submodule:

```bash
git submodule update --init --recursive
```

hami-vnpu-core soft slicing requirements:

- **Ascend Driver Version**: ≥ 25.5
- **Chip Mode**: enable `device-share` mode on the Ascend chips for virtualization (see the driver check below)
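
Before enabling soft slicing, you can confirm the installed driver version with `npu-smi`. This is only a version check; the procedure for switching chips into `device-share` mode depends on your hardware and driver release, so follow the Ascend driver documentation for that step.

```bash
# Print driver version and chip status; confirm the driver is >= 25.5.
npu-smi info
```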

## Compile

```bash
make all
```

### Build

```bash
docker buildx build -t $IMAGE_NAME .
```
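
`$IMAGE_NAME` is a name you choose. As a sketch, with a hypothetical registry and tag:

```bash
# Hypothetical registry and tag; substitute your own.
export IMAGE_NAME=registry.example.com/project-hami/ascend-device-plugin:dev
docker buildx build -t $IMAGE_NAME .
docker push $IMAGE_NAME
```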

## Deployment

### Label the Node with `ascend=on`

```bash
kubectl label node {ascend-node} ascend=on
```
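
To confirm the label took effect:

```bash
# The Ascend node should appear in this list.
kubectl get nodes -l ascend=on
```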

### Deploy ConfigMap

```bash
kubectl apply -f ascend-device-configmap.yaml
```

### Deploy RuntimeClass

```bash
kubectl apply -f ascend-runtimeclass.yaml
```

### Deploy `ascend-device-plugin`

```bash
kubectl apply -f ascend-device-plugin.yaml
```
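
Once the plugin pod is running, the node should advertise `huawei.com/*` resources. A quick sanity check:

```bash
# Capacity and allocatable should now list huawei.com/* resources.
kubectl describe node {ascend-node} | grep huawei.com
```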

If scheduling Ascend devices in HAMi, simply set `devices.ascend.enabled` to `true` when deploying HAMi, and the ConfigMap and `ascend-device-plugin` will be deployed automatically; refer to https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/README.md#huawei-ascend

If you require HAMi to automatically add the `runtimeClassName` configuration to Pods requesting Ascend resources (this is disabled by default), set the `devices.ascend.runtimeClassName` value to **a non-empty string** in HAMi's `values.yaml`, ensuring it matches the name of the `RuntimeClass` resource. For example:

```yaml
devices:
  ascend:
    runtimeClassName: ascend
```
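
As a sketch, the equivalent Helm flags might look like the following; the chart repository URL and namespace here are assumptions, and the HAMi chart README linked above is authoritative:

```bash
# Hypothetical install enabling Ascend support and runtimeClassName injection.
helm repo add hami-charts https://project-hami.github.io/HAMi/
helm install hami hami-charts/hami \
  --set devices.ascend.enabled=true \
  --set devices.ascend.runtimeClassName=ascend \
  -n kube-system
```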

## Usage

To exclusively use an entire card, or to request multiple cards, set only the corresponding resource name. If multiple tasks need to share the same NPU, set the card resource request to 1 and configure the corresponding memory resource name as well.

### Usage in HAMi

```yaml
...
  containers:
    - name: npu-pod
      ...
      resources:
        limits:
          huawei.com/Ascend910B: "1"
          # If Ascend910B-memory is not specified, the whole NPU is allocated.
          huawei.com/Ascend910B-memory: "4096"
```

For more examples, see [examples](./examples/).

### Soft Slicing Configuration (HAMi)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ascend-soft-slice-pod
  annotations:
    huawei.com/vnpu-mode: 'hami-core' # Enables hami-vnpu-core soft slicing for this pod
spec:
  containers:
    - name: npu-pod
      ...
      resources:
        limits:
          huawei.com/Ascend910B3: "1"            # Request 1 physical NPU
          huawei.com/Ascend910B3-memory: "28672" # Request 28 GiB of device memory
          huawei.com/Ascend910B3-core: "40"      # Request 40% of compute cores
```
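
After the pod is scheduled, you can inspect how the request was filled; the exact annotation names written by the HAMi scheduler vary by version, so treat this as a generic check:

```bash
# Look for device-allocation annotations and scheduling events on the pod.
kubectl describe pod ascend-soft-slice-pod
```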

### Usage in volcano

Volcano must be installed prior to usage; for more information, see [here](https://github.com/volcano-sh/volcano/tree/master/docs/user-guide/how_to_use_vnpu.md).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ascend-pod
spec:
  schedulerName: volcano
  containers:
    - name: ubuntu-container
      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          huawei.com/Ascend310P: "1"
          huawei.com/Ascend310P-memory: "4096"
```
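
To try the example (the manifest file name is illustrative):

```bash
# Apply the manifest and confirm the pod is scheduled by volcano.
kubectl apply -f ascend-pod.yaml
kubectl get pod ascend-pod -o wide
```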

## License

[FOSSA Status](https://app.fossa.com/projects/git%2Bgithub.com%2FProject-HAMi%2Fascend-device-plugin?ref=badge_large)