Commit ce4c6ea

Merge pull request #61 from ashergaga/main
feat: integrate hami-vnpu-core to support HBM and Compute Core virtualization
2 parents b9f873f + 30cccb3 commit ce4c6ea

9 files changed: 515 additions & 234 deletions


Dockerfile

Lines changed: 1 addition & 0 deletions
@@ -21,5 +21,6 @@ RUN make all
 FROM $BASE_IMAGE
 ENV LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/driver/lib64/common
 COPY --from=build /build/ascend-device-plugin /usr/local/bin/ascend-device-plugin
+COPY ./lib/hami-vnpu-core/ /usr/local/hami-vnpu-core-assets/
 
 ENTRYPOINT ["ascend-device-plugin"]

README.md

Lines changed: 143 additions & 109 deletions
@@ -1,110 +1,144 @@
-# Ascend Device Plugin
-[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2FProject-HAMi%2Fascend-device-plugin.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2FProject-HAMi%2Fascend-device-plugin?ref=badge_shield)
-
-
-## Introduction
-
-This Ascend device plugin is implemented for [HAMi](https://github.com/Project-HAMi/HAMi) and [volcano](https://github.com/volcano-sh/volcano) scheduling.
-
-Memory slicing is supported based on virtualization template, lease available template is automatically used. For detailed information, check [template](./ascend-device-configmap.yaml)
-
-## Prerequisites
-
-[ascend-docker-runtime](https://gitcode.com/Ascend/mind-cluster/tree/master/component/ascend-docker-runtime)
-
-```bash
-git submodule add https://gitcode.com/Ascend/mind-cluster.git
-```
-
-## Compile
-
-```bash
-make all
-```
-
-### Build
-
-```bash
-docker buildx build -t $IMAGE_NAME .
-```
-
-## Deployment
-
-### Label the Node with `ascend=on`
-
-```bash
-kubectl label node {ascend-node} ascend=on
-```
-
-### Deploy ConfigMap
-
-```bash
-kubectl apply -f ascend-device-configmap.yaml
-```
-
-### Deply RuntimeClass
-
-```bash
-kubectl apply -f ascend-runtimeclass.yaml
-```
-
-### Deploy `ascend-device-plugin`
-
-```bash
-kubectl apply -f ascend-device-plugin.yaml
-```
-
-If scheduling Ascend devices in HAMi, simply set `devices.ascend.enabled` to true when deploying HAMi, and the ConfigMap and `ascend-device-plugin` will be automatically deployed. refer https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/README.md#huawei-ascend
-
-If you require HAMi to automatically add the `runtimeClassName` configuration to Pods requesting Ascend resources (this is disabled by default), you should set `devices.ascend.runtimeClassName` value to **a non-empty string** in HAMi's `values.yaml` file, ensuring it matches the name of the `RuntimeClass` resource. For example:
-
-```yaml
-devices:
-  ascend:
-    runtimeClassName: ascend
-```
-
-## Usage
-
-To exclusively use an entire card or request multiple cards, you only need to set the corresponding resourceName. If multiple tasks need to share the same NPU, you need to set the corresponding resource request to 1 and configure the appropriate ResourceMemoryName.
-
-### Usage in HAMi
-
-```yaml
-...
-containers:
-  - name: npu_pod
-    ...
-    resources:
-      limits:
-        huawei.com/Ascend910B: "1"
-        # if you don't specify Ascend910B-memory, it will use a whole NPU.
-        huawei.com/Ascend910B-memory: "4096"
-```
-
-For more examples, see [examples](./examples/)
-
-### Usage in volcano
-
-Volcano must be installed prior to usage, for more information see [here](https://github.com/volcano-sh/volcano/tree/master/docs/user-guide/how_to_use_vnpu.md)
-
-```yaml
-apiVersion: v1
-kind: Pod
-metadata:
-  name: ascend-pod
-spec:
-  schedulerName: volcano
-  containers:
-    - name: ubuntu-container
-      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
-      command: ["sleep"]
-      args: ["100000"]
-      resources:
-        limits:
-          huawei.com/Ascend310P: "1"
-          huawei.com/Ascend310P-memory: "4096"
-```
-
-## License
+# Ascend Device Plugin
+[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2FProject-HAMi%2Fascend-device-plugin.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2FProject-HAMi%2Fascend-device-plugin?ref=badge_shield)
+
+
+## Introduction
+
+This Ascend device plugin is implemented for [HAMi](https://github.com/Project-HAMi/HAMi) and [volcano](https://github.com/volcano-sh/volcano) scheduling.
+
+#### 1. Template-based Hard Slicing (vNPU)
+
+Memory slicing is supported based on virtualization templates; the smallest available template that fits the request is selected automatically. For detailed information, see the [template](./ascend-device-configmap.yaml).
+
+#### 2. Soft Slicing with Runtime Interception (hami-vnpu-core)
+
+This project implements a soft slicing mechanism based on `libvnpu.so` interception and `limiter` token scheduling, enabling fine-grained resource sharing. For detailed information, see [hami-vnpu-core](https://github.com/Project-HAMi/hami-vnpu-core).
+
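The `limiter` token scheduling mentioned above can be illustrated with a minimal sketch. This is not hami-vnpu-core's actual code (the real mechanism lives in the intercepting `libvnpu.so`); all names here are hypothetical. The idea: each container gets a token budget per interval, and intercepted kernel launches must take a token, which caps that container's share of the compute cores.

```python
import threading
import time

class ComputeLimiter:
    """Illustrative token limiter: refills `rate` tokens every interval;
    kernel launches block until a token is available."""

    def __init__(self, rate: int, interval_s: float = 0.1):
        self.rate = rate
        self.interval_s = interval_s
        self.tokens = rate
        self.cond = threading.Condition()
        threading.Thread(target=self._refill, daemon=True).start()

    def _refill(self):
        while True:
            time.sleep(self.interval_s)
            with self.cond:
                self.tokens = self.rate      # reset the budget each interval
                self.cond.notify_all()

    def acquire(self):
        with self.cond:
            while self.tokens == 0:          # out of budget: stall this launch
                self.cond.wait()
            self.tokens -= 1

limiter = ComputeLimiter(rate=40)            # e.g. a 40% compute share

def launch_kernel(kernel_id: int) -> str:
    limiter.acquire()                        # interception point before the real launch
    return f"kernel-{kernel_id} dispatched"

print(launch_kernel(0))                      # kernel-0 dispatched
```

A real interceptor would wrap the driver's launch entry points instead of a Python function, but the throttling logic is the same shape.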
+## Prerequisites
+
+[ascend-docker-runtime](https://gitcode.com/Ascend/mind-cluster/tree/master/component/ascend-docker-runtime)
+
+Update the submodule:
+
+```bash
+git submodule update --init --recursive
+```
+
+hami-vnpu-core soft slicing requirements:
+
+- **Ascend Driver Version**: ≥ 25.5
+- **Chip Mode**: enable `device-share` mode on Ascend chips for virtualization
+
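The driver floor above can be checked mechanically before enabling soft slicing. A sketch of that version comparison, assuming a dotted version string as reported by the driver tooling (the helper name and suffix handling are illustrative, not part of this repo):

```python
def meets_driver_floor(version: str, floor: tuple = (25, 5)) -> bool:
    """Compare a dotted driver version like '25.5.0' against a (major, minor) floor."""
    parts = []
    for token in version.split("."):
        if token.isdigit():
            parts.append(int(token))
        else:
            break                      # stop at non-numeric suffixes like 'rc1'
    parts += [0] * (2 - len(parts))    # pad short versions such as '25'
    return tuple(parts[:2]) >= floor

print(meets_driver_floor("25.5.0"))    # True: meets the >= 25.5 requirement
print(meets_driver_floor("24.1"))      # False: too old
```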
+## Compile
+
+```bash
+make all
+```
+
+### Build
+
+```bash
+docker buildx build -t $IMAGE_NAME .
+```
+
+## Deployment
+
+### Label the Node with `ascend=on`
+
+```bash
+kubectl label node {ascend-node} ascend=on
+```
+
+### Deploy ConfigMap
+
+```bash
+kubectl apply -f ascend-device-configmap.yaml
+```
+
+### Deploy RuntimeClass
+
+```bash
+kubectl apply -f ascend-runtimeclass.yaml
+```
+
+### Deploy `ascend-device-plugin`
+
+```bash
+kubectl apply -f ascend-device-plugin.yaml
+```
+
+When scheduling Ascend devices with HAMi, simply set `devices.ascend.enabled` to `true` when deploying HAMi; the ConfigMap and `ascend-device-plugin` are then deployed automatically. See https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/README.md#huawei-ascend
+
+If you require HAMi to automatically add the `runtimeClassName` configuration to Pods requesting Ascend resources (this is disabled by default), set `devices.ascend.runtimeClassName` to **a non-empty string** in HAMi's `values.yaml`, matching the name of the `RuntimeClass` resource. For example:
+
+```yaml
+devices:
+  ascend:
+    runtimeClassName: ascend
+```
+
+## Usage
+
+To use an entire card exclusively, or to request multiple cards, you only need to set the corresponding resource name. If multiple tasks need to share the same NPU, set the corresponding resource request to 1 and configure the appropriate memory resource name.
+
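The sharing rule above amounts to a packing check on each device: a shared request fits only if its memory and core slices still fit in the device's remaining capacity. A rough sketch of that bookkeeping (field and class names are hypothetical, not this plugin's actual scheduler code; 65536 MiB assumes a 64 GiB 910B3-class device):

```python
from dataclasses import dataclass

@dataclass
class NpuDevice:
    total_mem_mib: int
    total_core_pct: int = 100
    used_mem_mib: int = 0
    used_core_pct: int = 0

    def fits(self, mem_mib: int, core_pct: int) -> bool:
        """Would a shared request still fit on this device?"""
        return (self.used_mem_mib + mem_mib <= self.total_mem_mib
                and self.used_core_pct + core_pct <= self.total_core_pct)

    def allocate(self, mem_mib: int, core_pct: int) -> bool:
        if not self.fits(mem_mib, core_pct):
            return False
        self.used_mem_mib += mem_mib
        self.used_core_pct += core_pct
        return True

# A 64 GiB device shared by pods requesting 28 GiB memory and 40% core each.
dev = NpuDevice(total_mem_mib=65536)
print(dev.allocate(28672, 40))   # True: first slice fits
print(dev.allocate(28672, 40))   # True: second identical slice still fits
print(dev.allocate(28672, 40))   # False: a third would exceed both budgets
```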
+### Usage in HAMi
+
+```yaml
+...
+containers:
+  - name: npu-pod
+    ...
+    resources:
+      limits:
+        huawei.com/Ascend910B: "1"
+        # If you don't specify Ascend910B-memory, the pod gets a whole NPU.
+        huawei.com/Ascend910B-memory: "4096"
+```
+
+For more examples, see [examples](./examples/)
+
+### Soft Slicing Configuration (HAMi)
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: ascend-soft-slice-pod
+  annotations:
+    huawei.com/vnpu-mode: 'hami-core' # Enables hami-vnpu-core soft slicing for this pod
+spec:
+  containers:
+    - name: npu-pod
+      ...
+      resources:
+        limits:
+          huawei.com/Ascend910B3: "1"            # Request 1 physical NPU
+          huawei.com/Ascend910B3-memory: "28672" # Request 28 GiB of memory
+          huawei.com/Ascend910B3-core: "40"      # Request 40% of the compute cores
+```
+
+### Usage in volcano
+
+Volcano must be installed prior to usage; for more information, see [here](https://github.com/volcano-sh/volcano/tree/master/docs/user-guide/how_to_use_vnpu.md)
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: ascend-pod
+spec:
+  schedulerName: volcano
+  containers:
+    - name: ubuntu-container
+      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
+      command: ["sleep"]
+      args: ["100000"]
+      resources:
+        limits:
+          huawei.com/Ascend310P: "1"
+          huawei.com/Ascend310P-memory: "4096"
+```
+
+## License
+
[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2FProject-HAMi%2Fascend-device-plugin.svg?type=large)](https://app.fossa.com/projects/git%2Bgithub.com%2FProject-HAMi%2Fascend-device-plugin?ref=badge_large)
