*Note: The limiter daemon now automatically sets the socket permissions to `0o777`, so you do not need `sudo` to run this command from the host.*
## Reporting Compute Utilization via gRPC
In addition to `SetPriority`, the daemon exposes a `GetUtilization` RPC that reports the compute utilization of this container, per NPU device.
### How it works
Each manager thread participates in a global baton-passing scheme: only the container that currently owns the baton is allowed to issue NPU work. A background reporter thread (one per device) wakes on each baton handoff (signaled through the same `signal_counter` futex the managers use) and at fixed window deadlines. It records how long this container's manager held the baton within each window, computes `busy_us / window_us` at each deadline, and keeps a rolling history for averaging. Window boundaries are fixed: processing baton events does not shift them.

The reporter only reads shared memory; it does not modify or interfere with the scheduler.
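The window accounting described above can be sketched as follows. This is a minimal illustration in Rust, not the daemon's actual code: the function name `window_utilization`, the `(start_us, end_us)` representation of baton holds, and the integer percent rounding are assumptions made for the example. It shows how a hold that straddles a window boundary is split so that each fixed window is charged only for its own share.

```rust
/// Hypothetical helper (not part of the daemon): attribute baton-hold time
/// to fixed-length windows and return one utilization percentage per window.
///
/// `holds` are `(start_us, end_us)` pairs, measured in microseconds from the
/// moment the reporter started. Window `i` covers
/// `[i * window_us, (i + 1) * window_us)`.
fn window_utilization(holds: &[(u64, u64)], window_us: u64, windows: usize) -> Vec<u32> {
    assert!(window_us > 0, "a zero-length window means the reporter is disabled");
    let mut busy_us = vec![0u64; windows];

    for &(start, end) in holds {
        let mut t = start;
        while t < end {
            let idx = (t / window_us) as usize;
            if idx >= windows {
                break; // the hold extends past the last window we report on
            }
            // End of the window that contains `t`; boundaries never move.
            let boundary = (idx as u64 + 1) * window_us;
            let slice_end = end.min(boundary);
            busy_us[idx] += slice_end - t;
            t = slice_end;
        }
    }

    // busy_us / window_us, expressed as an integer percentage in 0..=100.
    busy_us
        .iter()
        .map(|&b| (b.min(window_us) * 100 / window_us) as u32)
        .collect()
}
```

With a 1 s window, holds from 0.2 s to 0.7 s and from 0.9 s to 1.3 s give 60% for the first window, 30% for the second, and 0% for an idle third window.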

Each device entry also includes **memory** numbers:

- When **`NPU_MEM_QUOTA`** was set at limiter start (`limit_enforced: true`), the numbers come from the same quota tracker the hook uses: `total_mb` is the configured quota (floored to MB), `used_mb` is `memory_used` from shared memory (floored to MB), and `free_mb` is `total_mb - used_mb`. This matches what the limiter enforces; the values do **not** come from `rtMemGetInfoEx` in that mode.
- When **no quota** was set (`limit_enforced: false`), the daemon calls **`rtMemGetInfoEx`** on that device (after `rtSetDevice` with the visible-device index), so you still get total, free, and derived used memory in MB.
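The two memory modes can be illustrated with a small sketch. Everything named here is hypothetical (`MemoryReport`, `enforced_report`, `runtime_report`), and the example assumes that inputs are byte counts and that "MB" means 1024 × 1024 bytes; the README does not state either. The runtime call itself is not shown: `runtime_report` simply takes the total and free values that `rtMemGetInfoEx` would supply.

```rust
// Assumption: one "MB" is 1024 * 1024 bytes.
const BYTES_PER_MB: u64 = 1024 * 1024;

/// Hypothetical shape of the per-device memory numbers.
#[derive(Debug, PartialEq)]
struct MemoryReport {
    limit_enforced: bool,
    total_mb: u64,
    used_mb: u64,
    free_mb: u64,
}

/// Quota mode: total and used are floored to MB, free is derived.
fn enforced_report(quota_bytes: u64, used_bytes: u64) -> MemoryReport {
    let total_mb = quota_bytes / BYTES_PER_MB; // integer division floors
    let used_mb = used_bytes / BYTES_PER_MB;
    MemoryReport {
        limit_enforced: true,
        total_mb,
        used_mb,
        // saturating_sub guards against used briefly exceeding the quota
        free_mb: total_mb.saturating_sub(used_mb),
    }
}

/// No-quota mode: total and free come from the runtime, used is derived.
fn runtime_report(total_bytes: u64, free_bytes: u64) -> MemoryReport {
    let total_mb = total_bytes / BYTES_PER_MB;
    let free_mb = free_bytes / BYTES_PER_MB;
    MemoryReport {
        limit_enforced: false,
        total_mb,
        used_mb: total_mb.saturating_sub(free_mb),
        free_mb,
    }
}
```

Because each value is floored separately, `used_mb + free_mb` always equals `total_mb` in both modes, even when the underlying byte counts are not whole megabytes.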
### Environment variables
Read once at daemon startup:

| Variable | Default | Effect |
| --- | --- | --- |
| `NPU_REPORT_INTERVAL_MS` | `1000` | Window length in ms. Set to `0` to disable the reporter thread; `GetUtilization` will then return zero percentages. |
| `NPU_REPORT_HISTORY_SCALE` | `10` | Number of recent windows averaged for `utilization_recent_windows_avg_percent`. Minimum `1`. |
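As a rough sketch of how these two variables could be interpreted at startup: the type `ReporterConfig` and both functions below are hypothetical, and falling back to the default on an unparsable value is an assumption, since the table only documents the defaults, the `0` case, and the minimum of `1`.

```rust
/// Hypothetical parsed configuration for the reporter.
#[derive(Debug, PartialEq)]
struct ReporterConfig {
    /// `None` means the reporter thread is disabled (interval set to 0).
    interval_ms: Option<u64>,
    /// Number of windows kept for the rolling average, never below 1.
    history_scale: usize,
}

/// Pure parsing step, kept separate from the environment so it is testable.
fn parse_config(interval: Option<&str>, history: Option<&str>) -> ReporterConfig {
    // Assumption: missing or unparsable values fall back to the defaults.
    let interval_ms = interval
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(1000);
    let history_scale = history
        .and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(10)
        .max(1); // documented minimum

    ReporterConfig {
        interval_ms: if interval_ms == 0 { None } else { Some(interval_ms) },
        history_scale,
    }
}

/// Read both variables once, as the daemon is described as doing at startup.
fn config_from_env() -> ReporterConfig {
    let interval = std::env::var("NPU_REPORT_INTERVAL_MS").ok();
    let history = std::env::var("NPU_REPORT_HISTORY_SCALE").ok();
    parse_config(interval.as_deref(), history.as_deref())
}
```

Representing the disabled state as `None` rather than a zero interval keeps the "no reporter thread" case explicit for callers.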

| Field | Description |
| --- | --- |
| `devices[].device_id` | Physical device id (matches entries in `ASCEND_RT_VISIBLE_DEVICES`). |
| `devices[].tracked` | `true` once the reporter has located this device's manager slot. |
| `devices[].utilization_last_interval_percent` | Utilization of the most recently completed window, `0..=100`. |
| `devices[].utilization_recent_windows_avg_percent` | Mean utilization over up to `history_scale` completed windows, `0..=100`. |
| `devices[].memory.limit_enforced` | `true` if `NPU_MEM_QUOTA` was set at daemon start for this pod. |
| `devices[].memory.total_mb` | Quota total in MB (floor) when enforced; otherwise runtime total from `rtMemGetInfoEx`. |
| `devices[].memory.used_mb` | Tracked used memory in MB (floor) when enforced; otherwise `total_mb - free_mb` from runtime. |
| `devices[].memory.free_mb` | `total_mb - used_mb` when enforced; otherwise runtime free in MB (floor). |

Notes:

- Scope is **per container**. Multiple containers publish independently on their own UDS.
- History is in-memory and resets on daemon restart.
- "Busy" here means wall time that this container's manager held the global baton between consecutive handoffs; it does not count time spent waiting in the queue.
- If no work happens for a whole window, the value is `0%` and reports continue to arrive at the fixed cadence.
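The relationship between the last-window value and the rolling average can be sketched with a bounded in-memory history. The `History` type below is hypothetical, and the truncating integer mean is an assumption; the README only says that the average covers up to `history_scale` completed windows and that idle windows count as `0%`.

```rust
use std::collections::VecDeque;

/// Hypothetical rolling history of completed-window percentages.
struct History {
    cap: usize,
    windows: VecDeque<u32>,
}

impl History {
    fn new(history_scale: usize) -> Self {
        let cap = history_scale.max(1); // documented minimum of 1
        Self { cap, windows: VecDeque::with_capacity(cap) }
    }

    /// Record one completed window; the oldest entry is dropped when full.
    fn push(&mut self, percent: u32) {
        if self.windows.len() == self.cap {
            self.windows.pop_front();
        }
        self.windows.push_back(percent.min(100));
    }

    /// Value of the most recently completed window (0 before any window).
    fn last(&self) -> u32 {
        self.windows.back().copied().unwrap_or(0)
    }

    /// Mean over the windows currently held (0 before any window).
    fn average(&self) -> u32 {
        if self.windows.is_empty() {
            return 0;
        }
        self.windows.iter().sum::<u32>() / self.windows.len() as u32
    }
}
```

An idle window is pushed as `0` like any other, so it lowers the average instead of being skipped, and a fresh `History` after a restart reports zero until the first window completes.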