
Commit 9ece591

added a grpc api for getting utilisation and meminfo in json

1 parent 2c102fa · commit 9ece591

9 files changed
Lines changed: 1010 additions & 17 deletions

.gitignore

Lines changed: 487 additions & 0 deletions
Large diff not rendered.

README.md

Lines changed: 80 additions & 0 deletions
@@ -96,4 +96,84 @@

*Note: The limiter daemon now automatically sets the socket permissions to `0o777`, so you do not need `sudo` to run this command from the host.*

## Reporting Compute Utilization via gRPC

In addition to `SetPriority`, the daemon exposes a `GetUtilization` RPC that reports the compute utilization of this container, per NPU device.

### How it works

Each manager thread participates in a global baton-passing scheme: only the container that currently owns the baton is allowed to issue NPU work. A background reporter thread (one per device) wakes on each baton handoff (the same `signal_counter` futex the managers use) and on fixed window deadlines. It records how long this container's manager held the baton within each window, computes `busy_us / window_us` at each deadline, and keeps a rolling history for averaging. Window boundaries stay fixed; they do not shift when baton events are processed.
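The reporter module itself is registered in `lib.rs` below, but its source is not rendered on this page. As a rough sketch of the accounting just described (fixed deadlines, busy time accumulated between baton handoffs, bounded history), consider the following; the type and method names are illustrative assumptions, not the actual `UtilizationReporter` API:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Illustrative only: `WindowAccumulator` and its methods are assumptions,
// not the actual `UtilizationReporter` internals.
struct WindowAccumulator {
    window: Duration,            // from NPU_REPORT_INTERVAL_MS
    history_scale: usize,        // from NPU_REPORT_HISTORY_SCALE
    busy_in_window: Duration,    // baton-held time inside the current window
    held_since: Option<Instant>, // Some(..) while this container holds the baton
    history: VecDeque<f64>,      // last `history_scale` window percentages
}

impl WindowAccumulator {
    /// Called on each baton handoff (each `signal_counter` futex wakeup).
    fn on_baton_event(&mut self, now: Instant, we_hold_it_now: bool) {
        if let Some(start) = self.held_since.take() {
            self.busy_in_window += now - start;
        }
        if we_hold_it_now {
            self.held_since = Some(now);
        }
        // Note: no deadline is moved here; windows stay on a fixed grid.
    }

    /// Called at each fixed window deadline; returns that window's percentage.
    fn on_deadline(&mut self, now: Instant) -> f64 {
        // Credit a hold that spans the boundary, then let it keep running.
        if let Some(start) = self.held_since {
            self.busy_in_window += now - start;
            self.held_since = Some(now);
        }
        let pct = (100.0 * self.busy_in_window.as_secs_f64()
            / self.window.as_secs_f64())
        .min(100.0);
        self.busy_in_window = Duration::ZERO;
        if self.history.len() == self.history_scale {
            self.history.pop_front(); // keep only the most recent windows
        }
        self.history.push_back(pct);
        pct
    }

    /// Mean over up to `history_scale` completed windows.
    fn recent_avg(&self) -> f64 {
        if self.history.is_empty() {
            return 0.0;
        }
        self.history.iter().sum::<f64>() / self.history.len() as f64
    }
}
```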

The reporter only reads shared memory; it does not modify or interfere with the scheduler.

Each device entry also includes **memory** numbers:

- When **`NPU_MEM_QUOTA`** was set at limiter start (`limit_enforced: true`), totals come from the same quota tracker as the hook: `total_mb` is the configured quota (floor MB), `used_mb` is `memory_used` from shared memory (floor MB), and `free_mb` is `total_mb - used_mb`. This matches what the limiter enforces; it is **not** `rtMemGetInfoEx` in that mode.
- When **no quota** was set (`limit_enforced: false`), the daemon uses **`rtMemGetInfoEx`** on that device (after `rtSetDevice` with the visible-device index), so you still get total / used / derived free in MB.

### Environment variables

Read once at daemon startup:

| Variable | Default | Effect |
| --- | --- | --- |
| `NPU_REPORT_INTERVAL_MS` | `1000` | Window length in ms. Set to `0` to disable the reporter thread; `GetUtilization` will then return zero percentages. |
| `NPU_REPORT_HISTORY_SCALE` | `10` | Number of recent windows averaged for `utilization_recent_windows_avg_percent`. Minimum `1`. |
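
For illustration, startup parsing of these two variables might look like the sketch below. `reporter_config_from_env` is a hypothetical helper, not the actual `UtilizationReporter::from_env` (whose source is not rendered on this page), but the defaults and the minimum match the table above:

```rust
use std::time::Duration;

// Hypothetical helper; the real `UtilizationReporter::from_env` is not shown
// in this commit view. Defaults mirror the table above.
fn reporter_config_from_env() -> (Duration, usize) {
    let interval_ms: u64 = std::env::var("NPU_REPORT_INTERVAL_MS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(1000); // `0` disables the reporter thread entirely
    let history_scale: usize = std::env::var("NPU_REPORT_HISTORY_SCALE")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(10)
        .max(1); // clamp to the documented minimum of 1
    (Duration::from_millis(interval_ms), history_scale)
}
```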

### Example usage

```bash
grpcurl \
  -plaintext \
  -unix \
  -d '{}' \
  /tmp/hami-shared-region/npu_limiter_2.sock \
  npu_limiter.LimiterControl/GetUtilization
```

Example response:

```json
{
  "intervalMs": "1000",
  "historyScale": "10",
  "devices": [
    {
      "deviceId": 0,
      "tracked": true,
      "utilizationLastIntervalPercent": 37.25,
      "utilizationRecentWindowsAvgPercent": 32.88,
      "memory": {
        "limitEnforced": true,
        "totalMb": "10240",
        "usedMb": "1248",
        "freeMb": "8992"
      }
    }
  ]
}
```

### Field meanings

| Field | Meaning |
| --- | --- |
| `interval_ms` | Effective window length (from `NPU_REPORT_INTERVAL_MS`). |
| `history_scale` | Effective rolling-average size (from `NPU_REPORT_HISTORY_SCALE`). |
| `devices[].device_id` | Physical device id (matches entries in `ASCEND_RT_VISIBLE_DEVICES`). |
| `devices[].tracked` | `true` once the reporter has located this device's manager slot. |
| `devices[].utilization_last_interval_percent` | Utilization of the most recently completed window, `0..=100`. |
| `devices[].utilization_recent_windows_avg_percent` | Mean utilization over up to `history_scale` completed windows, `0..=100`. |
| `devices[].memory.limit_enforced` | `true` if `NPU_MEM_QUOTA` was set at daemon start for this pod. |
| `devices[].memory.total_mb` | Quota total in MB (floor) when enforced; otherwise runtime total from `rtMemGetInfoEx`. |
| `devices[].memory.used_mb` | Tracked used memory in MB (floor) when enforced; otherwise `total_mb - free_mb` from the runtime. |
| `devices[].memory.free_mb` | `total_mb - used_mb` when enforced; otherwise runtime free in MB (floor). |

Notes:

- Scope is **per container**. Multiple containers publish independently on their own UDS.
- History is in-memory and resets on daemon restart.
- "Busy" here means wall time that this container's manager held the global baton between consecutive handoffs; it does not count time spent waiting in the queue.
- If no work happens for a whole window, the value is `0%` and reports continue to arrive at the fixed cadence.

---

crates/limiter/src/externed_api.rs

Lines changed: 2 additions & 0 deletions
```diff
@@ -13,6 +13,8 @@ unsafe extern "C" {
     pub fn rtGetDevicePhyIdByIndex(device_index: u32, phy_device: *mut u32) -> i32;
     pub fn rtDeviceSynchronize() -> i32;
     pub fn rtStreamGetCaptureInfo(stream: u64, status: *mut u32, model: *mut u64) -> i32;
+    /// Same as hooked in libvnpu; used by the limiter daemon when no quota is set.
+    pub fn rtMemGetInfoEx(mem_info_type: u64, free: *mut usize, total: *mut usize) -> u64;
 }

 // RT ERROR CODE
```

crates/limiter/src/lib.rs

Lines changed: 2 additions & 0 deletions
```diff
@@ -2,6 +2,8 @@ pub mod worker;
 pub mod manager;
 pub mod shmem;
 pub mod externed_api;
+pub mod reporter;
+pub mod memory_report;
 use ctor::ctor;

 #[ctor]
```

crates/limiter/src/main.rs

Lines changed: 104 additions & 15 deletions
```diff
@@ -1,8 +1,11 @@
 // Use the library crate's name to access the module
 use limiter::manager::ContainerManager;
+use limiter::memory_report;
+use limiter::reporter::UtilizationReporter;
 use limiter::shmem::{shm_setup, GlobalRegistry, LocalContainerShmem, local_shmem_name_for};
 use std::collections::BTreeSet;
 use std::thread;
+use std::time::Duration;
 use std::sync::Arc;
 use std::sync::atomic::{AtomicU64, Ordering};
 use std::os::unix::fs::PermissionsExt;
@@ -16,11 +19,19 @@ pub mod npu_limiter {
 }

 use npu_limiter::limiter_control_server::{LimiterControl, LimiterControlServer};
-use npu_limiter::{SetPriorityRequest, SetPriorityResponse};
+use npu_limiter::{
+    DeviceUtilization, GetUtilizationRequest, GetUtilizationResponse, MemorySnapshot,
+    SetPriorityRequest, SetPriorityResponse,
+};

-#[derive(Debug, Clone)]
+/// gRPC service. Keeps a handle to the shared priority atomic plus one
+/// utilization reporter per device managed by this daemon.
+#[derive(Clone)]
 pub struct LimiterControlService {
     priority_atomic: Arc<AtomicU64>,
+    reporters: Arc<Vec<(u32, Arc<UtilizationReporter>)>>,
+    /// Read-only mapping of each device's local shmem (same names as managers).
+    local_shmems: Arc<Vec<&'static LocalContainerShmem>>,
 }

 #[tonic::async_trait]
@@ -42,6 +53,48 @@ impl LimiterControl for LimiterControlService {
             message: format!("Priority updated to {}", new_priority),
         }))
     }
+
+    async fn get_utilization(
+        &self,
+        _request: Request<GetUtilizationRequest>,
+    ) -> Result<Response<GetUtilizationResponse>, Status> {
+        // All reporters on this daemon share the same interval/history config,
+        // so any of them can supply the header. Fall back to defaults if the
+        // daemon was launched without any devices (which shouldn't happen).
+        let (interval_ms, history_scale) = self
+            .reporters
+            .first()
+            .map(|(_, r)| (r.interval_ms(), r.history_scale()))
+            .unwrap_or((0, 0));
+
+        let devices = self
+            .reporters
+            .iter()
+            .enumerate()
+            .map(|(idx, (dev, r))| {
+                let snap = r.snapshot();
+                let m = memory_report::memory_metrics(self.local_shmems[idx], idx as i32);
+                DeviceUtilization {
+                    device_id: *dev,
+                    tracked: snap.tracked,
+                    utilization_last_interval_percent: snap.last_interval_percent,
+                    utilization_recent_windows_avg_percent: snap.recent_avg_percent,
+                    memory: Some(MemorySnapshot {
+                        limit_enforced: m.limit_enforced,
+                        total_mb: m.total_mb,
+                        used_mb: m.used_mb,
+                        free_mb: m.free_mb,
+                    }),
+                }
+            })
+            .collect();
+
+        Ok(Response::new(GetUtilizationResponse {
+            interval_ms,
+            history_scale,
+            devices,
+        }))
+    }
 }

 fn parse_visible_devices() -> Vec<u32> {
@@ -82,29 +135,61 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
     // Shared priority across all devices in this container
     let priority_atomic = Arc::new(AtomicU64::new(0.0f64.to_bits()));

+    // One utilization reporter per device. The gRPC handler reads snapshots
+    // from these Arcs; each reporter thread is started inside its device's
+    // manager thread (where we open the shared memory).
+    let mut reporters: Vec<(u32, Arc<UtilizationReporter>)> = Vec::with_capacity(devices.len());
+    for dev in &devices {
+        reporters.push((*dev, UtilizationReporter::from_env()));
+    }
+
     let mut handles = Vec::new();
-    for dev in devices {
+    for (dev_idx, dev) in devices.iter().enumerate() {
         let global_path = format!("{}_dev{}", global_base, dev);
-        let local_path = local_shmem_name_for(dev);
+        let local_path = local_shmem_name_for(*dev);
         let pid_for_thread = pid;
         let p_atomic = priority_atomic.clone();
+        let reporter = reporters[dev_idx].1.clone();

         let handle = thread::spawn(move || {
-            // 1. Create the shared memory (sole control of the mapping);
-            //    only the Manager may call create_shmem.
-            let global_reg = shm_setup::open_global_registry::<GlobalRegistry>(&global_path);
+            // 1. Open shared memory. `open_global_registry` returns a
+            //    `&'static mut GlobalRegistry`; the reporter needs a shared
+            //    `&'static GlobalRegistry` view of the same mapping, and the
+            //    manager (further below) consumes the same shared reference.
+            //    All fields are atomics, so aliasing as `&` is sound.
+            let global_reg_mut: &'static mut GlobalRegistry =
+                shm_setup::open_global_registry::<GlobalRegistry>(&global_path);
+            let global_reg: &'static GlobalRegistry =
+                unsafe { &*(global_reg_mut as *const GlobalRegistry) };
             let local_shm = shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());

-            // 2. Initialize the Manager (call `new`).
-            let mut manager = ContainerManager::new(global_reg, local_shm, pid_for_thread as i32, p_atomic);
+            // 2. Start the utilization reporter. It locates its slot by
+            //    matching on pid, which is written during `ContainerManager::new`
+            //    just below, so a brief retry inside the reporter is expected.
+            reporter.start(global_reg, pid_for_thread);

-            // 3. Enter the main loop, continuously scheduling and handing out
-            //    tokens; this call blocks the thread until the process is killed.
+            // 3. Initialize the manager and enter its scheduling loop.
+            let mut manager =
+                ContainerManager::new(global_reg, local_shm, pid_for_thread as i32, p_atomic);
             manager.run();
         });
         handles.push(handle);
     }

+    // Map each device's local shmem for quota / used reads (managers created it above).
+    let local_shmems: Vec<&'static LocalContainerShmem> = devices
+        .iter()
+        .map(|dev| {
+            let path = local_shmem_name_for(*dev);
+            loop {
+                if let Some(ptr) = shm_setup::try_open_shmem::<LocalContainerShmem>(&path) {
+                    break unsafe { &*(ptr as *const LocalContainerShmem) };
+                }
+                thread::sleep(Duration::from_millis(20));
+            }
+        })
+        .collect();
+
     // Start gRPC server on UDS if configured
     let uds_path = std::env::var("NPU_LIMITER_UDS_PATH").unwrap_or_else(|_| "/tmp/npu_limiter.sock".to_string());

@@ -122,9 +207,13 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
     let uds_stream = UnixListenerStream::new(uds);

     log::info!("[Daemon] Starting gRPC control server on UDS: {}", uds_path);
-
-    let service = LimiterControlService { priority_atomic };
-
+
+    let service = LimiterControlService {
+        priority_atomic,
+        reporters: Arc::new(reporters),
+        local_shmems: Arc::new(local_shmems),
+    };
+
     let reflection_service = tonic_reflection::server::Builder::configure()
         .register_encoded_file_descriptor_set(npu_limiter::FILE_DESCRIPTOR_SET)
         .build_v1()?;
@@ -141,4 +230,4 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
     }

     Ok(())
-}
+}
```
crates/limiter/src/memory_report.rs (new file; path inferred from the `pub mod memory_report;` added in lib.rs above)

Lines changed: 79 additions & 0 deletions

```rust
//! Memory metrics for gRPC reporting: quota view from `LocalContainerShmem` when
//! `NPU_MEM_QUOTA` is set; otherwise `rtMemGetInfoEx` on the device (same source
//! the hook uses when quota is not enforced).

use crate::externed_api::{rtMemGetInfoEx, rtSetDevice};
use crate::shmem::LocalContainerShmem;
use std::sync::atomic::Ordering;
use std::sync::{Mutex, OnceLock};

const MB: u64 = 1024 * 1024;
/// Matches the hook's `rtMemGetInfoEx` first argument when not using quota.
const MEM_INFO_TYPE: u64 = 0;

/// Serialize runtime memory queries so `rtSetDevice` / `rtMemGetInfoEx` do not
/// race with other threads using the runtime.
fn rt_mem_lock() -> std::sync::MutexGuard<'static, ()> {
    static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
    LOCK.get_or_init(|| Mutex::new(()))
        .lock()
        .expect("memory report lock poisoned")
}

#[derive(Debug, Clone, Copy)]
pub struct MemoryMetrics {
    pub limit_enforced: bool,
    pub total_mb: u64,
    pub used_mb: u64,
    pub free_mb: u64,
}

/// `logical_device` is the index among visible devices (0, 1, …), passed to
/// `rtSetDevice` when falling back to the runtime.
pub fn memory_metrics(local: &LocalContainerShmem, logical_device: i32) -> MemoryMetrics {
    let limit = local.memory_limit.load(Ordering::Relaxed);
    if limit > 0 {
        // Quota mode: report what the limiter enforces, floored to MB.
        let total_mb = limit / MB;
        let used = local.memory_used.load(Ordering::Acquire);
        let used_mb = used / MB;
        let free_mb = total_mb.saturating_sub(used_mb);
        return MemoryMetrics {
            limit_enforced: true,
            total_mb,
            used_mb,
            free_mb,
        };
    }

    // No quota: ask the runtime, serialized behind the process-wide lock.
    let _g = rt_mem_lock();
    unsafe {
        if rtSetDevice(logical_device) != 0 {
            return MemoryMetrics {
                limit_enforced: false,
                total_mb: 0,
                used_mb: 0,
                free_mb: 0,
            };
        }
        let mut free = 0usize;
        let mut total = 0usize;
        let rc = rtMemGetInfoEx(MEM_INFO_TYPE, &mut free, &mut total);
        if rc != 0 {
            return MemoryMetrics {
                limit_enforced: false,
                total_mb: 0,
                used_mb: 0,
                free_mb: 0,
            };
        }
        let total_mb = (total as u64) / MB;
        let free_mb = (free as u64) / MB;
        let used_mb = total_mb.saturating_sub(free_mb);
        MemoryMetrics {
            limit_enforced: false,
            total_mb,
            used_mb,
            free_mb,
        }
    }
}
```
