Hami update complete #1
…n are used, as workers are expected to be in waiting already, so there is no need to wait longer. Signed-off-by: m84396954 <mehryar.abbasi1@h-partners.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: mehryar72. The full list of commands accepted by this bot can be found here. Details: needs approval from an approver in each of these files. Approvers can indicate their approval by writing…
The branch was force-pushed from 9ece591 to aa1ee1e.
Pull request overview
This PR extends the NPU limiter/virtualization stack with a control plane (gRPC over a Unix socket) to dynamically update priority at runtime and to report per-device utilization/memory, and adds scripts/docs to run vLLM and multi-process load tests under HAMI/VNPU.
Changes:
- Add a gRPC UDS server to the limiter daemon for `SetPriority` and `GetUtilization` (with reflection), plus utilization and memory reporting internals.
- Update worker/manager shared-memory handling for multi-device scenarios (per-physical-device local shmem naming, device-change handling).
- Add runnable scripts (vLLM interactive + multi-process torch test) and expanded docs/README for debugging and gRPC usage.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 9 comments.
Summary per file:
| File | Description |
|---|---|
| script/mab/run_vllm_interactive.sh | New script to launch limiter + vLLM in a container and run an interactive streaming benchmark client. |
| script/mab/run_hami_mab_multi.sh | New script to launch limiter and run multiple NPU worker processes for load testing. |
| script/mab/run_hami_mab.sh | Exposes limiter gRPC UDS path and makes shared dir configurable via env override. |
| script/mab/multi_process_test.py | New multi-process PyTorch NPU inference loop used by the multi runner. |
| script/mab/clean_up.sh | Extends cleanup to stop new container name pattern and remove additional logs/registry files. |
| docs/limiter-debug-analysis.md | New debugging/analysis doc describing limiter “no progress” behavior and intended improvements. |
| crates/limiter/src/worker.rs | Refactors SchedulerClient init/lifecycle for multi-device support and device-change-safe teardown. |
| crates/limiter/src/shmem.rs | Adds try_open_shmem and introduces per-physical-device local SHM naming helper. |
| crates/limiter/src/reporter.rs | New utilization reporter thread that tracks baton ownership and computes rolling utilization. |
| crates/limiter/src/memory_report.rs | New memory metrics helper (quota-based when enforced, runtime query otherwise). |
| crates/limiter/src/manager.rs | Makes priority dynamically readable (atomic) and tweaks scheduling constants/behavior. |
| crates/limiter/src/main.rs | Adds gRPC control server over UDS and spawns per-device manager + reporter threads. |
| crates/limiter/src/lib.rs | Exposes new reporter and memory_report modules. |
| crates/limiter/src/externed_api.rs | Adds missing runtime externs used for device detection and resource destruction. |
| crates/limiter/build.rs | Adds tonic-build proto compilation + descriptor set generation. |
| crates/limiter/Cargo.toml | Adds tonic/tokio/prost and build-dependency for proto compilation. |
| crates/hook/src/hook.rs | Adds rtSetDevice hook + HCCL stream-op hooks; replaces lazy_static-based limiter singleton. |
| README.md | Documents gRPC dynamic priority and utilization/memory reporting usage via grpcurl. |
| .gitignore | Adds a repository-wide ignore list (including a *.proto rule). |
In `crates/limiter/build.rs`:

```rust
let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
tonic_build::configure()
    .file_descriptor_set_path(out_dir.join("limiter_descriptor.bin"))
    .compile_protos(&["proto/limiter.proto"], &["proto"])?;
```
tonic_build::compile_protos is configured to compile proto/limiter.proto, but there is no crates/limiter/proto/ directory (and no limiter.proto) in this branch, so the crate will not build. Add the missing proto file/directory (and ensure it’s committed), or update compile_protos to point at the correct existing location.
Suggested change:

```diff
 let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
+let manifest_dir = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap());
+let proto_dir = manifest_dir.join("../../proto");
+let limiter_proto = proto_dir.join("limiter.proto");
 tonic_build::configure()
     .file_descriptor_set_path(out_dir.join("limiter_descriptor.bin"))
-    .compile_protos(&["proto/limiter.proto"], &["proto"])?;
+    .compile_protos(&[limiter_proto], &[proto_dir])?;
```
In `crates/limiter/src/manager.rs`:

```rust
if !saw_progress && start_time.elapsed() > Duration::from_micros(50) {
    debug!("[Manager] --- no progress for 5ms, enter STATE_MEASURING early");
    break;
```
The no-progress cutoff is now Duration::from_micros(50) (50µs) but the log message still says "5ms". 50µs is orders of magnitude smaller than the previous behavior and is likely to trigger false "no progress" early exits under normal kernel launch latency. Consider restoring a ms-scale threshold (or making it configurable via an env var as described in the docs) and update the log message to match the actual unit/value.
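For reference, a minimal sketch of the env-driven threshold this comment suggests, assuming the `NPU_NO_PROGRESS_MS` name from the debug doc; the helper name, default, and commented call site are illustrative, not the crate's actual code:

```rust
use std::time::Duration;

// Resolve the no-progress cutoff once at startup. NPU_NO_PROGRESS_MS is the
// variable described in docs/limiter-debug-analysis.md; the 5 ms default and
// this helper's name are assumptions for illustration.
fn no_progress_cutoff() -> Duration {
    std::env::var("NPU_NO_PROGRESS_MS")
        .ok()
        .and_then(|v| v.trim().parse::<u64>().ok())
        .map(Duration::from_millis)
        .unwrap_or(Duration::from_millis(5))
}

// Illustrative call site mirroring the hunk above, with the log reflecting
// the actual configured value:
// if !saw_progress && start_time.elapsed() > cutoff {
//     debug!("[Manager] --- no progress for {:?}, enter STATE_MEASURING early", cutoff);
//     break;
// }
```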
```rust
if new_priority <= 0.0 {
    return Err(Status::invalid_argument("Priority must be > 0"));
}

self.priority_atomic.store(new_priority.to_bits(), Ordering::Relaxed);
log::info!("Updated priority via gRPC to: {}", new_priority);
```
set_priority only checks new_priority <= 0.0. This accepts NaN / +inf (because NaN <= 0.0 is false), which would then be stored into the shared atomic and can break token math (clamp, casts, rounding). Validate new_priority.is_finite() (and optionally enforce a reasonable upper bound) before storing it.
Suggested change:

```diff
-if new_priority <= 0.0 {
-    return Err(Status::invalid_argument("Priority must be > 0"));
-}
+if !new_priority.is_finite() || new_priority <= 0.0 {
+    return Err(Status::invalid_argument("Priority must be a finite value > 0"));
+}
 self.priority_atomic.store(new_priority.to_bits(), Ordering::Relaxed);
 log::info!("Updated priority via gRPC to: {}", new_priority);
```
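As a standalone illustration (not project code) of why the extra check matters: `NaN <= 0.0` evaluates to false, so the original guard lets NaN through, while `is_finite()` rejects it.

```rust
fn main() {
    let bad = f64::NAN;
    assert!(!(bad <= 0.0));              // NaN passes the original guard
    assert!(!bad.is_finite());           // rejected by the tightened guard
    assert!(!f64::INFINITY.is_finite()); // +inf rejected as well

    let ok = 1.5_f64;
    assert!(ok.is_finite() && ok > 0.0);
    // to_bits/from_bits round-trips the accepted value for the atomic store.
    assert_eq!(f64::from_bits(ok.to_bits()), ok);
}
```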
In `script/mab/run_vllm_interactive.sh`:

```bash
# --host-port <int>      Host port for vLLM (default: 9000+id)
# --tokens <int>         Max tokens for each run (default: 2000)
# --label <string>       Label shown in benchmark output
# --image <string>       Container image id/tag (default set below)
# --ready-timeout <int>  Seconds to wait for vLLM readiness (default: 45)
#
# Controls inside the benchmark:
#   s/b : start or restart generation
#   p   : pause current generation and show averages
#   q   : quit (also tears down the container)

usage() {
  cat <<EOF
Usage: $0 --id <ID> --model <MODEL_PATH> --rtv <LIST> --tp <N> --mem-quota <MB> --priority <PERCENT> [options]

Required:
  --id <ID>             Unique instance id (also sets default ports)
  --model <PATH>        Host path to model directory
  --rtv <LIST>          ASCEND_RT_VISIBLE_DEVICES list (e.g. 0,1)
  --tp <N>              Tensor parallel size
  --mem-quota <MB>      NPU_MEM_QUOTA value (MB)
  --priority <PERCENT>  NPU_PRIORITY / share percent

Optional:
  --host-port <PORT>    Host port for vLLM (default: 9000+ID)
  --tokens <N>          Max tokens per run (default: 2000)
  --label <TEXT>        Label shown in benchmark output
  --image <TAG>         Container image (default: registry-cbu.huawei.com/ascend/vllm-ascend:v0.10.1rc1)
  --ready-timeout <S>   Seconds to wait for vLLM to be ready (default: 45)
  -h|--help             Show this help
```
The help text says --host-port defaults to 9000+ID and --ready-timeout defaults to 45, but the actual defaults are HOST_VLLM_PORT=$((9600 + ID)) and READY_TIMEOUT=300. Update the usage/comments (or the code) so the documented defaults match runtime behavior.
```bash
export NPU_PRIORITY="$NPU_PRIORITY"
export NPU_MEM_QUOTA="$NPU_MEM_QUOTA"
export NPU_GLOBAL_SHM_PATH="$GLOBAL_SHM_PATH"
export NPU_LOCAL_SHM_NAME="vnpu_local_session_${ID}"
```
NPU_LOCAL_SHM_NAME is used as the base name for shm_open in shmem::shm_setup::{create_shmem,open_shmem}. On Linux, shm_open expects names to start with /; using vnpu_local_session_${ID} may fail with EINVAL. Consider prefixing the value with / here (or normalizing inside local_shmem_name_for / create_shmem).
Suggested change:

```diff
-export NPU_LOCAL_SHM_NAME="vnpu_local_session_${ID}"
+export NPU_LOCAL_SHM_NAME="/vnpu_local_session_${ID}"
```
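Alternatively, the normalization could live on the Rust side so any value of `NPU_LOCAL_SHM_NAME` is accepted. A sketch under that assumption; the per-device suffix and helper bodies are illustrative, not the crate's actual naming scheme:

```rust
// Ensure the shm_open name has exactly one leading '/', as Linux expects.
fn normalize_shm_name(raw: &str) -> String {
    format!("/{}", raw.trim_start_matches('/'))
}

// Illustrative per-device name helper; the "_dev{N}" suffix is an assumption,
// not the actual local_shmem_name_for implementation.
fn local_shmem_name_for(device: u32) -> String {
    let base = std::env::var("NPU_LOCAL_SHM_NAME")
        .unwrap_or_else(|_| "vnpu_local_session".to_string());
    normalize_shm_name(&format!("{base}_dev{device}"))
}
```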
In `docs/limiter-debug-analysis.md`:

```markdown
## Implemented Improvements

1. **Manager identifier in logs** – All log lines include `[Manager #N]` where N is the global slot index.
2. **Configurable no-progress threshold** – Set `NPU_NO_PROGRESS_MS` (default 5). For vLLM, try e.g. `NPU_NO_PROGRESS_MS=50`.
3. **Tiered log levels** – Use `RUST_LOG` to control verbosity:
```
This doc says the no-progress threshold is configurable via NPU_NO_PROGRESS_MS (default 5), but the current code in crates/limiter/src/manager.rs uses a hard-coded Duration::from_micros(50) and doesn’t read NPU_NO_PROGRESS_MS. Either implement the env var as documented, or update this section to match actual behavior.
```rust
// Map each device's local shmem for quota / used reads (managers created it above).
let local_shmems: Vec<&'static LocalContainerShmem> = devices
    .iter()
    .map(|dev| {
        let path = local_shmem_name_for(*dev);
        loop {
            if let Some(ptr) = shm_setup::try_open_shmem::<LocalContainerShmem>(&path) {
                break unsafe { &*(ptr as *const LocalContainerShmem) };
            }
            thread::sleep(Duration::from_millis(20));
        }
```
The loop that maps local_shmems retries forever until try_open_shmem succeeds. If a manager thread panics or SHM creation fails, the daemon will hang at startup with no error. Add a timeout / max retries (and surface an actionable error) so failures don’t turn into an infinite wait.
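A sketch of a bounded-retry variant, assuming the existing `shm_setup::try_open_shmem` signature (returning `Option<&'static mut T>`); the retry budget, helper name, and error message are illustrative:

```rust
use std::{thread, time::Duration};

// Give up after roughly 5 s instead of spinning forever; values are illustrative.
const OPEN_RETRIES: usize = 250;
const OPEN_POLL: Duration = Duration::from_millis(20);

fn open_local_shmem_or_err(path: &str) -> Result<*const LocalContainerShmem, String> {
    for _ in 0..OPEN_RETRIES {
        if let Some(ptr) = shm_setup::try_open_shmem::<LocalContainerShmem>(path) {
            return Ok(ptr as *const LocalContainerShmem);
        }
        thread::sleep(OPEN_POLL);
    }
    Err(format!(
        "timed out opening local shmem '{path}'; did the manager thread for this device fail to start?"
    ))
}
```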
```bash
sudo rm -f ${HOST_PROJECT_DIR}/inst*_manager.log
sudo rm -f ${HOST_PROJECT_DIR}/inst*_app.log
sudo rm -f ${HOST_PROJECT_DIR}/vllm_*.log
sudo rm -f ${HOST_PROJECT_DIR}/limiter_*.log
```
clean_up.sh removes inst*_app.log, but the new multi-process runner writes ${LOG_PREFIX}_apps.log (plural) via tee. Add that pattern (and consider quoting the paths) so the cleanup script actually removes logs produced by the new scripts.
Suggested change:

```diff
-sudo rm -f ${HOST_PROJECT_DIR}/inst*_manager.log
-sudo rm -f ${HOST_PROJECT_DIR}/inst*_app.log
-sudo rm -f ${HOST_PROJECT_DIR}/vllm_*.log
-sudo rm -f ${HOST_PROJECT_DIR}/limiter_*.log
+sudo rm -f "${HOST_PROJECT_DIR}"/inst*_manager.log
+sudo rm -f "${HOST_PROJECT_DIR}"/inst*_app.log
+sudo rm -f "${HOST_PROJECT_DIR}"/inst*_apps.log
+sudo rm -f "${HOST_PROJECT_DIR}"/vllm_*.log
+sudo rm -f "${HOST_PROJECT_DIR}"/limiter_*.log
```
Signed-off-by: m84396954 <mehryar.abbasi1@h-partners.com>
… used. Signed-off-by: m84396954 <mehryar.abbasi1@h-partners.com>
The branch was force-pushed from a877601 to 51d29f4.
Pull request overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 8 comments.
In `crates/hook/src/hook.rs`:

```rust
#![allow(non_snake_case)]
use lazy_static::lazy_static;
use limiter::worker::SchedulerClient;
use once_cell;
```
use once_cell; appears unused (the file uses once_cell::sync::Lazy directly). If the crate has #![deny(warnings)] in CI, this will fail the build; otherwise it’s still noise. Remove the unused import.
Suggested change:

```diff
-use once_cell;
```
```rust
local.memory_limit = AtomicU64::new(memory_limit_bytes);
local.memory_used = AtomicU64::new(0);
```
local.memory_limit = AtomicU64::new(...) and local.memory_used = AtomicU64::new(...) replace the AtomicU64 objects via plain (non-atomic) writes. If any other thread reads these fields with atomic ops (e.g., gRPC GetUtilization calling memory_metrics) concurrently, this is a data race/UB under Rust's memory model. Prefer initializing/updating these fields via store(...) on the existing atomics instead of assigning new AtomicU64 values.
Suggested change:

```diff
-local.memory_limit = AtomicU64::new(memory_limit_bytes);
-local.memory_used = AtomicU64::new(0);
+local.memory_limit.store(memory_limit_bytes, Ordering::Relaxed);
+local.memory_used.store(0, Ordering::Relaxed);
```
```rust
            thread::sleep(OPEN_LOCAL_SHMEM_POLL);
        };

let local_path = local_shmem_name();
// 1. Create SHM
let global_reg = shm_setup::open_global_registry::<GlobalRegistry>(&global_path);
let local_shm = shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());

local_shmems.push(unsafe { &*(ptr as *const LocalContainerShmem) });
}
```
try_open_shmem returns &'static mut LocalContainerShmem, but the result is later cast to &LocalContainerShmem and stored. Creating a shared reference while the original &mut exists (even briefly) is undefined behavior. Prefer using raw pointers for the mapping (or change try_open_shmem to return a raw pointer/shared ref) so you never materialize an &mut when only read access is needed.
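A sketch of the raw-pointer alternative, assuming the existing helper is a thin `shm_open` + `mmap` wrapper and that the crate already depends on `libc` (both assumptions; the function name is illustrative):

```rust
use std::ffi::CString;

/// Open an existing POSIX shared-memory object and return a raw pointer,
/// so callers decide whether to form `&T` or `&mut T` (never both at once).
pub fn try_open_shmem_ptr<T>(name: &str) -> Option<*mut T> {
    let c_name = CString::new(name).ok()?;
    let fd = unsafe { libc::shm_open(c_name.as_ptr(), libc::O_RDWR, 0o666) };
    if fd < 0 {
        return None; // not created yet (or inaccessible); caller may retry
    }
    let ptr = unsafe {
        libc::mmap(
            std::ptr::null_mut(),
            std::mem::size_of::<T>(),
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_SHARED,
            fd,
            0,
        )
    };
    unsafe { libc::close(fd) };
    if ptr == libc::MAP_FAILED {
        None
    } else {
        Some(ptr.cast::<T>())
    }
}
```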
```rust
let raw = std::env::var("ASCEND_RT_VISIBLE_DEVICES").unwrap_or_else(|_| "0".to_string());
let mut set = BTreeSet::new();
for v in raw.split(',').filter_map(|s| s.trim().parse::<u32>().ok()) {
    set.insert(v);
}
```
Using a BTreeSet here deduplicates and sorts ASCEND_RT_VISIBLE_DEVICES, which can change the visible-device ordering. If the runtime uses env order for logical device indices, this can make idx as i32 (used later for rtSetDevice in memory reporting) refer to a different physical device than the returned device_id. Preserve the original env order instead of sorting.
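A sketch of an order-preserving parse (the function name is illustrative):

```rust
use std::collections::HashSet;

// Keep the first occurrence of each device id in env order; only drop repeats.
fn visible_devices_in_env_order() -> Vec<u32> {
    let raw = std::env::var("ASCEND_RT_VISIBLE_DEVICES").unwrap_or_else(|_| "0".to_string());
    let mut seen = HashSet::new();
    raw.split(',')
        .filter_map(|s| s.trim().parse::<u32>().ok())
        .filter(|d| seen.insert(*d))
        .collect()
}
```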
```rust
// 1. Open shared memory. `open_global_registry` returns a
// `&'static mut GlobalRegistry`; the reporter needs a shared
// `&'static GlobalRegistry` view of the same mapping, and the
// manager (further below) consumes the same shared reference.
// All fields are atomics, so aliasing as `&` is sound.
let global_reg_mut: &'static mut GlobalRegistry =
    shm_setup::open_global_registry::<GlobalRegistry>(&global_path);
let global_reg: &'static GlobalRegistry =
    unsafe { &*(global_reg_mut as *const GlobalRegistry) };
let local_shm = shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());

// 2. Start the utilization reporter. It locates its slot by
// matching on pid, which is written during `ContainerManager::new`
// just below, so a brief retry inside the reporter is expected.
reporter.start(
    global_reg,
    unsafe { &*(local_shm as *const LocalContainerShmem) },
    pid_for_thread,
);

// 3. Initialize the manager and enter its scheduling loop.
let mut manager =
    ContainerManager::new(global_reg, local_shm, pid_for_thread as i32, p_atomic);
```
The current pattern keeps an &'static mut GlobalRegistry (global_reg_mut) alive while also creating an &'static GlobalRegistry pointing at the same mapping. Holding & and &mut aliases to the same memory is undefined behavior in Rust, even if you only access atomics. Consider changing open_global_registry/create_shmem call sites to work with raw pointers (or return a shared reference from the API) so you never have overlapping &mut + & lifetimes for the same region.
Suggested change:

```diff
-// 1. Open shared memory. `open_global_registry` returns a
-// `&'static mut GlobalRegistry`; the reporter needs a shared
-// `&'static GlobalRegistry` view of the same mapping, and the
-// manager (further below) consumes the same shared reference.
-// All fields are atomics, so aliasing as `&` is sound.
-let global_reg_mut: &'static mut GlobalRegistry =
-    shm_setup::open_global_registry::<GlobalRegistry>(&global_path);
-let global_reg: &'static GlobalRegistry =
-    unsafe { &*(global_reg_mut as *const GlobalRegistry) };
-let local_shm = shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());
+// 1. Open shared memory, but convert the returned mutable
+// references to raw pointers immediately so we do not keep
+// overlapping `&mut` and `&` bindings alive for the same
+// mapped region.
+let global_reg_ptr =
+    shm_setup::open_global_registry::<GlobalRegistry>(&global_path)
+        as *mut GlobalRegistry;
+let local_shm_ptr =
+    shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str())
+        as *mut LocalContainerShmem;
 // 2. Start the utilization reporter. It locates its slot by
 // matching on pid, which is written during `ContainerManager::new`
 // just below, so a brief retry inside the reporter is expected.
 reporter.start(
-    global_reg,
-    unsafe { &*(local_shm as *const LocalContainerShmem) },
+    unsafe { &*global_reg_ptr },
+    unsafe { &*local_shm_ptr },
     pid_for_thread,
 );
 // 3. Initialize the manager and enter its scheduling loop.
-let mut manager =
-    ContainerManager::new(global_reg, local_shm, pid_for_thread as i32, p_atomic);
+let mut manager = unsafe {
+    ContainerManager::new(
+        &*global_reg_ptr,
+        &mut *local_shm_ptr,
+        pid_for_thread as i32,
+        p_atomic,
+    )
+};
```
```rust
let local_shm = shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());

// 2. Start the utilization reporter. It locates its slot by
// matching on pid, which is written during `ContainerManager::new`
// just below, so a brief retry inside the reporter is expected.
reporter.start(
    global_reg,
    unsafe { &*(local_shm as *const LocalContainerShmem) },
    pid_for_thread,
);

// 3. Initialize the manager and enter its scheduling loop.
let mut manager =
    ContainerManager::new(global_reg, local_shm, pid_for_thread as i32, p_atomic);
```
local_shm is an &'static mut LocalContainerShmem, but this passes a shared &LocalContainerShmem to the reporter via pointer cast while the mutable reference is still live. Holding & and &mut aliases to the same region is undefined behavior in Rust. Consider using raw pointers for the mapping (or changing the shmem APIs) so you never have overlapping mutable/shared references.
Suggested change:

```diff
-let local_shm = shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());
+// Use separate mappings for the reporter and manager so we never
+// create overlapping `&LocalContainerShmem` / `&mut LocalContainerShmem`
+// references to the same Rust object.
+let local_shm_for_manager =
+    shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());
+let local_shm_for_reporter =
+    shm_setup::open_shmem::<LocalContainerShmem>(local_path.as_str());
 // 2. Start the utilization reporter. It locates its slot by
 // matching on pid, which is written during `ContainerManager::new`
 // just below, so a brief retry inside the reporter is expected.
-reporter.start(
-    global_reg,
-    unsafe { &*(local_shm as *const LocalContainerShmem) },
-    pid_for_thread,
-);
+reporter.start(global_reg, local_shm_for_reporter, pid_for_thread);
 // 3. Initialize the manager and enter its scheduling loop.
-let mut manager =
-    ContainerManager::new(global_reg, local_shm, pid_for_thread as i32, p_atomic);
+let mut manager = ContainerManager::new(
+    global_reg,
+    local_shm_for_manager,
+    pid_for_thread as i32,
+    p_atomic,
+);
```
```rust
// Soften permissions so host users can access it without sudo
if let Ok(mut perms) = std::fs::metadata(&uds_path).map(|m| m.permissions()) {
    perms.set_mode(0o777);
```
The UDS socket is chmod'd to 0o777, making it world-writable. That allows any local user on the host to change container priority and query utilization if they can access the mount path. Consider defaulting to a less permissive mode (e.g., 0o660) and/or gating the relaxation behind an env var so operators can choose the risk level.
Suggested change:

```diff
-// Soften permissions so host users can access it without sudo
-if let Ok(mut perms) = std::fs::metadata(&uds_path).map(|m| m.permissions()) {
-    perms.set_mode(0o777);
+// Default to owner/group access; allow world-writable only as an explicit opt-in.
+let uds_mode = match std::env::var("NPU_LIMITER_UDS_WORLD_WRITABLE") {
+    Ok(value) if value == "1" || value.eq_ignore_ascii_case("true") => 0o777,
+    _ => 0o660,
+};
+if let Ok(mut perms) = std::fs::metadata(&uds_path).map(|m| m.permissions()) {
+    perms.set_mode(uds_mode);
```
```bash
  -v $HOST_PROJECT_DIR:$CONTAINER_PROJECT_DIR \
  -v /mnt/nvme0/LLMs/:/models \
  --name "$CONTAINER_NAME" \
  -e LD_PRELOAD=$LIBRARY_PATH \
```
This sets LD_PRELOAD=$LIBRARY_PATH at the container level, so the limiter daemon itself (started later) will run under the hook library. That can cause the manager process to initialize SchedulerClient unexpectedly (e.g., via rtSetDevice / rtMemGetInfoEx in gRPC memory reporting) and register as a worker in local shmem. Recommend removing container-wide LD_PRELOAD and only setting it for the workload processes.
Suggested change:

```diff
-  -e LD_PRELOAD=$LIBRARY_PATH \
```
This branch holds features such as dynamic NPU priority changes and utilization reporting.