Hami update complete #1

Open
mehryar72 wants to merge 7 commits into main from hami-update-complete

Conversation

@mehryar72
Collaborator

This branch holds features such as dynamic NPU priority change and utilization reporting.

…n are used, as workers are expected to already be waiting, so there is no need to wait longer

Signed-off-by: m84396954 <mehryar.abbasi1@h-partners.com>
@mehryar72 mehryar72 self-assigned this Apr 21, 2026
@hami-robot

hami-robot Bot commented Apr 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mehryar72

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot AI left a comment

Pull request overview

This PR extends the NPU limiter/virtualization stack with a control plane (gRPC over a Unix domain socket) that dynamically updates priority at runtime and reports per-device utilization/memory, and adds scripts/docs for running vLLM and multi-process load tests under HAMI/VNPU.

Changes:

  • Add gRPC UDS server to the limiter daemon for SetPriority and GetUtilization (with reflection), plus utilization + memory reporting internals.
  • Update worker/manager shared-memory handling for multi-device scenarios (per-physical-device local shmem naming, device-change handling).
  • Add runnable scripts (vLLM interactive + multi-process torch test) and expanded docs/README for debugging and gRPC usage.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 9 comments.

File Description
script/mab/run_vllm_interactive.sh New script to launch limiter + vLLM in a container and run an interactive streaming benchmark client.
script/mab/run_hami_mab_multi.sh New script to launch limiter and run multiple NPU worker processes for load testing.
script/mab/run_hami_mab.sh Exposes limiter gRPC UDS path and makes shared dir configurable via env override.
script/mab/multi_process_test.py New multi-process PyTorch NPU inference loop used by the multi runner.
script/mab/clean_up.sh Extends cleanup to stop new container name pattern and remove additional logs/registry files.
docs/limiter-debug-analysis.md New debugging/analysis doc describing limiter “no progress” behavior and intended improvements.
crates/limiter/src/worker.rs Refactors SchedulerClient init/lifecycle for multi-device support and device-change-safe teardown.
crates/limiter/src/shmem.rs Adds try_open_shmem and introduces per-physical-device local SHM naming helper.
crates/limiter/src/reporter.rs New utilization reporter thread that tracks baton ownership and computes rolling utilization.
crates/limiter/src/memory_report.rs New memory metrics helper (quota-based when enforced, runtime query otherwise).
crates/limiter/src/manager.rs Makes priority dynamically readable (atomic) and tweaks scheduling constants/behavior.
crates/limiter/src/main.rs Adds gRPC control server over UDS and spawns per-device manager + reporter threads.
crates/limiter/src/lib.rs Exposes new reporter and memory_report modules.
crates/limiter/src/externed_api.rs Adds missing runtime externs used for device detection and resource destruction.
crates/limiter/build.rs Adds tonic-build proto compilation + descriptor set generation.
crates/limiter/Cargo.toml Adds tonic/tokio/prost and build-dependency for proto compilation.
crates/hook/src/hook.rs Adds rtSetDevice hook + HCCL stream-op hooks; replaces lazy_static-based limiter singleton.
README.md Documents gRPC dynamic priority and utilization/memory reporting usage via grpcurl.
.gitignore Adds a repository-wide ignore list (including a *.proto rule).


Comment thread crates/hook/src/hook.rs Outdated
Comment thread crates/limiter/build.rs
Comment on lines +6 to +11

let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
tonic_build::configure()
.file_descriptor_set_path(out_dir.join("limiter_descriptor.bin"))
.compile_protos(&["proto/limiter.proto"], &["proto"])?;

Copilot AI Apr 21, 2026

tonic_build::compile_protos is configured to compile proto/limiter.proto, but there is no crates/limiter/proto/ directory (and no limiter.proto) in this branch, so the crate will not build. Add the missing proto file/directory (and ensure it’s committed), or update compile_protos to point at the correct existing location.

Suggested change
-let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
-tonic_build::configure()
-    .file_descriptor_set_path(out_dir.join("limiter_descriptor.bin"))
-    .compile_protos(&["proto/limiter.proto"], &["proto"])?;
+let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
+let manifest_dir = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap());
+let proto_dir = manifest_dir.join("../../proto");
+let limiter_proto = proto_dir.join("limiter.proto");
+tonic_build::configure()
+    .file_descriptor_set_path(out_dir.join("limiter_descriptor.bin"))
+    .compile_protos(&[limiter_proto], &[proto_dir])?;

Comment thread crates/limiter/src/manager.rs Outdated
Comment on lines 309 to 311
if !saw_progress && start_time.elapsed() > Duration::from_micros(50) {
debug!("[Manager] --- no progress for 5ms, enter STATE_MEASURING early");
break;

Copilot AI Apr 21, 2026

The no-progress cutoff is now Duration::from_micros(50) (50µs) but the log message still says "5ms". 50µs is orders of magnitude smaller than the previous behavior and is likely to trigger false "no progress" early exits under normal kernel launch latency. Consider restoring a ms-scale threshold (or making it configurable via an env var as described in the docs) and update the log message to match the actual unit/value.

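The env-var approach the reviewer suggests can be sketched as a small helper. `NPU_NO_PROGRESS_MS` is the variable named in this PR's docs; the `no_progress_threshold` function and its `Option<&str>` parameter are illustrative (the real call site in `manager.rs` would read the env var directly):

```rust
use std::time::Duration;

// Hypothetical helper: turn a raw `NPU_NO_PROGRESS_MS` value (if set) into a
// millisecond-scale threshold, falling back to the documented 5 ms default
// when the variable is unset or unparsable.
fn no_progress_threshold(raw: Option<&str>) -> Duration {
    let ms = raw.and_then(|v| v.trim().parse::<u64>().ok()).unwrap_or(5);
    Duration::from_millis(ms)
}

fn main() {
    // A call site would pass the live environment variable:
    let threshold =
        no_progress_threshold(std::env::var("NPU_NO_PROGRESS_MS").ok().as_deref());
    println!("no-progress threshold: {:?}", threshold);
}
```

This keeps the threshold ms-scale by construction, and the log message can interpolate the actual value instead of hard-coding "5ms".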
Comment thread crates/limiter/src/main.rs Outdated
Comment on lines +44 to +50
if new_priority <= 0.0 {
return Err(Status::invalid_argument("Priority must be > 0"));
}

self.priority_atomic.store(new_priority.to_bits(), Ordering::Relaxed);
log::info!("Updated priority via gRPC to: {}", new_priority);

Copilot AI Apr 21, 2026

set_priority only checks new_priority <= 0.0. This accepts NaN / +inf (because NaN <= 0.0 is false), which would then be stored into the shared atomic and can break token math (clamp, casts, rounding). Validate new_priority.is_finite() (and optionally enforce a reasonable upper bound) before storing it.

Suggested change
-if new_priority <= 0.0 {
-    return Err(Status::invalid_argument("Priority must be > 0"));
-}
-self.priority_atomic.store(new_priority.to_bits(), Ordering::Relaxed);
-log::info!("Updated priority via gRPC to: {}", new_priority);
+if !new_priority.is_finite() || new_priority <= 0.0 {
+    return Err(Status::invalid_argument("Priority must be a finite value > 0"));
+}
+self.priority_atomic.store(new_priority.to_bits(), Ordering::Relaxed);
+log::info!("Updated priority via gRPC to: {}", new_priority);

Comment thread script/mab/run_vllm_interactive.sh Outdated
Comment on lines +16 to +45
# --host-port <int> Host port for vLLM (default: 9000+id)
# --tokens <int> Max tokens for each run (default: 2000)
# --label <string> Label shown in benchmark output
# --image <string> Container image id/tag (default set below)
# --ready-timeout <int> Seconds to wait for vLLM readiness (default: 45)
#
# Controls inside the benchmark:
# s/b : start or restart generation
# p : pause current generation and show averages
# q : quit (also tears down the container)

usage() {
cat <<EOF
Usage: $0 --id <ID> --model <MODEL_PATH> --rtv <LIST> --tp <N> --mem-quota <MB> --priority <PERCENT> [options]

Required:
--id <ID> Unique instance id (also sets default ports)
--model <PATH> Host path to model directory
--rtv <LIST> ASCEND_RT_VISIBLE_DEVICES list (e.g. 0,1)
--tp <N> Tensor parallel size
--mem-quota <MB> NPU_MEM_QUOTA value (MB)
--priority <PERCENT> NPU_PRIORITY / share percent

Optional:
--host-port <PORT> Host port for vLLM (default: 9000+ID)
--tokens <N> Max tokens per run (default: 2000)
--label <TEXT> Label shown in benchmark output
--image <TAG> Container image (default: registry-cbu.huawei.com/ascend/vllm-ascend:v0.10.1rc1)
--ready-timeout <S> Seconds to wait for vLLM to be ready (default: 45)
-h|--help Show this help

Copilot AI Apr 21, 2026

The help text says --host-port defaults to 9000+ID and --ready-timeout defaults to 45, but the actual defaults are HOST_VLLM_PORT=$((9600 + ID)) and READY_TIMEOUT=300. Update the usage/comments (or the code) so the documented defaults match runtime behavior.

export NPU_PRIORITY="$NPU_PRIORITY"
export NPU_MEM_QUOTA="$NPU_MEM_QUOTA"
export NPU_GLOBAL_SHM_PATH="$GLOBAL_SHM_PATH"
export NPU_LOCAL_SHM_NAME="vnpu_local_session_${ID}"

Copilot AI Apr 21, 2026

NPU_LOCAL_SHM_NAME is used as the base name for shm_open in shmem::shm_setup::{create_shmem,open_shmem}. On Linux, shm_open expects names to start with /; using vnpu_local_session_${ID} may fail with EINVAL. Consider prefixing the value with / here (or normalizing inside local_shmem_name_for / create_shmem).

Suggested change
-export NPU_LOCAL_SHM_NAME="vnpu_local_session_${ID}"
+export NPU_LOCAL_SHM_NAME="/vnpu_local_session_${ID}"

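The normalize-inside-the-library option could look like the sketch below; `normalize_shm_name` is an illustrative name (it could live in `local_shmem_name_for` or `create_shmem` as the reviewer suggests). POSIX `shm_open` expects names to begin with a `/`:

```rust
// Hypothetical normalization for POSIX shm names: prepend a leading '/'
// when the configured name lacks one, and leave already-valid names alone.
fn normalize_shm_name(name: &str) -> String {
    if name.starts_with('/') {
        name.to_string()
    } else {
        format!("/{}", name)
    }
}

fn main() {
    // The value from NPU_LOCAL_SHM_NAME would be passed through here
    // before reaching shm_open.
    println!("{}", normalize_shm_name("vnpu_local_session_1"));
}
```

Normalizing in the library makes every caller safe, rather than relying on each script to remember the prefix.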
Comment on lines +62 to +66
## Implemented Improvements

1. **Manager identifier in logs** – All log lines include `[Manager #N]` where N is the global slot index.
2. **Configurable no-progress threshold** – Set `NPU_NO_PROGRESS_MS` (default 5). For vLLM, try e.g. `NPU_NO_PROGRESS_MS=50`.
3. **Tiered log levels** – Use `RUST_LOG` to control verbosity:

Copilot AI Apr 21, 2026

This doc says the no-progress threshold is configurable via NPU_NO_PROGRESS_MS (default 5), but the current code in crates/limiter/src/manager.rs uses a hard-coded Duration::from_micros(50) and doesn’t read NPU_NO_PROGRESS_MS. Either implement the env var as documented, or update this section to match actual behavior.

Comment on lines +179 to 189
// Map each device's local shmem for quota / used reads (managers created it above).
let local_shmems: Vec<&'static LocalContainerShmem> = devices
.iter()
.map(|dev| {
let path = local_shmem_name_for(*dev);
loop {
if let Some(ptr) = shm_setup::try_open_shmem::<LocalContainerShmem>(&path) {
break unsafe { &*(ptr as *const LocalContainerShmem) };
}
thread::sleep(Duration::from_millis(20));
}

Copilot AI Apr 21, 2026

The loop that maps local_shmems retries forever until try_open_shmem succeeds. If a manager thread panics or SHM creation fails, the daemon will hang at startup with no error. Add a timeout / max retries (and surface an actionable error) so failures don’t turn into an infinite wait.

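A bounded retry could be sketched like this; `try_open` stands in for `shm_setup::try_open_shmem` (assumed here to return `Option<T>`), and the function name and signature are illustrative:

```rust
use std::thread;
use std::time::Duration;

// Sketch of a bounded retry wrapper: attempt `try_open` up to `max_attempts`
// times with a fixed backoff, then surface an actionable error instead of
// spinning forever at daemon startup.
fn open_with_retries<T>(
    mut try_open: impl FnMut() -> Option<T>,
    max_attempts: u32,
    backoff: Duration,
) -> Result<T, String> {
    for attempt in 1..=max_attempts {
        if let Some(v) = try_open() {
            return Ok(v);
        }
        if attempt < max_attempts {
            thread::sleep(backoff);
        }
    }
    Err(format!(
        "local shmem still unavailable after {} attempts; did a manager thread fail?",
        max_attempts
    ))
}

fn main() {
    // Succeeds on the third simulated attempt.
    let mut calls = 0;
    let r = open_with_retries(
        || {
            calls += 1;
            if calls == 3 { Some(42u32) } else { None }
        },
        5,
        Duration::from_millis(1),
    );
    println!("{:?}", r);
}
```

The daemon's startup path could then propagate the `Err` (or log and exit) instead of hanging silently.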
Comment thread script/mab/clean_up.sh Outdated
Comment on lines +24 to +27
sudo rm -f ${HOST_PROJECT_DIR}/inst*_manager.log
sudo rm -f ${HOST_PROJECT_DIR}/inst*_app.log
sudo rm -f ${HOST_PROJECT_DIR}/vllm_*.log
sudo rm -f ${HOST_PROJECT_DIR}/limiter_*.log

Copilot AI Apr 21, 2026

clean_up.sh removes inst*_app.log, but the new multi-process runner writes ${LOG_PREFIX}_apps.log (plural) via tee. Add that pattern (and consider quoting the paths) so the cleanup script actually removes logs produced by the new scripts.

Suggested change
-sudo rm -f ${HOST_PROJECT_DIR}/inst*_manager.log
-sudo rm -f ${HOST_PROJECT_DIR}/inst*_app.log
-sudo rm -f ${HOST_PROJECT_DIR}/vllm_*.log
-sudo rm -f ${HOST_PROJECT_DIR}/limiter_*.log
+sudo rm -f "${HOST_PROJECT_DIR}"/inst*_manager.log
+sudo rm -f "${HOST_PROJECT_DIR}"/inst*_app.log
+sudo rm -f "${HOST_PROJECT_DIR}"/inst*_apps.log
+sudo rm -f "${HOST_PROJECT_DIR}"/vllm_*.log
+sudo rm -f "${HOST_PROJECT_DIR}"/limiter_*.log

Signed-off-by: m84396954 <mehryar.abbasi1@h-partners.com>
… used.

Signed-off-by: m84396954 <mehryar.abbasi1@h-partners.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 8 comments.



Comment thread crates/hook/src/hook.rs
#![allow(non_snake_case)]
use lazy_static::lazy_static;
use limiter::worker::SchedulerClient;
use once_cell;

Copilot AI Apr 21, 2026

use once_cell; appears unused (the file uses once_cell::sync::Lazy directly). If the crate has #![deny(warnings)] in CI, this will fail the build; otherwise it’s still noise. Remove the unused import.

Suggested change
-use once_cell;

Comment on lines 96 to 97
local.memory_limit = AtomicU64::new(memory_limit_bytes);
local.memory_used = AtomicU64::new(0);

Copilot AI Apr 21, 2026

local.memory_limit = AtomicU64::new(...) and local.memory_used = AtomicU64::new(...) replace the AtomicU64 objects via plain (non-atomic) writes. If any other thread reads these fields with atomic ops (e.g., gRPC GetUtilization calling memory_metrics) concurrently, this is a data race/UB under Rust's memory model. Prefer initializing/updating these fields via store(...) on the existing atomics instead of assigning new AtomicU64 values.

Suggested change
-local.memory_limit = AtomicU64::new(memory_limit_bytes);
-local.memory_used = AtomicU64::new(0);
+local.memory_limit.store(memory_limit_bytes, Ordering::Relaxed);
+local.memory_used.store(0, Ordering::Relaxed);

Comment on lines +205 to +208
thread::sleep(OPEN_LOCAL_SHMEM_POLL);
};

let local_path = local_shmem_name();
// 1.Create SHM
let global_reg = shm_setup::open_global_registry::<GlobalRegistry>(&global_path);
let local_shm = shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());
local_shmems.push(unsafe { &*(ptr as *const LocalContainerShmem) });
}

Copilot AI Apr 21, 2026

try_open_shmem returns &'static mut LocalContainerShmem, but the result is later cast to &LocalContainerShmem and stored. Creating a shared reference while the original &mut exists (even briefly) is undefined behavior. Prefer using raw pointers for the mapping (or change try_open_shmem to return a raw pointer/shared ref) so you never materialize an &mut when only read access is needed.

Comment on lines +107 to +111
let raw = std::env::var("ASCEND_RT_VISIBLE_DEVICES").unwrap_or_else(|_| "0".to_string());
let mut set = BTreeSet::new();
for v in raw.split(',').filter_map(|s| s.trim().parse::<u32>().ok()) {
set.insert(v);
}

Copilot AI Apr 21, 2026

Using a BTreeSet here deduplicates and sorts ASCEND_RT_VISIBLE_DEVICES, which can change the visible-device ordering. If the runtime uses env order for logical device indices, this can make idx as i32 (used later for rtSetDevice in memory reporting) refer to a different physical device than the returned device_id. Preserve the original env order instead of sorting.

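An order-preserving variant of the parse could look like the sketch below; `parse_visible_devices` takes the raw string as a parameter for illustration (the real code reads `ASCEND_RT_VISIBLE_DEVICES` from the environment):

```rust
use std::collections::HashSet;

// Order-preserving dedup: parse the device list without sorting, so
// logical indices keep matching the runtime's env-order device mapping.
// `HashSet::insert` returns false for duplicates, which the filter drops.
fn parse_visible_devices(raw: &str) -> Vec<u32> {
    let mut seen = HashSet::new();
    raw.split(',')
        .filter_map(|s| s.trim().parse::<u32>().ok())
        .filter(|d| seen.insert(*d))
        .collect()
}

fn main() {
    // "3,1,3,0" keeps env order and drops the duplicate 3.
    println!("{:?}", parse_visible_devices("3, 1,3,0"));
}
```

Unlike the `BTreeSet` version, the first occurrence's position determines the logical index, so `idx as i32` passed to `rtSetDevice` stays aligned with the returned `device_id`.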
Comment on lines +161 to +183
// 1. Open shared memory. `open_global_registry` returns a
// `&'static mut GlobalRegistry`; the reporter needs a shared
// `&'static GlobalRegistry` view of the same mapping, and the
// manager (further below) consumes the same shared reference.
// All fields are atomics, so aliasing as `&` is sound.
let global_reg_mut: &'static mut GlobalRegistry =
shm_setup::open_global_registry::<GlobalRegistry>(&global_path);
let global_reg: &'static GlobalRegistry =
unsafe { &*(global_reg_mut as *const GlobalRegistry) };
let local_shm = shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());

// 2. Start the utilization reporter. It locates its slot by
// matching on pid, which is written during `ContainerManager::new`
// just below, so a brief retry inside the reporter is expected.
reporter.start(
global_reg,
unsafe { &*(local_shm as *const LocalContainerShmem) },
pid_for_thread,
);

// 3. Initialize the manager and enter its scheduling loop.
let mut manager =
ContainerManager::new(global_reg, local_shm, pid_for_thread as i32, p_atomic);

Copilot AI Apr 21, 2026

The current pattern keeps an &'static mut GlobalRegistry (global_reg_mut) alive while also creating an &'static GlobalRegistry pointing at the same mapping. Holding & and &mut aliases to the same memory is undefined behavior in Rust, even if you only access atomics. Consider changing open_global_registry/create_shmem call sites to work with raw pointers (or return a shared reference from the API) so you never have overlapping &mut + & lifetimes for the same region.

Suggested change
-// 1. Open shared memory. `open_global_registry` returns a
-// `&'static mut GlobalRegistry`; the reporter needs a shared
-// `&'static GlobalRegistry` view of the same mapping, and the
-// manager (further below) consumes the same shared reference.
-// All fields are atomics, so aliasing as `&` is sound.
-let global_reg_mut: &'static mut GlobalRegistry =
-    shm_setup::open_global_registry::<GlobalRegistry>(&global_path);
-let global_reg: &'static GlobalRegistry =
-    unsafe { &*(global_reg_mut as *const GlobalRegistry) };
-let local_shm = shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());
-// 2. Start the utilization reporter. It locates its slot by
-// matching on pid, which is written during `ContainerManager::new`
-// just below, so a brief retry inside the reporter is expected.
-reporter.start(
-    global_reg,
-    unsafe { &*(local_shm as *const LocalContainerShmem) },
-    pid_for_thread,
-);
-// 3. Initialize the manager and enter its scheduling loop.
-let mut manager =
-    ContainerManager::new(global_reg, local_shm, pid_for_thread as i32, p_atomic);
+// 1. Open shared memory, but convert the returned mutable
+// references to raw pointers immediately so we do not keep
+// overlapping `&mut` and `&` bindings alive for the same
+// mapped region.
+let global_reg_ptr =
+    shm_setup::open_global_registry::<GlobalRegistry>(&global_path)
+        as *mut GlobalRegistry;
+let local_shm_ptr =
+    shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str())
+        as *mut LocalContainerShmem;
+// 2. Start the utilization reporter. It locates its slot by
+// matching on pid, which is written during `ContainerManager::new`
+// just below, so a brief retry inside the reporter is expected.
+reporter.start(
+    unsafe { &*global_reg_ptr },
+    unsafe { &*local_shm_ptr },
+    pid_for_thread,
+);
+// 3. Initialize the manager and enter its scheduling loop.
+let mut manager = unsafe {
+    ContainerManager::new(
+        &*global_reg_ptr,
+        &mut *local_shm_ptr,
+        pid_for_thread as i32,
+        p_atomic,
+    )
+};

Comment on lines +170 to +183
let local_shm = shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());

// 2. Start the utilization reporter. It locates its slot by
// matching on pid, which is written during `ContainerManager::new`
// just below, so a brief retry inside the reporter is expected.
reporter.start(
global_reg,
unsafe { &*(local_shm as *const LocalContainerShmem) },
pid_for_thread,
);

// 3. Initialize the manager and enter its scheduling loop.
let mut manager =
ContainerManager::new(global_reg, local_shm, pid_for_thread as i32, p_atomic);

Copilot AI Apr 21, 2026

local_shm is an &'static mut LocalContainerShmem, but this passes a shared &LocalContainerShmem to the reporter via pointer cast while the mutable reference is still live. Holding & and &mut aliases to the same region is undefined behavior in Rust. Consider using raw pointers for the mapping (or changing the shmem APIs) so you never have overlapping mutable/shared references.

Suggested change
-let local_shm = shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());
-// 2. Start the utilization reporter. It locates its slot by
-// matching on pid, which is written during `ContainerManager::new`
-// just below, so a brief retry inside the reporter is expected.
-reporter.start(
-    global_reg,
-    unsafe { &*(local_shm as *const LocalContainerShmem) },
-    pid_for_thread,
-);
-// 3. Initialize the manager and enter its scheduling loop.
-let mut manager =
-    ContainerManager::new(global_reg, local_shm, pid_for_thread as i32, p_atomic);
+// Use separate mappings for the reporter and manager so we never
+// create overlapping `&LocalContainerShmem` / `&mut LocalContainerShmem`
+// references to the same Rust object.
+let local_shm_for_manager =
+    shm_setup::create_shmem::<LocalContainerShmem>(local_path.as_str());
+let local_shm_for_reporter =
+    shm_setup::open_shmem::<LocalContainerShmem>(local_path.as_str());
+// 2. Start the utilization reporter. It locates its slot by
+// matching on pid, which is written during `ContainerManager::new`
+// just below, so a brief retry inside the reporter is expected.
+reporter.start(global_reg, local_shm_for_reporter, pid_for_thread);
+// 3. Initialize the manager and enter its scheduling loop.
+let mut manager = ContainerManager::new(
+    global_reg,
+    local_shm_for_manager,
+    pid_for_thread as i32,
+    p_atomic,
+);

Comment on lines +218 to +220
// Soften permissions so host users can access it without sudo
if let Ok(mut perms) = std::fs::metadata(&uds_path).map(|m| m.permissions()) {
perms.set_mode(0o777);

Copilot AI Apr 21, 2026

The UDS socket is chmod'd to 0o777, making it world-writable. That allows any local user on the host to change container priority and query utilization if they can access the mount path. Consider defaulting to a less permissive mode (e.g., 0o660) and/or gating the relaxation behind an env var so operators can choose the risk level.

Suggested change
-// Soften permissions so host users can access it without sudo
-if let Ok(mut perms) = std::fs::metadata(&uds_path).map(|m| m.permissions()) {
-    perms.set_mode(0o777);
+// Default to owner/group access; allow world-writable only as an explicit opt-in.
+let uds_mode = match std::env::var("NPU_LIMITER_UDS_WORLD_WRITABLE") {
+    Ok(value) if value == "1" || value.eq_ignore_ascii_case("true") => 0o777,
+    _ => 0o660,
+};
+if let Ok(mut perms) = std::fs::metadata(&uds_path).map(|m| m.permissions()) {
+    perms.set_mode(uds_mode);

-v $HOST_PROJECT_DIR:$CONTAINER_PROJECT_DIR \
-v /mnt/nvme0/LLMs/:/models \
--name "$CONTAINER_NAME" \
-e LD_PRELOAD=$LIBRARY_PATH \

Copilot AI Apr 21, 2026

This sets LD_PRELOAD=$LIBRARY_PATH at the container level, so the limiter daemon itself (started later) will run under the hook library. That can cause the manager process to initialize SchedulerClient unexpectedly (e.g., via rtSetDevice / rtMemGetInfoEx in gRPC memory reporting) and register as a worker in local shmem. Recommend removing container-wide LD_PRELOAD and only setting it for the workload processes.

Suggested change
-    -e LD_PRELOAD=$LIBRARY_PATH \
