Commit 3bdf4b4
scripts/rdna4/windows: Gemma 4 26B-A4B + 256K + Vulkan + MoE offload
Adds the Windows bring-up scripts for running Gemma 4 26B-A4B-it at full 256K context on an AMD Ryzen 7 9800X3D + RX 9700 XT (gfx1201, 16 GB). The model cannot fit fully on 16 GB VRAM (UD-Q4_K_XL is 17.1 GB). The strategy is to push MoE expert FFN tensors (~14 GB of ~17 GB) to CPU via -ncmoe 30, leaving only ~2.7 GB of non-expert weights on GPU. With the compute buffer (~8.3 GB at -b 128 -ub 64) and turbo3 KV cache (~1.05 GB at 256K), total GPU occupation lands at ~12.5 GB. The non-obvious flag is -b 128 -ub 64. The compute buffer is batch-driven, not context-driven — the default -b 2048 allocates a ~32 GB Metal buffer at 256K and instantly OOMs. Validated empirically on M4 Max Mac: 128K at -b 256 ub=128 takes 8.4 GB, 256K at -b 128 ub=64 takes 8.3 GB. Batch size dominates, context does not. Vulkan FA path requires symmetric K/V (op->src[1]->type == op->src[2]->type in ggml-vulkan.cpp:15371), so asymmetric q8_0/turbo3 is not available on this backend — only symmetric turbo3/turbo3. Scripts: build.ps1 — cmake configure + build with -DGGML_VULKAN=ON check-prereqs.ps1 — MSVC, Vulkan SDK, driver, GPU, RAM, model file run-gemma4-256k.ps1 — llama-server with the locked-in flag set run-claude-code.ps1 — Claude Code wrapper with env + KV-cache guardrail README.md — budget analysis, risks, troubleshooting
1 parent 0e11555 commit 3bdf4b4

File tree: 5 files changed (+365, −0 lines)

scripts/rdna4/windows/README.md

Lines changed: 95 additions & 0 deletions
# RDNA 4 Windows — Gemma 4 26B-A4B @ 256K

Target hardware: AMD Ryzen 7 9800X3D + 32 GB DDR5 + RX 9700 XT (gfx1201, 16 GB).
Target model: `unsloth/gemma-4-26B-A4B-it-GGUF` at `UD-Q4_K_XL` (17.1 GB).
Target context: 256K tokens.

## Bring-up

1. Install prerequisites (one-time):
   - **Visual Studio 2022 Build Tools** with "Desktop development with C++" workload
   - **CMake ≥ 3.25**: `winget install Kitware.CMake`
   - **Git**: `winget install Git.Git`
   - **Vulkan SDK**: https://vulkan.lunarg.com (installer sets `VULKAN_SDK`)
   - **AMD Adrenalin 25.x** driver: https://amd.com/support
   - **Node.js LTS** (for Claude Code): `winget install OpenJS.NodeJS.LTS`

   Reboot after the driver install.
2. Open **"x64 Native Tools Command Prompt for VS 2022"**, launch PowerShell inside it (`pwsh` or `powershell`), then:

   ```powershell
   cd path\to\llama-cpp-turboquant
   .\scripts\rdna4\windows\check-prereqs.ps1   # verifies tools, driver, GPU, RAM, model
   .\scripts\rdna4\windows\build.ps1           # configures + builds Vulkan backend
   ```

3. Download the model GGUF (17.1 GB) to `C:\models\gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf` or set `TQ_MODEL_PATH`.
4. Launch the server:

   ```powershell
   .\scripts\rdna4\windows\run-gemma4-256k.ps1
   ```

5. In a second shell, launch Claude Code:

   ```powershell
   .\scripts\rdna4\windows\run-claude-code.ps1
   ```
## Memory budget (why these flags matter)

### GPU VRAM — 16 GB target

| Component | Size | Notes |
|---|---|---|
| Non-expert model tensors | ~2.7 GB | attention Q/K/V/O, embeddings, lm_head, norms, MoE routers |
| KV cache turbo3 @ 256K | ~1.05 GB | 1.0 GB global (5 layers × 256K × D=512) + 49 MB SWA (25 layers × 1280 cells) |
| Compute buffer (`-b 128 -ub 64`) | ~8.3 GB | Batch-driven; does NOT scale with context. Default `-b 2048` allocates ~32 GB and instantly OOMs |
| Vulkan driver overhead | ~0.5 GB | |
| **Total** | **~12.5 GB** | ~3.5 GB headroom |
### System RAM — 32 GB target

| Component | Size | Notes |
|---|---|---|
| Expert REPACK (CPU backend) | ~8.2 GB | Q4_K expert weights reformatted for CPU-efficient access |
| mmap hot working set | ~9 GB | File-backed; the OS pages out cold experts |
| OS + drivers + desktop | ~5 GB | |
| Claude Code / VSCode / browser | ~3 GB | Close heavy Electron apps during runs |
| **Total active** | **~25 GB** | Tight. `--mlock` keeps experts resident |
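Both budgets are straight sums of the table rows. A quick sanity check of the arithmetic, using only the estimates stated above:

```python
# Sanity-check the VRAM and RAM budgets from the two tables (all values in GB,
# taken directly from the tables' own estimates).
vram = {
    "non_expert_tensors": 2.7,
    "kv_cache_turbo3_256k": 1.05,
    "compute_buffer_b128_ub64": 8.3,
    "vulkan_driver_overhead": 0.5,
}
ram = {
    "expert_repack": 8.2,
    "mmap_hot_working_set": 9.0,
    "os_drivers_desktop": 5.0,
    "electron_apps": 3.0,
}

vram_total = sum(vram.values())   # 12.55 -> "~12.5 GB", ~3.45 GB headroom on 16 GB
ram_total = sum(ram.values())     # 25.2  -> "~25 GB" active on 32 GB
print(f"VRAM: {vram_total:.2f} GB used, {16 - vram_total:.2f} GB headroom")
print(f"RAM:  {ram_total:.2f} GB active of 32 GB")
```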
## Critical flags — do not change without understanding

- `-b 128 -ub 64`: dropping below this is fine; raising it breaks the VRAM budget. At `-b 256` the compute buffer grows to ~16 GB (OOM).
- `-ctk turbo3 -ctv turbo3`: the Vulkan FA path rejects mixed K/V types (`op->src[1]->type != op->src[2]->type` → false). You cannot use asymmetric q8_0/turbo3 on Vulkan — that only works on Metal/CUDA.
- `-ncmoe 30`: all 30 layers' experts go to CPU. Without this, the ~14 GB of expert FFN weights are forced into VRAM and the load OOMs.
- `--mlock`: mandatory on Windows at 32 GB. Prevents pagefile swaps of expert weights. Without it, a single page fault during decode can stall a token by seconds.
- `--no-warmup`: prevents an empty-batch allocation spike that can OOM on a tight VRAM budget.
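If this build mirrors upstream llama.cpp, `-ncmoe` is shorthand for `--override-tensor` rules that pin the per-layer expert FFN tensors to the CPU buffer. A sketch of which tensors that selects, assuming the standard GGUF MoE naming (`blk.<layer>.ffn_{up,down,gate}_exps.weight` — an assumption, not verified against this fork):

```python
import re

# Hypothetical sketch: the pattern -ncmoe effectively applies to tensor names.
# Non-expert tensors (attention, routers, embeddings) stay on GPU via -ngl 99;
# only the *_exps expert FFN tensors move to CPU.
EXPERT_RE = re.compile(r"blk\.\d+\.ffn_(up|down|gate)_exps\.weight")

tensors = [
    "blk.0.attn_q.weight",         # attention -> stays on GPU
    "blk.0.ffn_up_exps.weight",    # expert FFN -> CPU
    "blk.29.ffn_down_exps.weight", # expert FFN -> CPU (layer 29 of 30)
    "output.weight",               # lm_head -> stays on GPU
]
cpu_side = [t for t in tensors if EXPERT_RE.fullmatch(t)]
print(cpu_side)  # -> ['blk.0.ffn_up_exps.weight', 'blk.29.ffn_down_exps.weight']
```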
## Expected performance (projected)

Decode: 25-50 tok/s (CPU MoE is bandwidth-bound; DDR5-6000 ≈ 80 GB/s).
Prefill at 256K: 1-3 minutes (first-token latency; cached afterward).

Reference: Mac M4 Max, same config (full GPU, no CPU MoE offload):

- pp2048: 710 tok/s
- tg64 @ d=0: 49 tok/s
- tg64 @ d=128K: 21 tok/s

The Mac numbers set a ceiling. Windows Vulkan scalar-path performance will be lower than Metal, and CPU MoE adds bandwidth overhead, so expect 40-70% of Mac decode speed in practice.
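The 25-50 tok/s projection falls out of a back-of-envelope bandwidth model: tokens/s ≈ memory bandwidth ÷ bytes read per token. A rough sketch, under loud assumptions (~4B active parameters per token for "A4B", ~4.5 effective bits/weight for a Q4_K-family quant, and all active weights streamed from DDR5 each token; it ignores the GPU-resident attention weights, cache effects, and PCIe copies):

```python
# Bandwidth-bound decode estimate (all inputs are assumptions except bandwidth,
# which the text gives as ~80 GB/s for DDR5-6000).
active_params = 4e9        # "A4B": ~4B active parameters per token (assumption)
bits_per_weight = 4.5      # rough effective rate for Q4_K-family quants (assumption)
bandwidth = 80e9           # bytes/s

bytes_per_token = active_params * bits_per_weight / 8   # ~2.25 GB read per token
tok_per_s = bandwidth / bytes_per_token
print(f"~{tok_per_s:.0f} tok/s rough estimate")         # lands inside the 25-50 range
```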
## Known risks

1. **Vulkan turbo3 at D=256 / D=512 is untested on Gemma 4.** The code paths exist (`flash_attn.comp`, `flash_attn_cm1.comp`, and `flash_attn_cm2.comp` all compile turbo3_0 pipeline variants via `DATA_A_TURBO3_0`; FA dispatch accepts `GGML_TYPE_TURBO3_0` for any `HSK % 8 == 0`). Metal was explicitly validated by commit `716dd77`; Vulkan was not. The first smoke test after build should verify generation quality; garbled output means fall back to q8_0 and accept a smaller context.

2. **MoE CPU offload on AMD Vulkan is unmeasured.** The `-ncmoe` flag is backend-agnostic in theory. On Mac it works functionally but regresses speed badly because the Metal GPU outruns the Mac CPU on FFN. On Windows the opposite should hold: the Vulkan GPU cannot hold the experts anyway, so CPU is the only option, and Zen 5 + DDR5 + 3D V-Cache is a strong CPU MoE target.

3. **32 GB RAM is tight.** The active working set projects to ~25 GB. Close VSCode, Chrome, Slack, etc. before runs. A desktop with ~8 GB wired for the OS leaves ~24 GB for the inference process.
## Troubleshooting

- **OOM at load time**: check `-b` / `-ub`. The defaults are far too aggressive for 16 GB.
- **OOM during warmup**: add `--no-warmup` (already on by default in our script).
- **Garbled output with turbo3**: verify with `-ctk q8_0 -ctv q8_0` and a smaller context. If q8_0 works and turbo3 doesn't, the Vulkan D=256/512 turbo3 path has a correctness issue — report upstream.
- **Slow decode (< 10 tok/s)**: check that `--mlock` took effect (Task Manager → Performance → Memory → Committed). If the pagefile is growing, `--mlock` is silently failing due to quota; grant the "Lock pages in memory" user right via `gpedit.msc` → Computer Configuration → Windows Settings → Security Settings → Local Policies → User Rights Assignment.
- **Claude Code KV cache thrashing**: verify `"CLAUDE_CODE_ATTRIBUTION_HEADER": "0"` is in `%USERPROFILE%\.claude\settings.json`. Without it, decode speed drops ~90%.
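For the last item, a minimal `%USERPROFILE%\.claude\settings.json` (matching what `run-claude-code.ps1` checks for) looks like:

```json
{
  "env": {
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0"
  }
}
```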

scripts/rdna4/windows/build.ps1

Lines changed: 38 additions & 0 deletions
#!/usr/bin/env pwsh
# Build llama-cpp-turboquant with the Vulkan backend for RDNA 4 (RX 9700 XT / gfx1201).
# Run from "x64 Native Tools Command Prompt for VS 2022".

$ErrorActionPreference = "Stop"

$repoRoot = Resolve-Path (Join-Path $PSScriptRoot "..\..\..")
$buildDir = Join-Path $repoRoot "build-vulkan"

if (-not $env:VULKAN_SDK -or -not (Test-Path $env:VULKAN_SDK)) {
    Write-Host "VULKAN_SDK not set or path missing." -ForegroundColor Red
    Write-Host "Install from https://vulkan.lunarg.com and restart the shell." -ForegroundColor Yellow
    exit 1
}

Write-Host "Configuring cmake (Vulkan, Release)..." -ForegroundColor Cyan
cmake -B $buildDir -S $repoRoot `
    -DCMAKE_BUILD_TYPE=Release `
    -DGGML_VULKAN=ON `
    -DGGML_NATIVE=ON `
    -DLLAMA_BUILD_TESTS=OFF `
    -DLLAMA_BUILD_EXAMPLES=OFF `
    -DLLAMA_BUILD_TOOLS=ON
if ($LASTEXITCODE -ne 0) { exit $LASTEXITCODE }

Write-Host "Building..." -ForegroundColor Cyan
cmake --build $buildDir --config Release --target llama-server llama-cli llama-bench --parallel
if ($LASTEXITCODE -ne 0) { exit $LASTEXITCODE }

$llamaServer = Join-Path $buildDir "bin\Release\llama-server.exe"
if (Test-Path $llamaServer) {
    Write-Host ""
    Write-Host "Built: $llamaServer" -ForegroundColor Green
    & $llamaServer --version 2>&1 | Select-Object -First 3
} else {
    Write-Host "Build succeeded but llama-server.exe not found at expected path." -ForegroundColor Red
    exit 1
}
scripts/rdna4/windows/check-prereqs.ps1

Lines changed: 74 additions & 0 deletions
#!/usr/bin/env pwsh
# Preflight check for the RDNA 4 Vulkan build + Gemma 4 run.
# Verifies toolchain, driver, GPU, model file, and system RAM.
# Run from "x64 Native Tools Command Prompt for VS 2022" to get the MSVC env.

$ErrorActionPreference = "Stop"

$ok = $true
function Check($label, $test, $fix) {
    # Returns nothing so unconsumed calls stay silent; failures flip $script:ok.
    Write-Host -NoNewline "  $label ... "
    try {
        $result = & $test
        if ($result) {
            Write-Host "ok" -ForegroundColor Green
        } else {
            Write-Host "MISSING" -ForegroundColor Red
            Write-Host "    fix: $fix" -ForegroundColor Yellow
            $script:ok = $false
        }
    } catch {
        Write-Host "ERROR: $_" -ForegroundColor Red
        Write-Host "    fix: $fix" -ForegroundColor Yellow
        $script:ok = $false
    }
}

Write-Host "=== RDNA 4 / Vulkan / Gemma 4 preflight ===" -ForegroundColor Cyan
Write-Host ""

Write-Host "Toolchain:"
Check "MSVC cl.exe" { (Get-Command cl -ErrorAction SilentlyContinue) -ne $null } "Open 'x64 Native Tools Command Prompt for VS 2022', or install VS 2022 Build Tools with the C++ workload"
Check "cmake >= 3.25" {
    $v = (cmake --version 2>&1 | Select-Object -First 1) -replace 'cmake version ', ''
    [version]$v -ge [version]"3.25.0"
} "winget install Kitware.CMake (then restart the shell)"
Check "git" { (Get-Command git -ErrorAction SilentlyContinue) -ne $null } "winget install Git.Git"
Check "VULKAN_SDK env" { $env:VULKAN_SDK -ne $null -and (Test-Path $env:VULKAN_SDK) } "Install from https://vulkan.lunarg.com (the Windows installer sets VULKAN_SDK automatically)"
Check "glslc shader compiler" { (Get-Command glslc -ErrorAction SilentlyContinue) -ne $null } "Reinstall the Vulkan SDK and ensure %VULKAN_SDK%\Bin is in PATH"

Write-Host ""
Write-Host "GPU / Driver:"
Check "vulkaninfo" { (Get-Command vulkaninfo -ErrorAction SilentlyContinue) -ne $null } "Install the AMD Adrenalin 25.x driver from amd.com/support"
Check "RX 9700 XT detected" {
    $info = vulkaninfo --summary 2>$null | Out-String
    $info -match "Radeon RX 9700 XT" -or $info -match "gfx1201"
} "Check that the AMD Adrenalin driver is installed and the GPU is seated"

Write-Host ""
Write-Host "System:"
$totalRam = [math]::Round((Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory / 1GB, 1)
Check "RAM >= 32 GB (have: $totalRam GB)" { $totalRam -ge 31 } "Gemma 4 26B expert offload needs a ~22 GB working set"

Write-Host ""
Write-Host "Model file:"
$defaultModel = if ($env:TQ_MODEL_PATH) { $env:TQ_MODEL_PATH } else { "C:\models\gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf" }
Check "GGUF present ($defaultModel)" { Test-Path $defaultModel } "Download from https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF (UD-Q4_K_XL, ~17.1 GB) and place it at the path above, or set TQ_MODEL_PATH"

Write-Host ""
Write-Host "Build output:"
$repoRoot = Resolve-Path (Join-Path $PSScriptRoot "..\..\..")
$llamaServer = Join-Path $repoRoot "build-vulkan\bin\Release\llama-server.exe"
Check "llama-server.exe built" { Test-Path $llamaServer } "Run .\scripts\rdna4\windows\build.ps1 first"

Write-Host ""
if ($ok) {
    Write-Host "All checks passed." -ForegroundColor Green
    exit 0
} else {
    Write-Host "One or more checks failed. See messages above." -ForegroundColor Red
    exit 1
}
scripts/rdna4/windows/run-claude-code.ps1

Lines changed: 49 additions & 0 deletions
#!/usr/bin/env pwsh
# Start the Claude Code CLI against the local llama-server with Gemma 4.
# Assumes run-gemma4-256k.ps1 is already running on the configured port.

$ErrorActionPreference = "Stop"

$port    = if ($env:TQ_PORT) { $env:TQ_PORT } else { "8001" }
$host_ip = if ($env:TQ_HOST) { $env:TQ_HOST } else { "127.0.0.1" }
$server  = "http://$host_ip`:$port"

# --- Sanity-check that the server is up ---
try {
    $null = Invoke-WebRequest -Uri "$server/health" -TimeoutSec 2 -UseBasicParsing
    Write-Host "llama-server reachable at $server" -ForegroundColor Green
} catch {
    Write-Host "Cannot reach llama-server at $server" -ForegroundColor Red
    Write-Host "Start it first: .\scripts\rdna4\windows\run-gemma4-256k.ps1" -ForegroundColor Yellow
    exit 1
}

# --- Client env ---
$env:ANTHROPIC_BASE_URL   = $server
$env:ANTHROPIC_AUTH_TOKEN = "local"
$env:ANTHROPIC_API_KEY    = ""
$env:ANTHROPIC_MODEL      = "gemma-4-26b"

# --- KV cache preservation ---
# Claude Code adds an attribution header by default. Every request with a new
# header permutation invalidates the prompt cache on the server side. Setting
# this to 0 prevents a ~90% decode-speed regression.
# (Also needs CLAUDE_CODE_ATTRIBUTION_HEADER=0 in %USERPROFILE%\.claude\settings.json.)
$env:CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC = "1"

$settingsPath = Join-Path $env:USERPROFILE ".claude\settings.json"
if (-not (Test-Path $settingsPath)) {
    Write-Host "WARNING: $settingsPath missing." -ForegroundColor Yellow
    Write-Host "  Create it with: { `"env`": { `"CLAUDE_CODE_ATTRIBUTION_HEADER`": `"0`" } }" -ForegroundColor Yellow
} else {
    $settings = Get-Content $settingsPath -Raw | ConvertFrom-Json
    $attrib = $settings.env.CLAUDE_CODE_ATTRIBUTION_HEADER
    if ($attrib -ne "0") {
        Write-Host "WARNING: CLAUDE_CODE_ATTRIBUTION_HEADER is not set to '0' in settings.json" -ForegroundColor Yellow
        Write-Host "  Without it, Claude Code invalidates the KV cache on every request." -ForegroundColor Yellow
    }
}

# --- Launch ---
Write-Host "Launching Claude Code..." -ForegroundColor Cyan
npx -y "@anthropic-ai/claude-code@latest" @args
scripts/rdna4/windows/run-gemma4-256k.ps1

Lines changed: 109 additions & 0 deletions
#!/usr/bin/env pwsh
# Launch llama-server with Gemma 4 26B-A4B-it at 256K context on RDNA 4 Vulkan.
#
# Config rationale (see scripts/rdna4/windows/README.md for the full analysis):
#
#   -ctk turbo3 -ctv turbo3   Symmetric required — the Vulkan FA path does not
#                             support mixed K/V quant types. turbo3 is the
#                             only turbo flavor with Vulkan shaders.
#
#   -fa on                    Flash attention. The scalar, cm1, and cm2 FA
#                             shaders all ship turbo3 pipeline variants.
#
#   -ngl 99                   Offload all layers' structural tensors to GPU
#                             (attention, embeddings, norms).
#
#   -ncmoe 30                 Override the expert FFN tensors on all 30 MoE
#                             layers back to CPU. Gemma 4 26B-A4B has ~14 GB
#                             of expert weights, which blows the 16 GB VRAM
#                             budget. Experts compute on CPU, attention on
#                             GPU; activations cross via small PCIe copies.
#
#   -c 262144                 Full 256K context.
#
#   -b 128 -ub 64             CRITICAL. The compute buffer is batch-driven,
#                             not ctx-driven. The default (-b 2048 -ub 512)
#                             allocates ~32 GB of compute buffer and
#                             instantly OOMs on 16 GB VRAM. b=128 ub=64
#                             keeps it at ~8 GB.
#
#   --mlock                   Pin expert weights in RAM. Prevents Windows
#                             from paging experts to disk under memory
#                             pressure. Without this, per-token decode can
#                             stall for seconds.
#
#   --no-warmup               Skip the empty-batch warmup pass. Warmup can
#                             trigger OOM on tight VRAM budgets.
#
#   --threads 16              Zen 5 9800X3D = 8C/16T. The CPU-side expert
#                             FFN is memory-bandwidth bound; threads past
#                             8 rarely help but don't hurt.
#
# VRAM budget target (16 GB RX 9700 XT):
#   non-expert model   ~2.7 GB
#   KV cache turbo3    ~1.0 GB  (1 GB global + 49 MB SWA)
#   compute buffer     ~8.3 GB
#   driver overhead    ~0.5 GB
#   TOTAL             ~12.5 GB  (~3.5 GB headroom)
#
# System RAM target (32 GB):
#   expert REPACK      ~8.2 GB  (CPU backend reformats experts)
#   mmap hot set       ~9.0 GB
#   OS + apps          ~8.0 GB
#   TOTAL             ~25.2 GB  (close Electron apps during runs)

$ErrorActionPreference = "Stop"

# --- Paths ---
$repoRoot     = Resolve-Path (Join-Path $PSScriptRoot "..\..\..")
$llamaServer  = Join-Path $repoRoot "build-vulkan\bin\Release\llama-server.exe"
$defaultModel = "C:\models\gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf"

# --- Config (override via env vars) ---
$model   = if ($env:TQ_MODEL_PATH) { $env:TQ_MODEL_PATH } else { $defaultModel }
$port    = if ($env:TQ_PORT)    { $env:TQ_PORT }    else { "8001" }
$host_ip = if ($env:TQ_HOST)    { $env:TQ_HOST }    else { "127.0.0.1" }
$ctx     = if ($env:TQ_CTX)     { $env:TQ_CTX }     else { "262144" }
$batch   = if ($env:TQ_BATCH)   { $env:TQ_BATCH }   else { "128" }
$ubatch  = if ($env:TQ_UBATCH)  { $env:TQ_UBATCH }  else { "64" }
$ctk     = if ($env:TQ_CTK)     { $env:TQ_CTK }     else { "turbo3" }
$ctv     = if ($env:TQ_CTV)     { $env:TQ_CTV }     else { "turbo3" }
$threads = if ($env:TQ_THREADS) { $env:TQ_THREADS } else { "16" }

# --- Preflight ---
if (-not (Test-Path $llamaServer)) {
    Write-Host "llama-server.exe not found at: $llamaServer" -ForegroundColor Red
    Write-Host "Run .\scripts\rdna4\windows\build.ps1 first." -ForegroundColor Yellow
    exit 1
}
if (-not (Test-Path $model)) {
    Write-Host "Model not found at: $model" -ForegroundColor Red
    Write-Host "Download the GGUF and set TQ_MODEL_PATH, or place it at the default path." -ForegroundColor Yellow
    exit 1
}

# --- Summary ---
Write-Host "=== Gemma 4 26B-A4B @ 256K (Vulkan + MoE CPU offload) ===" -ForegroundColor Cyan
Write-Host "  model        : $model"
Write-Host "  context      : $ctx tokens"
Write-Host "  KV cache     : K=$ctk V=$ctv (symmetric required for Vulkan FA)"
Write-Host "  batch/ubatch : $batch / $ubatch (DO NOT raise — compute buffer OOMs)"
Write-Host "  MoE offload  : all 30 expert layers on CPU"
Write-Host "  listen       : http://$host_ip`:$port"
Write-Host ""

# --- Launch ---
& $llamaServer `
    -m $model `
    --alias "gemma-4-26b" `
    -ctk $ctk -ctv $ctv `
    -fa on `
    -ngl 99 `
    -ncmoe 30 `
    -c $ctx `
    -b $batch -ub $ubatch `
    --mlock `
    --no-warmup `
    --threads $threads `
    -np 1 `
    --host $host_ip --port $port `
    @args
