Add Gemma 4 RTX 4090 backend helpers #152

Open
adybag14-cyber wants to merge 4 commits into Luce-Org:main from adybag14-cyber:main

Conversation

@adybag14-cyber

[screenshots attached]

@adybag14-cyber
Author

RTX 4090: Gemma 4 31B-it (abliterated) running at 60 tk/s


@cubic-dev-ai (bot) left a comment


3 issues found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="scripts/Start-LuceboxGemma4090.ps1">

<violation number="1" location="scripts/Start-LuceboxGemma4090.ps1:6">
P2: Hard-coding the default repo path to one workstation makes this helper fail by default on other machines.</violation>
</file>

<file name="scripts/verify_gemma4_4090.py">

<violation number="1" location="scripts/verify_gemma4_4090.py:117">
P2: `--runs` is unchecked, so 0/negative values can make aggregation crash on an empty result set.</violation>
</file>

<file name="scripts/lucebox-gemma4-4090.sh">

<violation number="1" location="scripts/lucebox-gemma4-4090.sh:62">
P2: Readiness timeout can be bypassed because the health probe has no curl timeout, so a single stalled request can block `wait_ready()` indefinitely.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

[string] $Command = 'Start',

[string] $Distro = '',
[string] $RepoPath = '/mnt/c/Users/adyba/src/lucebox-hub',

P2: Hard-coding the default repo path to one workstation makes this helper fail by default on other machines.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/Start-LuceboxGemma4090.ps1, line 6:

<comment>Hard-coding the default repo path to one workstation makes this helper fail by default on other machines.</comment>

<file context>
@@ -0,0 +1,66 @@
+    [string] $Command = 'Start',
+
+    [string] $Distro = '',
+    [string] $RepoPath = '/mnt/c/Users/adyba/src/lucebox-hub',
+    [int] $WaitSeconds = 300
+)
</file context>
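One way to address this violation is to derive the default from the environment rather than baking in one workstation's absolute path. The real fix belongs in the PowerShell param block; the sketch below illustrates the pattern in Python, with `LUCEBOX_REPO` as a hypothetical environment variable, falling back to the current directory:

```python
import argparse
import os

# Prefer an explicit env var, then fall back to the current directory,
# instead of hard-coding one machine's path as the default.
DEFAULT_REPO = os.environ.get("LUCEBOX_REPO", os.getcwd())

parser = argparse.ArgumentParser()
parser.add_argument("--repo-path", default=DEFAULT_REPO)

args = parser.parse_args([])
print(args.repo_path)
```

The same shape in PowerShell would be an expression default on the `$RepoPath` parameter rather than a literal string.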

parser = argparse.ArgumentParser()
parser.add_argument("--base-url", default="http://127.0.0.1:18191")
parser.add_argument("--threshold", type=float, default=60.0)
parser.add_argument("--runs", type=int, default=3)

P2: --runs is unchecked, so 0/negative values can make aggregation crash on an empty result set.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/verify_gemma4_4090.py, line 117:

<comment>`--runs` is unchecked, so 0/negative values can make aggregation crash on an empty result set.</comment>

<file context>
@@ -0,0 +1,162 @@
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--base-url", default="http://127.0.0.1:18191")
+    parser.add_argument("--threshold", type=float, default=60.0)
+    parser.add_argument("--runs", type=int, default=3)
+    parser.add_argument("--n-predict", type=int, default=256)
+    parser.add_argument("--wait", type=float, default=300.0)
</file context>
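A minimal guard for this issue, sketched against the argparse snippet above: an argparse `type` hook that rejects non-positive values at parse time, so the aggregation step never sees an empty result set.

```python
import argparse

def positive_int(value: str) -> int:
    # Reject 0 and negatives before any benchmark runs, so downstream
    # min()/avg() never operate on an empty result list.
    n = int(value)
    if n < 1:
        raise argparse.ArgumentTypeError(f"must be >= 1, got {n}")
    return n

parser = argparse.ArgumentParser()
parser.add_argument("--runs", type=positive_int, default=3)

print(parser.parse_args(["--runs", "5"]).runs)
```

With this in place, `--runs 0` fails with a clear usage error instead of crashing later in aggregation.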

}

health() {
curl -fsS "$(url)/health"

P2: Readiness timeout can be bypassed because the health probe has no curl timeout, so a single stalled request can block wait_ready() indefinitely.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/lucebox-gemma4-4090.sh, line 62:

<comment>Readiness timeout can be bypassed because the health probe has no curl timeout, so a single stalled request can block `wait_ready()` indefinitely.</comment>

<file context>
@@ -0,0 +1,194 @@
+}
+
+health() {
+    curl -fsS "$(url)/health"
+}
+
</file context>
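In the shell script the fix would be a per-request curl timeout (e.g. `--max-time`) inside the probe. The same principle, sketched in Python with a hypothetical `wait_ready` helper: every individual probe carries its own timeout, so the overall readiness deadline actually holds.

```python
import time
import urllib.request
import urllib.error

def wait_ready(url: str, deadline_s: float = 300.0,
               probe_timeout_s: float = 5.0) -> bool:
    # Each probe has its own timeout, so one stalled request can no
    # longer consume the entire readiness window.
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=probe_timeout_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, TimeoutError):
            pass  # not ready yet; retry until the deadline
        time.sleep(1.0)
    return False
```

Without the per-probe timeout, a connection that accepts but never responds would block the loop indefinitely regardless of `deadline_s`.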

@davide221
Contributor

davide221 commented May 11, 2026

@adybag14-cyber thanks for your contribution! @dusterbloom can you take a look at this?

@dusterbloom
Contributor

dusterbloom commented May 11, 2026

Thanks for the contribution and the time you put into the scripts.

We can't take this one as-is. A few specific reasons:

The Gemma 4 path in this repo is dflash/, not libllama. The PR description says "DFlash runtime in this repository is still the Qwen/Laguna research path; Gemma 4 uses libllama" — that isn't accurate. dflash/src/gemma4_target_graph.cpp, dflash/src/gemma4_mtp_graph.cpp, dflash/src/gemma4_dflash_graph.cpp, and dflash/src/gemma4_target_loader.cpp are the active Gemma 4 implementation. They power our current Gemma 4 26B-A4B benches up to 1M context with MTP γ=2 — see .sisyphus/notes/gemma4-baseline/mtp-gamma/. The scripts in this PR set up a parallel path through llama-server and bypass that work rather than building on it.

The submodule pin is intentional and non-negotiable. dflash/deps/llama.cpp is pinned to our feature/tq3-kv-cache-clean branch because it carries TQ3_0 KV quantization, the graph-level FWHT contract, sparse FA, and the chunked attention path that the rest of the codebase depends on. CI builds against that pin; benchmarks reference it. We can't replace it with an arbitrary third-party llama.cpp build per script.

The scripts hardcode paths from a workstation we don't have access to: /mnt/c/Users/adyba/... and /home/tdamre/src/llama.cpp-mtp-pr22673/build-mtp-cuda124-speed-faall/bin/llama-server. We can't ship those, and we can't CI them. The same goes for the --spec-draft-n-max 4 claim presented as "the measured stable MTP window for this 31B target on the RTX 4090": no log is cited, and our own γ-sweep on this target (γ ∈ {1, 2, 4, 8} at 4K/16K/64K, RTX 3090) shows γ=2 as the winner past short contexts; γ=4 regresses. Different GPU, different KV layout, possibly different result, but it would need a log to ground the claim.

We're moving toward a declarative config layout: a configs/backends/ registry for alternate llama.cpp builds and configs/profiles/ for per-machine deployment configs with measurement provenance. That's the shape we'd take a contribution like this in; work in progress at #155. If you're still interested once that lands, a re-submission as a small profile plus backend descriptor would be the natural way back in.


@cubic-dev-ai (bot) left a comment


1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="scripts/probe_gemma4_context.py">

<violation number="1" location="scripts/probe_gemma4_context.py:155">
P2: Threshold validation can falsely pass when every run lacks a numeric `predicted_per_second` metric.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

"cache_type_v": args.cache_type_v,
"threshold": args.threshold,
"all_ok": all(r["ok"] for r in results),
"all_ge_threshold": (all(rate >= args.threshold for rate in rates) if args.threshold > 0 else None),

P2: Threshold validation can falsely pass when every run lacks a numeric predicted_per_second metric.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/probe_gemma4_context.py, line 155:

<comment>Threshold validation can falsely pass when every run lacks a numeric `predicted_per_second` metric.</comment>

<file context>
@@ -0,0 +1,176 @@
+        "cache_type_v": args.cache_type_v,
+        "threshold": args.threshold,
+        "all_ok": all(r["ok"] for r in results),
+        "all_ge_threshold": (all(rate >= args.threshold for rate in rates) if args.threshold > 0 else None),
+        "min_predicted_per_second": min(rates) if rates else None,
+        "avg_predicted_per_second": (sum(rates) / len(rates) if rates else None),
</file context>
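The false pass comes from `all()` being vacuously true on an empty sequence: if no run yields a numeric `predicted_per_second`, `rates` is empty and `all_ge_threshold` reports success. A hedged sketch of the fix, with a hypothetical `threshold_verdict` helper whose names mirror the snippet above (the surrounding script is assumed): require a rate from every run before declaring the threshold met.

```python
def threshold_verdict(rates, results, threshold):
    # all() over an empty list is vacuously True, so demand one numeric
    # rate per run before the threshold check can pass.
    if threshold <= 0:
        return None
    if not rates or len(rates) != len(results):
        return False
    return all(rate >= threshold for rate in rates)

# A run set with no numeric rates now fails instead of passing vacuously.
print(threshold_verdict([], [{"ok": True}], 60.0))
```

The `len(rates) != len(results)` comparison also catches the partial case where only some runs reported a rate.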
