Skip to content

Fix most issues around model draft#2685

Merged
rhatdan merged 4 commits into
containers:mainfrom
mkristian:fix-model-draft
May 1, 2026
Merged

Fix most issues around model draft#2685
rhatdan merged 4 commits into
containers:mainfrom
mkristian:fix-model-draft

Conversation

@mkristian
Copy link
Copy Markdown
Contributor

@mkristian mkristian commented Apr 24, 2026

This fixes the model-draft:

  1. ramalama serve without generate
  2. ramalama serve with generate any format
  3. adding --model-draft to ramalama run as well

Does not address #2225

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds optional draft-model support end-to-end: transports resolve draft ref files and enumerate GGUF blobs, producing host→container path pairs that are threaded into Quadlet, Kube, and Compose generators which emit additional mounts/volumes for draft model blobs; CLI and tests updated to expose and validate --model-draft.

Changes

Cohort / File(s) Summary
Core generators
ramalama/compose.py, ramalama/kube.py, ramalama/quadlet.py
Add draft_model_paths parameters and handling; refactor model-volume helpers to accept explicit src/dest args; generate primary and optional draft model volumes/mounts and update volume naming and OCI vs hostPath logic.
Transport layer
ramalama/transports/base.py
Resolve draft ref file, enumerate GGUF blobs, compute per-blob host→container path pairs (raise if ref missing), skip non-gguf entries; compute and forward draft_model_paths into generation pipelines and update generator method signatures.
Stack orchestration
ramalama/stack.py
Wire args.model_draft into delegate args, include draft model mappings when present, and adjust conditional YAML generation flow and return type annotations.
Runtime plugin (CLI)
ramalama/plugins/runtimes/inference/llama_cpp.py
Register --model-draft for both run and serve commands (argument added to both parsers).
Unit tests
test/unit/test_compose.py, test/unit/test_kube.py, test/unit/test_quadlet.py
Extend test Input fixtures with draft-model fields and existence flags; mock draft-file existence; pass draft_model_paths into constructors; add parametrized cases for draft scenarios and update existing call sites.
Test fixtures / manifests
test/unit/data/test_compose/with_model_draft.yaml, test/unit/data/test_kube/draft_model_hostpath.yaml, test/unit/data/test_quadlet/draft_model/tinyllama.container
Add expected Compose, Kubernetes, and Quadlet outputs validating draft-model mounts and hostPath mounts for draft blobs.
Docs / Manpages
docs/options/model-draft.md, docs/ramalama-run.1.md.in
Document --model-draft applicability to ramalama run and add option entry to the ramalama-run manpage.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as "CLI\n(--model-draft)"
    participant Transport as "TransportBase\n(resolve draft refs)"
    participant FS as "Host FS\n(model GGUF blobs)"
    participant Generator as "Quadlet / Kube / Compose\n(generators)"
    participant Runtime as "Container / YAML\n(deployment/compose)"

    CLI->>Transport: invoke with model + optional draft arg
    Transport->>FS: locate draft ref file & list GGUF blobs
    FS-->>Transport: return blob paths (or error)
    Transport->>Generator: pass model_paths + draft_model_paths
    Generator->>Runtime: emit YAML/units/compose with per-blob mounts
    Runtime->>FS: mount host files into container runtime
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • jhjaggars
  • olliewalsh
  • maxamillion
  • mikebonnet
  • rhatdan
  • engelmi

Poem

🐇 I hop through drafts and mount each file,

GGUF blobs find homes and stack up in a pile.
Quadlet, Kube, Compose — each gets its part,
Paths stitched together like code and art.
Hooray for drafts that speed the heart!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix most issues around model draft' directly describes the main purpose of the PR—addressing multiple model-draft handling problems across serve, generate, and run commands.
Description check ✅ Passed The description is related to the changeset, specifying three concrete fixes (serve without generate, serve with any generate format, and adding --model-draft to run) and noting what is not addressed.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for draft models across various deployment formats, including Docker Compose, Kubernetes, and Quadlet. It updates the initialization logic, volume generation, and command-line argument handling to accommodate draft model paths. Several critical issues were identified: the use of argument unpacking for draft_model_paths in generate_container_config can lead to TypeError exceptions, and the quadlet_kube call is missing the required model_parts argument. Additionally, a hardcoded volume name in the Kubernetes generator needs to be replaced with a dynamic variable to support draft models correctly, and several method signatures require updates to properly propagate draft model data.

Comment thread ramalama/transports/base.py Outdated
Comment thread ramalama/transports/base.py
Comment thread ramalama/kube.py Outdated
Comment thread ramalama/transports/base.py Outdated
Comment thread ramalama/transports/base.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
ramalama/compose.py (1)

25-49: ⚠️ Potential issue | 🔴 Critical

Bug: is not None check will always fire when caller passes draft_model_paths=None.

Lines 25–27 replace a None draft_model_paths with ("", ""), so self.src_draft_model_path is never actually None — it's either a non-empty string or "". Then line 48:

if self.src_draft_model_path is not None:
    volumes += self._gen_model_volume(self.src_draft_model_path, self.dest_draft_model_path)

evaluates to True even when no draft model was supplied, emitting a bogus - ":":ro" volume entry in the generated compose YAML. The existing tests (e.g. test_compose_no_port_arg at test/unit/test_compose.py:259, test_compose_no_env_vars, test_compose_no_devices) pass None but don't assert on volumes, so they don't catch this.

Use a truthiness check, consistent with the adjacent chat-template / mmproj guards:

🛠️ Proposed fix
-        # Model Volume
-        volumes += self._gen_model_volume(self.src_model_path, self.dest_model_path)
-        if self.src_draft_model_path is not None:
-            volumes += self._gen_model_volume(self.src_draft_model_path, self.dest_draft_model_path)
+        # Model Volume
+        volumes += self._gen_model_volume(self.src_model_path, self.dest_model_path)
+        if self.src_draft_model_path:
+            volumes += self._gen_model_volume(self.src_draft_model_path, self.dest_draft_model_path)

Consider also adding an assertion in one of the no-draft compose tests to guard against regressions, e.g. assert '- ":":ro"' not in result.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/compose.py` around lines 25 - 49, The guard in _gen_volumes uses "if
self.src_draft_model_path is not None" but __init__ sets draft_model_paths to
("", "") when None, so self.src_draft_model_path is never None and produces an
empty volume entry; change that check to a truthiness check (e.g. if
self.src_draft_model_path:) so _gen_model_volume is only called when a real
draft path exists, and ensure this logic is applied consistently for
src_chat_template_path and src_mmproj_path if needed; update tests (e.g.
test_compose_no_draft or test_compose_no_port_arg) to assert that '- ":":ro"'
(or equivalent empty-volume string) is not present in the generated compose
output to prevent regressions.
ramalama/quadlet.py (1)

22-39: ⚠️ Potential issue | 🔴 Critical

Type-signature mismatch for draft_model_paths[0] indexing is fragile/broken.

The original review comment is accurate. All three issues are confirmed:

  1. Signature inconsistency. Quadlet.__init__ declares draft_model_paths: Optional[list[Tuple[str, str]]], whereas Compose.__init__ and Kube.__init__ declare Optional[Tuple[str, str]]. The upstream code in ramalama/transports/base.py (lines 626–628) creates draft_model_paths as a list via _get_all_model_part_paths() or as an empty list, then passes it directly to Quadlet at line 696 but unpacks it with *draft_model_paths for Kube/Compose at lines 648/658/668. This unpacking only works safely if the list has exactly one element (or fails on empty list since the parameter is required). Please align the signature with Compose/Kube and drop the [0] index, or ensure all three generators handle list[Tuple[str, str]] consistently.

  2. Dead/erroneous code. Line 38 references the non-existent attribute self.src_draft_model_path and should be removed entirely. The variable src_draft_model_path exists only as a local at line 37; attempting to uncomment line 38 would cause a NameError.

  3. Indentation. Line 34 is indented with 3 spaces (11 characters total) while the sibling if block above (line 32) uses 4 spaces (12 characters total). This will be flagged by ruff format.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/quadlet.py` around lines 22 - 39, Quadlet.__init__ currently
mismatches Compose.__init__/Kube.__init__ for draft_model_paths and unsafely
indexes draft_model_paths[0], so either change Quadlet.__init__'s parameter to
draft_model_paths: Optional[Tuple[str,str]] (and stop indexing a list) or update
Compose/Kube to pass a list consistently; remove the dead reference to
self.src_draft_model_path (it doesn't exist) and use the local
src_draft_model_path variable only, and correct the indentation of the
model_parts else-block to use 4 spaces so formatting/ruff errors are resolved;
locate these fixes in Quadlet.__init__, and review callers that use
_get_all_model_part_paths to ensure the chosen signature is consistent.
ramalama/kube.py (1)

14-31: 🛠️ Refactor suggestion | 🟠 Major

Inconsistent draft_model_paths type across Kube / Quadlet / Compose.

Kube.__init__ declares draft_model_paths: Optional[Tuple[str, str]] (single tuple), while Quadlet.__init__ (and the logical model in base.py._get_all_model_part_paths) uses Optional[list[Tuple[str, str]]]. This is what forces the awkward *draft_model_paths unpack in ramalama/transports/base.py and limits Kube to single-part draft models. Prefer Optional[list[Tuple[str, str]]] and iterate in _gen_volumes to emit one model_draft_<n> volume per part, mirroring how Quadlet appends each part to self.model_parts.

Separately, since draft_model_paths has no default here, every call site must pass it — that's the direct cause of the TypeError risk flagged in ramalama/transports/base.py. Consider giving it a = None default for backward compatibility while the new call sites are rolled out.

As per coding guidelines: "Use type hints in Python code and ensure mypy compatibility".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/kube.py` around lines 14 - 31, Kube.__init__ currently types
draft_model_paths as Optional[Tuple[str, str]] and requires it, causing
inconsistent handling vs Quadlet and breakage in transports; change the
signature of Kube.__init__ to draft_model_paths: Optional[list[Tuple[str, str]]]
= None, adjust self.model_paths to accept multiple draft parts (e.g., append
each tuple as 'model_draft_<n>') and update _gen_volumes to iterate over
self.model_paths['model_draft'] parts and emit one volume per part (matching how
Quadlet.__init__ and base.py._get_all_model_part_paths treat multi-part models),
and ensure callers in ramalama/transports/base.py no longer need to unpack with
*draft_model_paths.
ramalama/transports/base.py (1)

625-669: ⚠️ Potential issue | 🔴 Critical

Broken pipeline when draft_model is absent or multi-part (two critical bugs).

Two related defects here will break generation for most users:

  1. Empty list vs None semantics (lines 625-628). When self.draft_model is None, you set draft_model_paths = []. That empty list then propagates to Quadlet(...) via line 638/696. Per the Quadlet.__init__ you're calling (see ramalama/quadlet.py):

    if draft_model_paths is not None:
        src_draft_model_path, dest_draft_model_path = draft_model_paths[0]

    [] is not None, so it takes the branch and raises IndexError on [0]. Anyone running ramalama ... --generate quadlet without a draft model will crash.

  2. *draft_model_paths unpacking (lines 648, 658, 668). kube(), quadlet_kube(), and compose() all declare draft_model_paths as a single required positional parameter, and you call them with *draft_model_paths. Behavior:

    • No draft model → *[] passes zero args → TypeError: missing required argument.
    • Single-part draft → *[(s,d)] passes one tuple → works.
    • Multi-part draft → *[(s1,d1),(s2,d2)…] passes too many args → TypeError.

    Only the "single-part draft" case works. The no-draft case (by far the most common) is broken.

Both bugs are masked by the unit tests, which call Kube / Quadlet directly and skip this pipeline.

Suggested fix (use None as the sentinel and stop unpacking):

🔒️ Proposed fix
         # Get all model parts (for multi-part models)
         model_parts = self._get_all_model_part_paths(False, True, args.dryrun)
-        if self.draft_model is not None:
-            draft_model_paths = self.draft_model._get_all_model_part_paths(False, True, args.dryrun)
-        else:
-            draft_model_paths = []
+        draft_model_paths = None
+        if self.draft_model is not None:
+            draft_model_paths = self.draft_model._get_all_model_part_paths(False, True, args.dryrun)
         if args.generate.gen_type == "quadlet":
             self.quadlet(
                 (model_src_path, model_dest_path),
                 (chat_template_src_path, chat_template_dest_path),
                 (mmproj_src_path, mmproj_dest_path),
                 args,
                 exec_args,
                 args.generate.output_dir,
                 model_parts,
                 draft_model_paths,
             )
         elif args.generate.gen_type == "kube":
             self.kube(
                 (model_src_path.removeprefix("oci://"), model_dest_path),
                 (chat_template_src_path, chat_template_dest_path),
                 (mmproj_src_path, mmproj_dest_path),
                 args,
                 exec_args,
                 args.generate.output_dir,
-                *draft_model_paths,
+                draft_model_paths,
             )
         elif args.generate.gen_type == "quadlet/kube":
             self.quadlet_kube(
                 (model_src_path.removeprefix("oci://"), model_dest_path),
                 (chat_template_src_path, chat_template_dest_path),
                 (mmproj_src_path, mmproj_dest_path),
                 args,
                 exec_args,
                 args.generate.output_dir,
-                *draft_model_paths,
+                draft_model_paths,
             )
         elif args.generate.gen_type == "compose":
             self.compose(
                 (model_src_path, model_dest_path),
                 (chat_template_src_path, chat_template_dest_path),
                 (mmproj_src_path, mmproj_dest_path),
                 args,
                 exec_args,
                 args.generate.output_dir,
-                *draft_model_paths,
+                draft_model_paths,
             )

Then align the downstream signatures to accept the full list (or equivalently, a single tuple for Kube/Compose), and handle None uniformly. Note that the unpacking trick also loses multi-part draft support for KubeKube.__init__ only models one draft volume in self.model_paths['model_draft'], so if you want multi-part draft parity with Quadlet, Kube needs to accept a list of tuples and emit a volume per part, similar to the main model loop.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/transports/base.py` around lines 625 - 669, When no draft_model
exists you must use None as the sentinel (not an empty list) and stop unpacking
draft_model_paths when calling the generators; change the assignment in the
pipeline to draft_model_paths = None if self.draft_model is None, then call
self.quadlet(..., draft_model_paths) and call self.kube(..., draft_model_paths)
/ self.quadlet_kube(..., draft_model_paths) / self.compose(...,
draft_model_paths) (remove the * unpack). Update the downstream
initializers/methods (Quadlet.__init__, Kube.__init__, quadlet_kube, compose) to
accept draft_model_paths: Optional[List[Tuple[src,dest]]] and handle None (no
draft), a single-part list, or multi-part lists (for Kube emit one volume per
part instead of assuming a single tuple).
🧹 Nitpick comments (5)
ramalama/stack.py (1)

33-33: Use getattr to avoid AttributeError when model_draft is not set.

args.model_draft is only populated by the argparse action for run/serve subcommands (see llama_cpp.py:345). If Stack is ever instantiated from a path where the arg parser didn't register --model-draft, this line raises AttributeError. A getattr(args, "model_draft", None) is safer and matches the defensive style already used elsewhere in this class (e.g., getattr(args, "name", None) on line 26).

🛠️ Proposed fix
-        self.draft_model = New(args.model_draft, args) if args.model_draft is not None else None
+        model_draft = getattr(args, "model_draft", None)
+        self.draft_model = New(model_draft, args) if model_draft is not None else None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/stack.py` at line 33, Replace the direct attribute access to
args.model_draft with a safe getattr to prevent AttributeError when the argparse
parser didn't register that option: in the Stack initializer where
self.draft_model is set (the New constructor call referencing args.model_draft),
use getattr(args, "model_draft", None) so the code mirrors the defensive pattern
used elsewhere (e.g., getattr(args, "name", None)) and only constructs New if
the returned value is not None.
ramalama/compose.py (1)

22-22: Consider a default value for draft_model_paths to avoid a breaking signature change.

Adding draft_model_paths as a required positional parameter forces every existing caller of Compose(...) to be updated in lockstep (as seen at test/unit/test_compose.py:259, 272, 285 which now pass a trailing None). Defaulting to None, mirroring other optional params in this constructor, is both safer for downstream callers and more consistent with the Optional[...] type hint.

🛠️ Proposed fix
-        draft_model_paths: Optional[tuple[str, str]],
+        draft_model_paths: Optional[tuple[str, str]] = None,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/compose.py` at line 22, The constructor parameter draft_model_paths
for Compose is declared Optional[tuple[str, str]] but has no default, breaking
callers; update the Compose __init__ signature (the Compose class constructor)
to give draft_model_paths a default of None (matching other optional params) so
existing callers need not pass it, and ensure any internal uses check for None
accordingly.
ramalama/kube.py (1)

49-52: Nit: unpack .items() for clarity.

Minor readability improvement:

♻️ Proposed refactor
-        for paths in self.model_paths.items():
-            src_model_path, dest_model_path = paths[1]
-            src_model_path = src_model_path.removeprefix("oci://")
-            volume_name = paths[0]
+        for volume_name, (src_model_path, dest_model_path) in self.model_paths.items():
+            src_model_path = src_model_path.removeprefix("oci://")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/kube.py` around lines 49 - 52, The loop over
self.model_paths.items() should unpack keys and tuple values for clarity: change
the iteration to unpack volume_name and the (src_model_path, dest_model_path)
pair directly (instead of using the temporary paths variable), then apply
src_model_path.removeprefix("oci://") to the unpacked src_model_path; update any
subsequent references to use these unpacked variable names (loop header and
usages in the block where src_model_path, dest_model_path, and volume_name are
used).
ramalama/transports/base.py (2)

640-669: Inconsistent oci:// stripping across generation paths.

kube and quadlet_kube strip "oci://" from model_src_path before passing it down (lines 642, 652), but quadlet (line 631) and compose (line 662) do not. Kube._gen_volumes already calls .removeprefix("oci://") internally on each source path, so the strip at the caller is redundant for kube; for quadlet / compose, verify they do the same internally and either pick "strip at caller" or "strip in callee" consistently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/transports/base.py` around lines 640 - 669, The call sites for kube,
quadlet_kube, and compose are inconsistent about stripping the "oci://" prefix:
kube and quadlet_kube currently call model_src_path.removeprefix("oci://") while
quadlet and compose do not, and Kube._gen_volumes already removes the prefix
internally; fix this by centralizing the strip behavior — remove the
removeprefix call from the callers (the calls to self.kube(...) and
self.quadlet_kube(...)) and ensure each callee (e.g., Kube._gen_volumes,
Quadlet._gen_volumes, Compose._gen_volumes or their equivalent helper methods)
performs model_src_path.removeprefix("oci://") exactly once so all generation
paths behave consistently.

453-469: Duplicated mount logic; consider a small helper.

Lines 453-469 are a near-verbatim copy of 443-451 against self.draft_model (plus a file.type != 'gguf' filter). Extracting a helper like _mount_model_files(model, only_gguf: bool = False) would remove the duplication and keep the mount-format string in one place.

Minor: file.type != 'gguf' compares a StoreFileType enum member to a raw string. It works today because StoreFileType is a StrEnum, but prefer file.type != StoreFileType.GGUF_MODEL for type clarity — or equivalently use self.draft_model.model_store.get_ref_file(...).model_files (which already filters by GGUF) to avoid the loop-plus-filter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/transports/base.py` around lines 453 - 469, The duplicate mount
logic for draft models should be refactored into a single helper to avoid
repetition and centralize the mount-format string; add a method named like
_mount_model_files(self, model, only_gguf: bool = False) that takes a
model/ref_file, iterates its model_files (or filters by StoreFileType.GGUF_MODEL
when only_gguf=True), builds container paths via
get_container_mount_path(blob_path), composes mount_path as
f"{MNT_DIR}/{file.name}" and calls self.engine.add(...) once from the helper;
replace the duplicated blocks that reference self.draft_model, get_ref_file,
get_blob_file_path, get_container_mount_path, MNT_DIR and self.engine.add with
calls to _mount_model_files(self.draft_model, only_gguf=True) (and similar for
the other occurrence) and change the raw string comparison file.type != 'gguf'
to use StoreFileType.GGUF_MODEL for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ramalama/kube.py`:
- Around line 57-65: The OCI branch currently hardcodes the mount name "model"
and always mounts to MNT_DIR (ignoring dest_model_path), causing mismatches when
volume_name differs (e.g., model_draft); update the block that builds mounts to
use the dynamic volume_name instead of "model" and compute the mountPath using
dest_model_path (falling back to MNT_DIR) and preserve subPath logic (or append
subPath to the resolved dest path) so the generated volumeMount references the
same volume_name passed to _gen_oci_volume(volume_name, src_model_path) and
mounts to the correct destination.

In `@ramalama/transports/base.py`:
- Around line 701-717: In quadlet_kube, rename the parameter model_parts back to
draft_model_paths and restore a separate model_parts parameter for the main
model so the signature matches how callers pass draft paths (and preserves main
model parts); ensure you pass draft_model_paths into Kube(...) in the position
that Kube.__init__ expects and also thread draft_model_paths into the
Quadlet(...) call instead of None (and keep model_parts passed for the main
model), splitting the long Quadlet(...) call across lines to stay under 120
characters; verify argument order matches Kube and Quadlet constructors.
- Line 686: The function signature for quadlet is too long (~140 chars); split
the def quadlet(...) into a multi-line signature so no line exceeds 120
characters by placing parameters on separate lines (e.g., after the opening
parenthesis put each of model_paths, chat_template_paths, mmproj_paths, args,
exec_args, output_dir, model_parts=None, draft_model_paths=None on their own
line or grouped sensibly) and align the continuation indentation to match
existing project style, keeping the same parameter order and defaults.

In `@test/unit/test_kube.py`:
- Around line 76-89: The test directly constructs draft_model_paths as a
Tuple[str,str] and calls Kube(...) which hides a type/usage mismatch that only
appears when code flows through TransportBase.generate_container_config →
self.kube(...) → Kube(...); update code and tests so draft_model_paths has a
single canonical type (prefer list[tuple[str,str]]) across Kube, Quadlet,
Compose and TransportBase.generate_container_config, adjust function signatures
and callers to accept and unpack a list consistently, and add integration tests
that call TransportBase.generate_container_config with (a) no draft, (b) a
single draft tuple, and (c) multiple draft tuples to exercise the real path and
surface the previous mismatch.

---

Outside diff comments:
In `@ramalama/compose.py`:
- Around line 25-49: The guard in _gen_volumes uses "if
self.src_draft_model_path is not None" but __init__ sets draft_model_paths to
("", "") when None, so self.src_draft_model_path is never None and produces an
empty volume entry; change that check to a truthiness check (e.g. if
self.src_draft_model_path:) so _gen_model_volume is only called when a real
draft path exists, and ensure this logic is applied consistently for
src_chat_template_path and src_mmproj_path if needed; update tests (e.g.
test_compose_no_draft or test_compose_no_port_arg) to assert that '- ":":ro"'
(or equivalent empty-volume string) is not present in the generated compose
output to prevent regressions.

In `@ramalama/kube.py`:
- Around line 14-31: Kube.__init__ currently types draft_model_paths as
Optional[Tuple[str, str]] and requires it, causing inconsistent handling vs
Quadlet and breakage in transports; change the signature of Kube.__init__ to
draft_model_paths: Optional[list[Tuple[str, str]]] = None, adjust
self.model_paths to accept multiple draft parts (e.g., append each tuple as
'model_draft_<n>') and update _gen_volumes to iterate over
self.model_paths['model_draft'] parts and emit one volume per part (matching how
Quadlet.__init__ and base.py._get_all_model_part_paths treat multi-part models),
and ensure callers in ramalama/transports/base.py no longer need to unpack with
*draft_model_paths.

In `@ramalama/quadlet.py`:
- Around line 22-39: Quadlet.__init__ currently mismatches
Compose.__init__/Kube.__init__ for draft_model_paths and unsafely indexes
draft_model_paths[0], so either change Quadlet.__init__'s parameter to
draft_model_paths: Optional[Tuple[str,str]] (and stop indexing a list) or update
Compose/Kube to pass a list consistently; remove the dead reference to
self.src_draft_model_path (it doesn't exist) and use the local
src_draft_model_path variable only, and correct the indentation of the
model_parts else-block to use 4 spaces so formatting/ruff errors are resolved;
locate these fixes in Quadlet.__init__, and review callers that use
_get_all_model_part_paths to ensure the chosen signature is consistent.

In `@ramalama/transports/base.py`:
- Around line 625-669: When no draft_model exists you must use None as the
sentinel (not an empty list) and stop unpacking draft_model_paths when calling
the generators; change the assignment in the pipeline to draft_model_paths =
None if self.draft_model is None, then call self.quadlet(..., draft_model_paths)
and call self.kube(..., draft_model_paths) / self.quadlet_kube(...,
draft_model_paths) / self.compose(..., draft_model_paths) (remove the * unpack).
Update the downstream initializers/methods (Quadlet.__init__, Kube.__init__,
quadlet_kube, compose) to accept draft_model_paths:
Optional[List[Tuple[src,dest]]] and handle None (no draft), a single-part list,
or multi-part lists (for Kube emit one volume per part instead of assuming a
single tuple).

---

Nitpick comments:
In `@ramalama/compose.py`:
- Line 22: The constructor parameter draft_model_paths for Compose is declared
Optional[tuple[str, str]] but has no default, breaking callers; update the
Compose __init__ signature (the Compose class constructor) to give
draft_model_paths a default of None (matching other optional params) so existing
callers need not pass it, and ensure any internal uses check for None
accordingly.

In `@ramalama/kube.py`:
- Around line 49-52: The loop over self.model_paths.items() should unpack keys
and tuple values for clarity: change the iteration to unpack volume_name and the
(src_model_path, dest_model_path) pair directly (instead of using the temporary
paths variable), then apply src_model_path.removeprefix("oci://") to the
unpacked src_model_path; update any subsequent references to use these unpacked
variable names (loop header and usages in the block where src_model_path,
dest_model_path, and volume_name are used).

In `@ramalama/stack.py`:
- Line 33: Replace the direct attribute access to args.model_draft with a safe
getattr to prevent AttributeError when the argparse parser didn't register that
option: in the Stack initializer where self.draft_model is set (the New
constructor call referencing args.model_draft), use getattr(args, "model_draft",
None) so the code mirrors the defensive pattern used elsewhere (e.g.,
getattr(args, "name", None)) and only constructs New if the returned value is
not None.

In `@ramalama/transports/base.py`:
- Around line 640-669: The call sites for kube, quadlet_kube, and compose are
inconsistent about stripping the "oci://" prefix: kube and quadlet_kube
currently call model_src_path.removeprefix("oci://") while quadlet and compose
do not, and Kube._gen_volumes already removes the prefix internally; fix this by
centralizing the strip behavior — remove the removeprefix call from the callers
(the calls to self.kube(...) and self.quadlet_kube(...)) and ensure each callee
(e.g., Kube._gen_volumes, Quadlet._gen_volumes, Compose._gen_volumes or their
equivalent helper methods) performs model_src_path.removeprefix("oci://")
exactly once so all generation paths behave consistently.
- Around line 453-469: The duplicate mount logic for draft models should be
refactored into a single helper to avoid repetition and centralize the
mount-format string; add a method named like _mount_model_files(self, model,
only_gguf: bool = False) that takes a model/ref_file, iterates its model_files
(or filters by StoreFileType.GGUF_MODEL when only_gguf=True), builds container
paths via get_container_mount_path(blob_path), composes mount_path as
f"{MNT_DIR}/{file.name}" and calls self.engine.add(...) once from the helper;
replace the duplicated blocks that reference self.draft_model, get_ref_file,
get_blob_file_path, get_container_mount_path, MNT_DIR and self.engine.add with
calls to _mount_model_files(self.draft_model, only_gguf=True) (and similar for
the other occurrence) and change the raw string comparison file.type != 'gguf'
to use StoreFileType.GGUF_MODEL for clarity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5fb58e90-4670-4f4d-92de-e481362c13e6

📥 Commits

Reviewing files that changed from the base of the PR and between b274315 and 8b4af86.

📒 Files selected for processing (12)
  • ramalama/compose.py
  • ramalama/kube.py
  • ramalama/plugins/runtimes/inference/llama_cpp.py
  • ramalama/quadlet.py
  • ramalama/stack.py
  • ramalama/transports/base.py
  • test/unit/data/test_compose/with_model_draft.yaml
  • test/unit/data/test_kube/draft_model_hostpath.yaml
  • test/unit/data/test_quadlet/draft_model/tinyllama.container
  • test/unit/test_compose.py
  • test/unit/test_kube.py
  • test/unit/test_quadlet.py

Comment thread ramalama/kube.py
Comment thread ramalama/transports/base.py Outdated
Comment thread ramalama/transports/base.py Outdated
Comment thread test/unit/test_kube.py
@mkristian mkristian temporarily deployed to macos-installer April 24, 2026 21:56 — with GitHub Actions Inactive
@mkristian mkristian temporarily deployed to macos-installer April 24, 2026 22:53 — with GitHub Actions Inactive
@mkristian mkristian temporarily deployed to macos-installer April 24, 2026 23:31 — with GitHub Actions Inactive
@mkristian mkristian marked this pull request as ready for review April 24, 2026 23:37
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
ramalama/stack.py (1)

33-33: Consider defensive access for args.model_draft.

Other attributes on args are accessed with getattr(..., default) (e.g., name on line 26). Using args.model_draft directly will raise AttributeError for any caller that constructs args without going through a CLI parser that defines --model-draft. A small defensive change keeps Stack resilient to that.

♻️ Suggested change
-        self.draft_model = New(args.model_draft, args) if args.model_draft is not None else None
+        model_draft = getattr(args, "model_draft", None)
+        self.draft_model = New(model_draft, args) if model_draft is not None else None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/stack.py` at line 33, Accessing args.model_draft directly can raise
AttributeError if the attribute is missing; change the creation of draft_model
to use getattr(args, "model_draft", None) and only call New(...) when that value
is not None (preserve the existing conditional) so the Stack constructor (where
draft_model is set and New is used) is resilient to args objects that lack
model_draft.
ramalama/quadlet.py (1)

31-37: Avoid mutating the caller-provided model_parts list.

self.model_parts = model_parts (line 31) keeps a reference to the caller's list, and the append on line 37 mutates it in place. Today the caller (_get_all_model_part_paths) returns a freshly built list so this is harmless, but it's an easy footgun if any caller ever passes a cached/shared list.

♻️ Suggested refactor
-        self.model_parts = model_parts if model_parts is not None else [(self.src_model_path, self.dest_model_path)]
+        self.model_parts = (
+            list(model_parts) if model_parts is not None else [(self.src_model_path, self.dest_model_path)]
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/quadlet.py` around lines 31 - 37, The code currently assigns
caller-provided model_parts directly to self.model_parts and then mutates it
when adding draft paths; instead make a defensive copy so callers' lists aren't
modified: if model_parts is not None set self.model_parts = list(model_parts)
(or model_parts.copy()) or build a new list from the tuples, then, if
draft_model_paths is present, extract
src_draft_model_path/dest_draft_model_path, strip the "oci://" prefix as before
and append the new tuple to self.model_parts; ensure this change touches the
assignment of self.model_parts and the draft_model_paths append logic so callers
(e.g., _get_all_model_part_paths) are never mutated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ramalama/kube.py`:
- Around line 29-31: The volume name key 'model_draft' in the model_paths
assignment should be changed to a DNS-1123-compliant name (use 'model-draft') so
generated Kubernetes volumes use hyphens not underscores: update the assignment
in kube.py where self.model_paths = {'model': model_paths} and the conditional
that adds 'model_draft' to instead add 'model-draft'; also update the test
fixture references in test/unit/data/test_kube/draft_model_hostpath.yaml
(replace model_draft → model-draft) so tests match the new name. Ensure any code
paths that read that key (look for references to 'model_draft' in this module)
are updated to the new 'model-draft' symbol; leave other pre-existing keys like
chat_template/mmproj as-is for now.

---

Nitpick comments:
In `@ramalama/quadlet.py`:
- Around line 31-37: The code currently assigns caller-provided model_parts
directly to self.model_parts and then mutates it when adding draft paths;
instead make a defensive copy so callers' lists aren't modified: if model_parts
is not None set self.model_parts = list(model_parts) (or model_parts.copy()) or
build a new list from the tuples, then, if draft_model_paths is present, extract
src_draft_model_path/dest_draft_model_path, strip the "oci://" prefix as before
and append the new tuple to self.model_parts; ensure this change touches the
assignment of self.model_parts and the draft_model_paths append logic so callers
(e.g., _get_all_model_part_paths) are never mutated.

In `@ramalama/stack.py`:
- Line 33: Accessing args.model_draft directly can raise AttributeError if the
attribute is missing; change the creation of draft_model to use getattr(args,
"model_draft", None) and only call New(...) when that value is not None
(preserve the existing conditional) so the Stack constructor (where draft_model
is set and New is used) is resilient to args objects that lack model_draft.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 273b3e24-5a9c-4f09-9613-32f1e196768b

📥 Commits

Reviewing files that changed from the base of the PR and between 8b4af86 and 63393e6.

📒 Files selected for processing (14)
  • docs/options/model-draft.md
  • docs/ramalama-run.1.md.in
  • ramalama/compose.py
  • ramalama/kube.py
  • ramalama/plugins/runtimes/inference/llama_cpp.py
  • ramalama/quadlet.py
  • ramalama/stack.py
  • ramalama/transports/base.py
  • test/unit/data/test_compose/with_model_draft.yaml
  • test/unit/data/test_kube/draft_model_hostpath.yaml
  • test/unit/data/test_quadlet/draft_model/tinyllama.container
  • test/unit/test_compose.py
  • test/unit/test_kube.py
  • test/unit/test_quadlet.py
✅ Files skipped from review due to trivial changes (3)
  • test/unit/data/test_quadlet/draft_model/tinyllama.container
  • test/unit/data/test_compose/with_model_draft.yaml
  • docs/ramalama-run.1.md.in
🚧 Files skipped from review as they are similar to previous changes (5)
  • ramalama/plugins/runtimes/inference/llama_cpp.py
  • ramalama/compose.py
  • test/unit/test_quadlet.py
  • test/unit/test_compose.py
  • ramalama/transports/base.py

Comment thread ramalama/kube.py Outdated
@mkristian mkristian temporarily deployed to macos-installer April 24, 2026 23:51 — with GitHub Actions Inactive
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
ramalama/kube.py (1)

55-63: ⚠️ Potential issue | 🟠 Major

OCI volume mounts still ignore destination path in the non-host branch.

At Line 61, mountPath is hardcoded to MNT_DIR, while the host-path branch (Line 104) correctly uses dest_model_path. This can mount draft/secondary OCI-backed models to the wrong container path.

🐛 Proposed fix
             else:
                 subPath = ""
                 if not self.artifact:
                     subPath = """
           subPath: /models"""
+                mount_path = dest_model_path or MNT_DIR
                 mounts += f"""
-        - mountPath: {MNT_DIR}{subPath}
+        - mountPath: {mount_path}{subPath}
           name: {volume_name}"""
                 volumes += self._gen_oci_volume(volume_name, src_model_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/kube.py` around lines 55 - 63, In ramalama/kube.py fix the OCI
(non-host) branch so mountPath uses the intended destination path instead of
always MNT_DIR: inside the else branch where mounts and volumes are built
(references: self.artifact, MNT_DIR, volume_name, src_model_path,
dest_model_path, _gen_oci_volume) replace the hardcoded mountPath with the same
dest_model_path logic used in the host-path branch (preserving the subPath
behavior when not self.artifact) so OCI-backed models mount to the correct
container path and keep volumes populated via self._gen_oci_volume(volume_name,
src_model_path).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ramalama/kube.py`:
- Around line 98-117: The two helper methods _gen_path_volume and
_gen_oci_volume lack type hints; update their signatures to include parameter
and return types (use volume_name: str, src_model_path: str, dest_model_path:
str for _gen_path_volume and volume_name: str, src_model_path: str for
_gen_oci_volume) and annotate both returns as Tuple[str, str] (for
_gen_path_volume) and str or Tuple[str] as appropriate—specifically change
_gen_path_volume(...) -> Tuple[str, str] and _gen_oci_volume(...) -> str (or
Tuple[str, str] if you prefer consistent tuple returns), leveraging the
already-imported Tuple type so mypy/type-checking passes.

---

Duplicate comments:
In `@ramalama/kube.py`:
- Around line 55-63: In ramalama/kube.py fix the OCI (non-host) branch so
mountPath uses the intended destination path instead of always MNT_DIR: inside
the else branch where mounts and volumes are built (references: self.artifact,
MNT_DIR, volume_name, src_model_path, dest_model_path, _gen_oci_volume) replace
the hardcoded mountPath with the same dest_model_path logic used in the
host-path branch (preserving the subPath behavior when not self.artifact) so
OCI-backed models mount to the correct container path and keep volumes populated
via self._gen_oci_volume(volume_name, src_model_path).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e54e8de9-d5c6-429b-a04b-35106387f0b1

📥 Commits

Reviewing files that changed from the base of the PR and between 63393e6 and ebc5774.

📒 Files selected for processing (1)
  • ramalama/kube.py

Comment thread ramalama/kube.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
ramalama/kube.py (1)

55-63: ⚠️ Potential issue | 🟠 Major

The OCI branch still ignores the per-model destination path.

Line 61 always mounts to MNT_DIR, while the hostPath branch mounts each model at its own dest_model_path. As soon as one of model / model-draft is OCI-backed, that shared directory mount can hide the other model, so only one path remains visible to llama-server. This branch needs a per-volume mountPath/subPath derived from dest_model_path; the new hostPath fixture will not catch this.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ramalama/kube.py` around lines 55 - 63, The OCI branch currently mounts all
OCI models to the shared MNT_DIR which hides other models; update the block that
builds mounts/volumes (the variables mounts, volumes) to compute a per-model
mountPath/subPath from the model's dest_model_path instead of always using
MNT_DIR: derive a safe relative subPath (e.g., strip the MNT_DIR prefix or
compute relative path) and set mountPath to either dest_model_path or
f"{MNT_DIR}{subPath}", then call _gen_oci_volume(volume_name, src_model_path) as
before; use the existing self.artifact flag only to decide whether to include a
subPath entry, but ensure the per-volume mountPath/subPath matches
dest_model_path so each OCI-backed model gets its own mount rather than a shared
directory.
🧹 Nitpick comments (1)
test/unit/data/test_kube/draft_model_hostpath.yaml (1)

24-25: Exercise the draft-model flag in this expected output too.

This fixture still invokes llama-server with only --model. Together with the paired input in test/unit/test_kube.py (Lines 76-89), the new case proves the draft file is mounted, but not that generated Kubernetes actually wires the draft model into the server. Please add --model-draft /mnt/models/model_draft.file in both places.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/unit/data/test_kube/draft_model_hostpath.yaml` around lines 24 - 25, The
fixture invokes llama-server with only --model, so it doesn't assert the draft
model is wired; update the expected container args for the process named or
invocations using "llama-server" (the args array in the
draft_model_hostpath.yaml fixture) to include the draft flag by adding
"--model-draft", "/mnt/models/model_draft.file"; also update the corresponding
test input in test/unit/test_kube.py that builds/compares this case (the test
block around the existing llama-server args) to include the same "--model-draft
/mnt/models/model_draft.file" so the test asserts the generated Kubernetes
mounts the draft model into the server invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@ramalama/kube.py`:
- Around line 55-63: The OCI branch currently mounts all OCI models to the
shared MNT_DIR which hides other models; update the block that builds
mounts/volumes (the variables mounts, volumes) to compute a per-model
mountPath/subPath from the model's dest_model_path instead of always using
MNT_DIR: derive a safe relative subPath (e.g., strip the MNT_DIR prefix or
compute relative path) and set mountPath to either dest_model_path or
f"{MNT_DIR}{subPath}", then call _gen_oci_volume(volume_name, src_model_path) as
before; use the existing self.artifact flag only to decide whether to include a
subPath entry, but ensure the per-volume mountPath/subPath matches
dest_model_path so each OCI-backed model gets its own mount rather than a shared
directory.

---

Nitpick comments:
In `@test/unit/data/test_kube/draft_model_hostpath.yaml`:
- Around line 24-25: The fixture invokes llama-server with only --model, so it
doesn't assert the draft model is wired; update the expected container args for
the process named or invocations using "llama-server" (the args array in the
draft_model_hostpath.yaml fixture) to include the draft flag by adding
"--model-draft", "/mnt/models/model_draft.file"; also update the corresponding
test input in test/unit/test_kube.py that builds/compares this case (the test
block around the existing llama-server args) to include the same "--model-draft
/mnt/models/model_draft.file" so the test asserts the generated Kubernetes
mounts the draft model into the server invocation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 655557d8-394b-4c4b-a003-d43baf82cf42

📥 Commits

Reviewing files that changed from the base of the PR and between ebc5774 and 73a10d6.

📒 Files selected for processing (3)
  • ramalama/kube.py
  • test/unit/data/test_kube/draft_model_hostpath.yaml
  • test/unit/data/test_kube/with_chat_template.yaml

@mikebonnet
Copy link
Copy Markdown
Collaborator

/ok-to-test

@mikebonnet
Copy link
Copy Markdown
Collaborator

When a draft model is specified, I would expect to see --model-draft in the llama-server arguments, but I don't see that in the test data. Is something missing?

@mkristian mkristian temporarily deployed to macos-installer April 29, 2026 19:39 — with GitHub Actions Inactive
Comment thread ramalama/quadlet.py Outdated
Copy link
Copy Markdown
Member

@rhatdan rhatdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mkristian
LGTM other then one nit.

@mikebonnet
Copy link
Copy Markdown
Collaborator

/ok-to-test

@mkristian
Copy link
Copy Markdown
Contributor Author

I can not see the errors on Red Hat Konflux. How can I be of help ?

Comment thread test/e2e/test_serve.py Outdated
else:
raise Exception("Invalid generate option")

assert len(re.findall(r'/mnt/models/smollm-135', content)) == 2
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On big-endian systems (s390x), test_model resolves to stories-be:260k, so this would need to check for /mnt/models/stories260K-be.gguf.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a fix, but can not test it. A little research said that the draft model using Q4 quants should work for both big and little endian systems.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to use:

assert len(re.findall(r'/mnt/models/stories260K-be', content)) == 2

For reference, here one of the test errors showing the generated yaml:

  test/e2e/test_serve.py:672: in test_serve_generation
      assert len(re.findall(r'/mnt/models/stories-be', content)) == 2
  E   assert 0 == 2
  E    +  where 0 = len([])
  E    +    where [] = <function findall at 0x3ff9e4aea30>('/mnt/models/stories-be', '# Save the output of this file and use kubectl create -f to import\n# it into Kubernetes.\n#\n# Created with ramalama-0.20.0\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: test\n  labels:\n    app: test\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: test\n  template:\n    metadata:\n      labels:\n        app: test\n    spec:\n      containers:\n      - name: test\n        image: quay.io/redhat-user-workloads/ramalama-tenant/ramalama@sha256:a371324628adff9cc3a74251e2926826131100495a1d211ddc2a1ac2d6167930\n        command: ["llama-server"]\n        args: [\'--host\', \'::\', \'--port\', \'1234\', \'--model\', \'/mnt/models/stories260K-be.gguf\', \'--no-warmup\', \'--alias\', \'taronaeo/tinyllamas-BE/stories260K-be.gguf\', \'--temp\', \'0.8\', \'--threads\', \'8\']\n\n        securityContext:\n          allowPrivilegeEscalation: false\n          capabilities:\n            drop:\n            - CAP_CHOWN\n            - CAP_FOWNER\n            - CAP_FSETID\n            - CAP_KILL\n            - CAP_NET_BIND_SERVICE\n            - CAP_SETFCAP\n            - CAP_SETGID\n            - CAP_SETPCAP\n            - CAP_SETUID\n            - CAP_SYS_CHROOT\n            add:\n            - CAP_DAC_OVERRIDE\n          seLinuxOptions:\n            type: spc_t\n        ports:\n        - containerPort: 1234\n        volumeMounts:\n        - mountPath: /mnt/models/stories260K-be.gguf\n          name: model\n      volumes:\n      - hostPath:\n          path: /tmp/.local/share/ramalama/store/huggingface/taronaeo/tinyllamas-BE/stories260K-be.gguf/blobs/sha256-276d6f9d9050e79670a113492331dc52b3568690b03f5e633521cbed020bfc1c\n        name: model\n')
  E    +      where <function findall at 0x3ff9e4aea30> = re.findall
  ------------------------------------------------- Captured stderr call -------------------------------------------------
  Using cached huggingface://stories260K-be.gguf:latest ...
  Downloading hf://ggml-org/gemma-3-270m-it-qat-GGUF:Q4_0 ...
  Trying to pull hf://ggml-org/gemma-3-270m-it-qat-GGUF:Q4_0 ...
  Downloading gemma-3-270m-it-qat-Q4_0.gguf

@mikebonnet
Copy link
Copy Markdown
Collaborator

/ok-to-test

@mikebonnet
Copy link
Copy Markdown
Collaborator

/ok-to-test

@mkristian mkristian temporarily deployed to macos-installer May 1, 2026 07:12 — with GitHub Actions Inactive
mkristian added 3 commits May 1, 2026 09:13
…iles

Fixes most of containers#2642

Signed-off-by: Christian Meier <meier.kristian@gmail.com>
Signed-off-by: Christian Meier <meier.kristian@gmail.com>
fix draft-model unit tests

Signed-off-by: Christian Meier <meier.kristian@gmail.com>
Signed-off-by: Christian Meier <meier.kristian@gmail.com>
@mkristian mkristian temporarily deployed to macos-installer May 1, 2026 13:24 — with GitHub Actions Inactive
Copy link
Copy Markdown
Member

@rhatdan rhatdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rhatdan rhatdan merged commit 7aa1706 into containers:main May 1, 2026
35 of 36 checks passed
@rhatdan
Copy link
Copy Markdown
Member

rhatdan commented May 1, 2026

Thanks @mkristian

@mkristian mkristian deleted the fix-model-draft branch May 1, 2026 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants