Fix most issues around model draft#2685
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds optional draft-model support end-to-end: transports resolve draft ref files and enumerate GGUF blobs, producing host→container path pairs that are threaded into Quadlet, Kube, and Compose generators which emit additional mounts/volumes for draft model blobs; CLI and tests updated to expose and validate --model-draft. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as "CLI\n(--model-draft)"
participant Transport as "TransportBase\n(resolve draft refs)"
participant FS as "Host FS\n(model GGUF blobs)"
participant Generator as "Quadlet / Kube / Compose\n(generators)"
participant Runtime as "Container / YAML\n(deployment/compose)"
CLI->>Transport: invoke with model + optional draft arg
Transport->>FS: locate draft ref file & list GGUF blobs
FS-->>Transport: return blob paths (or error)
Transport->>Generator: pass model_paths + draft_model_paths
Generator->>Runtime: emit YAML/units/compose with per-blob mounts
Runtime->>FS: mount host files into container runtime
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces support for draft models across various deployment formats, including Docker Compose, Kubernetes, and Quadlet. It updates the initialization logic, volume generation, and command-line argument handling to accommodate draft model paths. Several critical issues were identified: the use of argument unpacking for draft_model_paths in generate_container_config can lead to TypeError exceptions, and the quadlet_kube call is missing the required model_parts argument. Additionally, a hardcoded volume name in the Kubernetes generator needs to be replaced with a dynamic variable to support draft models correctly, and several method signatures require updates to properly propagate draft model data.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
ramalama/compose.py (1)
25-49:⚠️ Potential issue | 🔴 CriticalBug:
is not Nonecheck will always fire when caller passesdraft_model_paths=None.Lines 25–27 replace a
Nonedraft_model_pathswith("", ""), soself.src_draft_model_pathis never actuallyNone— it's either a non-empty string or"". Then line 48:if self.src_draft_model_path is not None: volumes += self._gen_model_volume(self.src_draft_model_path, self.dest_draft_model_path)evaluates to
Trueeven when no draft model was supplied, emitting a bogus- ":":ro"volume entry in the generated compose YAML. The existing tests (e.g.test_compose_no_port_argattest/unit/test_compose.py:259,test_compose_no_env_vars,test_compose_no_devices) passNonebut don't assert on volumes, so they don't catch this.Use a truthiness check, consistent with the adjacent chat-template / mmproj guards:
🛠️ Proposed fix
- # Model Volume - volumes += self._gen_model_volume(self.src_model_path, self.dest_model_path) - if self.src_draft_model_path is not None: - volumes += self._gen_model_volume(self.src_draft_model_path, self.dest_draft_model_path) + # Model Volume + volumes += self._gen_model_volume(self.src_model_path, self.dest_model_path) + if self.src_draft_model_path: + volumes += self._gen_model_volume(self.src_draft_model_path, self.dest_draft_model_path)Consider also adding an assertion in one of the no-draft compose tests to guard against regressions, e.g.
assert '- ":":ro"' not in result.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/compose.py` around lines 25 - 49, The guard in _gen_volumes uses "if self.src_draft_model_path is not None" but __init__ sets draft_model_paths to ("", "") when None, so self.src_draft_model_path is never None and produces an empty volume entry; change that check to a truthiness check (e.g. if self.src_draft_model_path:) so _gen_model_volume is only called when a real draft path exists, and ensure this logic is applied consistently for src_chat_template_path and src_mmproj_path if needed; update tests (e.g. test_compose_no_draft or test_compose_no_port_arg) to assert that '- ":":ro"' (or equivalent empty-volume string) is not present in the generated compose output to prevent regressions.ramalama/quadlet.py (1)
22-39:⚠️ Potential issue | 🔴 CriticalType-signature mismatch for
draft_model_paths—[0]indexing is fragile/broken.The original review comment is accurate. All three issues are confirmed:
Signature inconsistency.
Quadlet.__init__declaresdraft_model_paths: Optional[list[Tuple[str, str]]], whereasCompose.__init__andKube.__init__declareOptional[Tuple[str, str]]. The upstream code inramalama/transports/base.py(lines 626–628) createsdraft_model_pathsas a list via_get_all_model_part_paths()or as an empty list, then passes it directly to Quadlet at line 696 but unpacks it with*draft_model_pathsfor Kube/Compose at lines 648/658/668. This unpacking only works safely if the list has exactly one element (or fails on empty list since the parameter is required). Please align the signature withCompose/Kubeand drop the[0]index, or ensure all three generators handlelist[Tuple[str, str]]consistently.Dead/erroneous code. Line 38 references the non-existent attribute
self.src_draft_model_pathand should be removed entirely. The variablesrc_draft_model_pathexists only as a local at line 37; attempting to uncomment line 38 would cause a NameError.Indentation. Line 34 is indented with 3 spaces (11 characters total) while the sibling
ifblock above (line 32) uses 4 spaces (12 characters total). This will be flagged byruff format.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/quadlet.py` around lines 22 - 39, Quadlet.__init__ currently mismatches Compose.__init__/Kube.__init__ for draft_model_paths and unsafely indexes draft_model_paths[0], so either change Quadlet.__init__'s parameter to draft_model_paths: Optional[Tuple[str,str]] (and stop indexing a list) or update Compose/Kube to pass a list consistently; remove the dead reference to self.src_draft_model_path (it doesn't exist) and use the local src_draft_model_path variable only, and correct the indentation of the model_parts else-block to use 4 spaces so formatting/ruff errors are resolved; locate these fixes in Quadlet.__init__, and review callers that use _get_all_model_part_paths to ensure the chosen signature is consistent.ramalama/kube.py (1)
14-31: 🛠️ Refactor suggestion | 🟠 MajorInconsistent
draft_model_pathstype acrossKube/Quadlet/Compose.
Kube.__init__declaresdraft_model_paths: Optional[Tuple[str, str]](single tuple), whileQuadlet.__init__(and the logical model inbase.py._get_all_model_part_paths) usesOptional[list[Tuple[str, str]]]. This is what forces the awkward*draft_model_pathsunpack inramalama/transports/base.pyand limitsKubeto single-part draft models. PreferOptional[list[Tuple[str, str]]]and iterate in_gen_volumesto emit onemodel_draft_<n>volume per part, mirroring howQuadletappends each part toself.model_parts.Separately, since
draft_model_pathshas no default here, every call site must pass it — that's the direct cause of theTypeErrorrisk flagged inramalama/transports/base.py. Consider giving it a= Nonedefault for backward compatibility while the new call sites are rolled out.As per coding guidelines: "Use type hints in Python code and ensure mypy compatibility".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/kube.py` around lines 14 - 31, Kube.__init__ currently types draft_model_paths as Optional[Tuple[str, str]] and requires it, causing inconsistent handling vs Quadlet and breakage in transports; change the signature of Kube.__init__ to draft_model_paths: Optional[list[Tuple[str, str]]] = None, adjust self.model_paths to accept multiple draft parts (e.g., append each tuple as 'model_draft_<n>') and update _gen_volumes to iterate over self.model_paths['model_draft'] parts and emit one volume per part (matching how Quadlet.__init__ and base.py._get_all_model_part_paths treat multi-part models), and ensure callers in ramalama/transports/base.py no longer need to unpack with *draft_model_paths.ramalama/transports/base.py (1)
625-669:⚠️ Potential issue | 🔴 CriticalBroken pipeline when draft_model is absent or multi-part (two critical bugs).
Two related defects here will break generation for most users:
Empty list vs
Nonesemantics (lines 625-628). Whenself.draft_model is None, you setdraft_model_paths = []. That empty list then propagates toQuadlet(...)via line 638/696. Per theQuadlet.__init__you're calling (seeramalama/quadlet.py):if draft_model_paths is not None: src_draft_model_path, dest_draft_model_path = draft_model_paths[0]
[]is notNone, so it takes the branch and raisesIndexErroron[0]. Anyone runningramalama ... --generate quadletwithout a draft model will crash.
*draft_model_pathsunpacking (lines 648, 658, 668).kube(),quadlet_kube(), andcompose()all declaredraft_model_pathsas a single required positional parameter, and you call them with*draft_model_paths. Behavior:
- No draft model →
*[]passes zero args →TypeError: missing required argument.- Single-part draft →
*[(s,d)]passes one tuple → works.- Multi-part draft →
*[(s1,d1),(s2,d2)…]passes too many args →TypeError.Only the "single-part draft" case works. The no-draft case (by far the most common) is broken.
Both bugs are masked by the unit tests, which call
Kube/Quadletdirectly and skip this pipeline.Suggested fix (use
Noneas the sentinel and stop unpacking):🔒️ Proposed fix
# Get all model parts (for multi-part models) model_parts = self._get_all_model_part_paths(False, True, args.dryrun) - if self.draft_model is not None: - draft_model_paths = self.draft_model._get_all_model_part_paths(False, True, args.dryrun) - else: - draft_model_paths = [] + draft_model_paths = None + if self.draft_model is not None: + draft_model_paths = self.draft_model._get_all_model_part_paths(False, True, args.dryrun) if args.generate.gen_type == "quadlet": self.quadlet( (model_src_path, model_dest_path), (chat_template_src_path, chat_template_dest_path), (mmproj_src_path, mmproj_dest_path), args, exec_args, args.generate.output_dir, model_parts, draft_model_paths, ) elif args.generate.gen_type == "kube": self.kube( (model_src_path.removeprefix("oci://"), model_dest_path), (chat_template_src_path, chat_template_dest_path), (mmproj_src_path, mmproj_dest_path), args, exec_args, args.generate.output_dir, - *draft_model_paths, + draft_model_paths, ) elif args.generate.gen_type == "quadlet/kube": self.quadlet_kube( (model_src_path.removeprefix("oci://"), model_dest_path), (chat_template_src_path, chat_template_dest_path), (mmproj_src_path, mmproj_dest_path), args, exec_args, args.generate.output_dir, - *draft_model_paths, + draft_model_paths, ) elif args.generate.gen_type == "compose": self.compose( (model_src_path, model_dest_path), (chat_template_src_path, chat_template_dest_path), (mmproj_src_path, mmproj_dest_path), args, exec_args, args.generate.output_dir, - *draft_model_paths, + draft_model_paths, )Then align the downstream signatures to accept the full list (or equivalently, a single tuple for
Kube/Compose), and handleNoneuniformly. Note that the unpacking trick also loses multi-part draft support forKube—Kube.__init__only models one draft volume inself.model_paths['model_draft'], so if you want multi-part draft parity withQuadlet,Kubeneeds to accept a list of tuples and emit a volume per part, similar to the main model loop.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/transports/base.py` around lines 625 - 669, When no draft_model exists you must use None as the sentinel (not an empty list) and stop unpacking draft_model_paths when calling the generators; change the assignment in the pipeline to draft_model_paths = None if self.draft_model is None, then call self.quadlet(..., draft_model_paths) and call self.kube(..., draft_model_paths) / self.quadlet_kube(..., draft_model_paths) / self.compose(..., draft_model_paths) (remove the * unpack). Update the downstream initializers/methods (Quadlet.__init__, Kube.__init__, quadlet_kube, compose) to accept draft_model_paths: Optional[List[Tuple[src,dest]]] and handle None (no draft), a single-part list, or multi-part lists (for Kube emit one volume per part instead of assuming a single tuple).
🧹 Nitpick comments (5)
ramalama/stack.py (1)
33-33: Usegetattrto avoidAttributeErrorwhenmodel_draftis not set.
args.model_draftis only populated by the argparse action forrun/servesubcommands (seellama_cpp.py:345). IfStackis ever instantiated from a path where the arg parser didn't register--model-draft, this line raisesAttributeError. Agetattr(args, "model_draft", None)is safer and matches the defensive style already used elsewhere in this class (e.g.,getattr(args, "name", None)on line 26).🛠️ Proposed fix
- self.draft_model = New(args.model_draft, args) if args.model_draft is not None else None + model_draft = getattr(args, "model_draft", None) + self.draft_model = New(model_draft, args) if model_draft is not None else None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/stack.py` at line 33, Replace the direct attribute access to args.model_draft with a safe getattr to prevent AttributeError when the argparse parser didn't register that option: in the Stack initializer where self.draft_model is set (the New constructor call referencing args.model_draft), use getattr(args, "model_draft", None) so the code mirrors the defensive pattern used elsewhere (e.g., getattr(args, "name", None)) and only constructs New if the returned value is not None.ramalama/compose.py (1)
22-22: Consider a default value fordraft_model_pathsto avoid a breaking signature change.Adding
draft_model_pathsas a required positional parameter forces every existing caller ofCompose(...)to be updated in lockstep (as seen attest/unit/test_compose.py:259,272,285which now pass a trailingNone). Defaulting toNone, mirroring other optional params in this constructor, is both safer for downstream callers and more consistent with theOptional[...]type hint.🛠️ Proposed fix
- draft_model_paths: Optional[tuple[str, str]], + draft_model_paths: Optional[tuple[str, str]] = None,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/compose.py` at line 22, The constructor parameter draft_model_paths for Compose is declared Optional[tuple[str, str]] but has no default, breaking callers; update the Compose __init__ signature (the Compose class constructor) to give draft_model_paths a default of None (matching other optional params) so existing callers need not pass it, and ensure any internal uses check for None accordingly.ramalama/kube.py (1)
49-52: Nit: unpack.items()for clarity.Minor readability improvement:
♻️ Proposed refactor
- for paths in self.model_paths.items(): - src_model_path, dest_model_path = paths[1] - src_model_path = src_model_path.removeprefix("oci://") - volume_name = paths[0] + for volume_name, (src_model_path, dest_model_path) in self.model_paths.items(): + src_model_path = src_model_path.removeprefix("oci://")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/kube.py` around lines 49 - 52, The loop over self.model_paths.items() should unpack keys and tuple values for clarity: change the iteration to unpack volume_name and the (src_model_path, dest_model_path) pair directly (instead of using the temporary paths variable), then apply src_model_path.removeprefix("oci://") to the unpacked src_model_path; update any subsequent references to use these unpacked variable names (loop header and usages in the block where src_model_path, dest_model_path, and volume_name are used).ramalama/transports/base.py (2)
640-669: Inconsistentoci://stripping across generation paths.
kubeandquadlet_kubestrip"oci://"frommodel_src_pathbefore passing it down (lines 642, 652), butquadlet(line 631) andcompose(line 662) do not.Kube._gen_volumesalready calls.removeprefix("oci://")internally on each source path, so the strip at the caller is redundant forkube; forquadlet/compose, verify they do the same internally and either pick "strip at caller" or "strip in callee" consistently.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/transports/base.py` around lines 640 - 669, The call sites for kube, quadlet_kube, and compose are inconsistent about stripping the "oci://" prefix: kube and quadlet_kube currently call model_src_path.removeprefix("oci://") while quadlet and compose do not, and Kube._gen_volumes already removes the prefix internally; fix this by centralizing the strip behavior — remove the removeprefix call from the callers (the calls to self.kube(...) and self.quadlet_kube(...)) and ensure each callee (e.g., Kube._gen_volumes, Quadlet._gen_volumes, Compose._gen_volumes or their equivalent helper methods) performs model_src_path.removeprefix("oci://") exactly once so all generation paths behave consistently.
453-469: Duplicated mount logic; consider a small helper.Lines 453-469 are a near-verbatim copy of 443-451 against
self.draft_model(plus afile.type != 'gguf'filter). Extracting a helper like_mount_model_files(model, only_gguf: bool = False)would remove the duplication and keep the mount-format string in one place.Minor:
file.type != 'gguf'compares aStoreFileTypeenum member to a raw string. It works today becauseStoreFileTypeis aStrEnum, but preferfile.type != StoreFileType.GGUF_MODELfor type clarity — or equivalently useself.draft_model.model_store.get_ref_file(...).model_files(which already filters by GGUF) to avoid the loop-plus-filter.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/transports/base.py` around lines 453 - 469, The duplicate mount logic for draft models should be refactored into a single helper to avoid repetition and centralize the mount-format string; add a method named like _mount_model_files(self, model, only_gguf: bool = False) that takes a model/ref_file, iterates its model_files (or filters by StoreFileType.GGUF_MODEL when only_gguf=True), builds container paths via get_container_mount_path(blob_path), composes mount_path as f"{MNT_DIR}/{file.name}" and calls self.engine.add(...) once from the helper; replace the duplicated blocks that reference self.draft_model, get_ref_file, get_blob_file_path, get_container_mount_path, MNT_DIR and self.engine.add with calls to _mount_model_files(self.draft_model, only_gguf=True) (and similar for the other occurrence) and change the raw string comparison file.type != 'gguf' to use StoreFileType.GGUF_MODEL for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ramalama/kube.py`:
- Around line 57-65: The OCI branch currently hardcodes the mount name "model"
and always mounts to MNT_DIR (ignoring dest_model_path), causing mismatches when
volume_name differs (e.g., model_draft); update the block that builds mounts to
use the dynamic volume_name instead of "model" and compute the mountPath using
dest_model_path (falling back to MNT_DIR) and preserve subPath logic (or append
subPath to the resolved dest path) so the generated volumeMount references the
same volume_name passed to _gen_oci_volume(volume_name, src_model_path) and
mounts to the correct destination.
In `@ramalama/transports/base.py`:
- Around line 701-717: In quadlet_kube, rename the parameter model_parts back to
draft_model_paths and restore a separate model_parts parameter for the main
model so the signature matches how callers pass draft paths (and preserves main
model parts); ensure you pass draft_model_paths into Kube(...) in the position
that Kube.__init__ expects and also thread draft_model_paths into the
Quadlet(...) call instead of None (and keep model_parts passed for the main
model), splitting the long Quadlet(...) call across lines to stay under 120
characters; verify argument order matches Kube and Quadlet constructors.
- Line 686: The function signature for quadlet is too long (~140 chars); split
the def quadlet(...) into a multi-line signature so no line exceeds 120
characters by placing parameters on separate lines (e.g., after the opening
parenthesis put each of model_paths, chat_template_paths, mmproj_paths, args,
exec_args, output_dir, model_parts=None, draft_model_paths=None on their own
line or grouped sensibly) and align the continuation indentation to match
existing project style, keeping the same parameter order and defaults.
In `@test/unit/test_kube.py`:
- Around line 76-89: The test directly constructs draft_model_paths as a
Tuple[str,str] and calls Kube(...) which hides a type/usage mismatch that only
appears when code flows through TransportBase.generate_container_config →
self.kube(...) → Kube(...); update code and tests so draft_model_paths has a
single canonical type (prefer list[tuple[str,str]]) across Kube, Quadlet,
Compose and TransportBase.generate_container_config, adjust function signatures
and callers to accept and unpack a list consistently, and add integration tests
that call TransportBase.generate_container_config with (a) no draft, (b) a
single draft tuple, and (c) multiple draft tuples to exercise the real path and
surface the previous mismatch.
---
Outside diff comments:
In `@ramalama/compose.py`:
- Around line 25-49: The guard in _gen_volumes uses "if
self.src_draft_model_path is not None" but __init__ sets draft_model_paths to
("", "") when None, so self.src_draft_model_path is never None and produces an
empty volume entry; change that check to a truthiness check (e.g. if
self.src_draft_model_path:) so _gen_model_volume is only called when a real
draft path exists, and ensure this logic is applied consistently for
src_chat_template_path and src_mmproj_path if needed; update tests (e.g.
test_compose_no_draft or test_compose_no_port_arg) to assert that '- ":":ro"'
(or equivalent empty-volume string) is not present in the generated compose
output to prevent regressions.
In `@ramalama/kube.py`:
- Around line 14-31: Kube.__init__ currently types draft_model_paths as
Optional[Tuple[str, str]] and requires it, causing inconsistent handling vs
Quadlet and breakage in transports; change the signature of Kube.__init__ to
draft_model_paths: Optional[list[Tuple[str, str]]] = None, adjust
self.model_paths to accept multiple draft parts (e.g., append each tuple as
'model_draft_<n>') and update _gen_volumes to iterate over
self.model_paths['model_draft'] parts and emit one volume per part (matching how
Quadlet.__init__ and base.py._get_all_model_part_paths treat multi-part models),
and ensure callers in ramalama/transports/base.py no longer need to unpack with
*draft_model_paths.
In `@ramalama/quadlet.py`:
- Around line 22-39: Quadlet.__init__ currently mismatches
Compose.__init__/Kube.__init__ for draft_model_paths and unsafely indexes
draft_model_paths[0], so either change Quadlet.__init__'s parameter to
draft_model_paths: Optional[Tuple[str,str]] (and stop indexing a list) or update
Compose/Kube to pass a list consistently; remove the dead reference to
self.src_draft_model_path (it doesn't exist) and use the local
src_draft_model_path variable only, and correct the indentation of the
model_parts else-block to use 4 spaces so formatting/ruff errors are resolved;
locate these fixes in Quadlet.__init__, and review callers that use
_get_all_model_part_paths to ensure the chosen signature is consistent.
In `@ramalama/transports/base.py`:
- Around line 625-669: When no draft_model exists you must use None as the
sentinel (not an empty list) and stop unpacking draft_model_paths when calling
the generators; change the assignment in the pipeline to draft_model_paths =
None if self.draft_model is None, then call self.quadlet(..., draft_model_paths)
and call self.kube(..., draft_model_paths) / self.quadlet_kube(...,
draft_model_paths) / self.compose(..., draft_model_paths) (remove the * unpack).
Update the downstream initializers/methods (Quadlet.__init__, Kube.__init__,
quadlet_kube, compose) to accept draft_model_paths:
Optional[List[Tuple[src,dest]]] and handle None (no draft), a single-part list,
or multi-part lists (for Kube emit one volume per part instead of assuming a
single tuple).
---
Nitpick comments:
In `@ramalama/compose.py`:
- Line 22: The constructor parameter draft_model_paths for Compose is declared
Optional[tuple[str, str]] but has no default, breaking callers; update the
Compose __init__ signature (the Compose class constructor) to give
draft_model_paths a default of None (matching other optional params) so existing
callers need not pass it, and ensure any internal uses check for None
accordingly.
In `@ramalama/kube.py`:
- Around line 49-52: The loop over self.model_paths.items() should unpack keys
and tuple values for clarity: change the iteration to unpack volume_name and the
(src_model_path, dest_model_path) pair directly (instead of using the temporary
paths variable), then apply src_model_path.removeprefix("oci://") to the
unpacked src_model_path; update any subsequent references to use these unpacked
variable names (loop header and usages in the block where src_model_path,
dest_model_path, and volume_name are used).
In `@ramalama/stack.py`:
- Line 33: Replace the direct attribute access to args.model_draft with a safe
getattr to prevent AttributeError when the argparse parser didn't register that
option: in the Stack initializer where self.draft_model is set (the New
constructor call referencing args.model_draft), use getattr(args, "model_draft",
None) so the code mirrors the defensive pattern used elsewhere (e.g.,
getattr(args, "name", None)) and only constructs New if the returned value is
not None.
In `@ramalama/transports/base.py`:
- Around line 640-669: The call sites for kube, quadlet_kube, and compose are
inconsistent about stripping the "oci://" prefix: kube and quadlet_kube
currently call model_src_path.removeprefix("oci://") while quadlet and compose
do not, and Kube._gen_volumes already removes the prefix internally; fix this by
centralizing the strip behavior — remove the removeprefix call from the callers
(the calls to self.kube(...) and self.quadlet_kube(...)) and ensure each callee
(e.g., Kube._gen_volumes, Quadlet._gen_volumes, Compose._gen_volumes or their
equivalent helper methods) performs model_src_path.removeprefix("oci://")
exactly once so all generation paths behave consistently.
- Around line 453-469: The duplicate mount logic for draft models should be
refactored into a single helper to avoid repetition and centralize the
mount-format string; add a method named like _mount_model_files(self, model,
only_gguf: bool = False) that takes a model/ref_file, iterates its model_files
(or filters by StoreFileType.GGUF_MODEL when only_gguf=True), builds container
paths via get_container_mount_path(blob_path), composes mount_path as
f"{MNT_DIR}/{file.name}" and calls self.engine.add(...) once from the helper;
replace the duplicated blocks that reference self.draft_model, get_ref_file,
get_blob_file_path, get_container_mount_path, MNT_DIR and self.engine.add with
calls to _mount_model_files(self.draft_model, only_gguf=True) (and similar for
the other occurrence) and change the raw string comparison file.type != 'gguf'
to use StoreFileType.GGUF_MODEL for clarity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 5fb58e90-4670-4f4d-92de-e481362c13e6
📒 Files selected for processing (12)
ramalama/compose.pyramalama/kube.pyramalama/plugins/runtimes/inference/llama_cpp.pyramalama/quadlet.pyramalama/stack.pyramalama/transports/base.pytest/unit/data/test_compose/with_model_draft.yamltest/unit/data/test_kube/draft_model_hostpath.yamltest/unit/data/test_quadlet/draft_model/tinyllama.containertest/unit/test_compose.pytest/unit/test_kube.pytest/unit/test_quadlet.py
8b4af86 to
b21cae6
Compare
b21cae6 to
b33b7da
Compare
b33b7da to
63393e6
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
ramalama/stack.py (1)
33-33: Consider defensive access forargs.model_draft.Other attributes on
argsare accessed withgetattr(..., default)(e.g.,nameon line 26). Usingargs.model_draftdirectly will raiseAttributeErrorfor any caller that constructsargswithout going through a CLI parser that defines--model-draft. A small defensive change keepsStackresilient to that.♻️ Suggested change
- self.draft_model = New(args.model_draft, args) if args.model_draft is not None else None + model_draft = getattr(args, "model_draft", None) + self.draft_model = New(model_draft, args) if model_draft is not None else None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/stack.py` at line 33, Accessing args.model_draft directly can raise AttributeError if the attribute is missing; change the creation of draft_model to use getattr(args, "model_draft", None) and only call New(...) when that value is not None (preserve the existing conditional) so the Stack constructor (where draft_model is set and New is used) is resilient to args objects that lack model_draft.ramalama/quadlet.py (1)
31-37: Avoid mutating the caller-providedmodel_partslist.
self.model_parts = model_parts(line 31) keeps a reference to the caller's list, and theappendon line 37 mutates it in place. Today the caller (_get_all_model_part_paths) returns a freshly built list so this is harmless, but it's an easy footgun if any caller ever passes a cached/shared list.♻️ Suggested refactor
- self.model_parts = model_parts if model_parts is not None else [(self.src_model_path, self.dest_model_path)] + self.model_parts = ( + list(model_parts) if model_parts is not None else [(self.src_model_path, self.dest_model_path)] + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/quadlet.py` around lines 31 - 37, The code currently assigns caller-provided model_parts directly to self.model_parts and then mutates it when adding draft paths; instead make a defensive copy so callers' lists aren't modified: if model_parts is not None set self.model_parts = list(model_parts) (or model_parts.copy()) or build a new list from the tuples, then, if draft_model_paths is present, extract src_draft_model_path/dest_draft_model_path, strip the "oci://" prefix as before and append the new tuple to self.model_parts; ensure this change touches the assignment of self.model_parts and the draft_model_paths append logic so callers (e.g., _get_all_model_part_paths) are never mutated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ramalama/kube.py`:
- Around line 29-31: The volume name key 'model_draft' in the model_paths
assignment should be changed to a DNS-1123-compliant name (use 'model-draft') so
generated Kubernetes volumes use hyphens not underscores: update the assignment
in kube.py where self.model_paths = {'model': model_paths} and the conditional
that adds 'model_draft' to instead add 'model-draft'; also update the test
fixture references in test/unit/data/test_kube/draft_model_hostpath.yaml
(replace model_draft → model-draft) so tests match the new name. Ensure any code
paths that read that key (look for references to 'model_draft' in this module)
are updated to the new 'model-draft' symbol; leave other pre-existing keys like
chat_template/mmproj as-is for now.
---
Nitpick comments:
In `@ramalama/quadlet.py`:
- Around line 31-37: The code currently assigns caller-provided model_parts
directly to self.model_parts and then mutates it when adding draft paths;
instead make a defensive copy so callers' lists aren't modified: if model_parts
is not None set self.model_parts = list(model_parts) (or model_parts.copy()) or
build a new list from the tuples, then, if draft_model_paths is present, extract
src_draft_model_path/dest_draft_model_path, strip the "oci://" prefix as before
and append the new tuple to self.model_parts; ensure this change touches the
assignment of self.model_parts and the draft_model_paths append logic so callers
(e.g., _get_all_model_part_paths) are never mutated.
In `@ramalama/stack.py`:
- Line 33: Accessing args.model_draft directly can raise AttributeError if the
attribute is missing; change the creation of draft_model to use getattr(args,
"model_draft", None) and only call New(...) when that value is not None
(preserve the existing conditional) so the Stack constructor (where draft_model
is set and New is used) is resilient to args objects that lack model_draft.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 273b3e24-5a9c-4f09-9613-32f1e196768b
📒 Files selected for processing (14)
docs/options/model-draft.mddocs/ramalama-run.1.md.inramalama/compose.pyramalama/kube.pyramalama/plugins/runtimes/inference/llama_cpp.pyramalama/quadlet.pyramalama/stack.pyramalama/transports/base.pytest/unit/data/test_compose/with_model_draft.yamltest/unit/data/test_kube/draft_model_hostpath.yamltest/unit/data/test_quadlet/draft_model/tinyllama.containertest/unit/test_compose.pytest/unit/test_kube.pytest/unit/test_quadlet.py
✅ Files skipped from review due to trivial changes (3)
- test/unit/data/test_quadlet/draft_model/tinyllama.container
- test/unit/data/test_compose/with_model_draft.yaml
- docs/ramalama-run.1.md.in
🚧 Files skipped from review as they are similar to previous changes (5)
- ramalama/plugins/runtimes/inference/llama_cpp.py
- ramalama/compose.py
- test/unit/test_quadlet.py
- test/unit/test_compose.py
- ramalama/transports/base.py
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
ramalama/kube.py (1)
55-63:⚠️ Potential issue | 🟠 MajorOCI volume mounts still ignore destination path in the non-host branch.
At Line 61,
mountPathis hardcoded toMNT_DIR, while the host-path branch (Line 104) correctly usesdest_model_path. This can mount draft/secondary OCI-backed models to the wrong container path.🐛 Proposed fix
else: subPath = "" if not self.artifact: subPath = """ subPath: /models""" + mount_path = dest_model_path or MNT_DIR mounts += f""" - - mountPath: {MNT_DIR}{subPath} + - mountPath: {mount_path}{subPath} name: {volume_name}""" volumes += self._gen_oci_volume(volume_name, src_model_path)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/kube.py` around lines 55 - 63, In ramalama/kube.py fix the OCI (non-host) branch so mountPath uses the intended destination path instead of always MNT_DIR: inside the else branch where mounts and volumes are built (references: self.artifact, MNT_DIR, volume_name, src_model_path, dest_model_path, _gen_oci_volume) replace the hardcoded mountPath with the same dest_model_path logic used in the host-path branch (preserving the subPath behavior when not self.artifact) so OCI-backed models mount to the correct container path and keep volumes populated via self._gen_oci_volume(volume_name, src_model_path).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ramalama/kube.py`:
- Around line 98-117: The two helper methods _gen_path_volume and
_gen_oci_volume lack type hints; update their signatures to include parameter
and return types (use volume_name: str, src_model_path: str, dest_model_path:
str for _gen_path_volume and volume_name: str, src_model_path: str for
_gen_oci_volume) and annotate both returns as Tuple[str, str] (for
_gen_path_volume) and str or Tuple[str] as appropriate—specifically change
_gen_path_volume(...) -> Tuple[str, str] and _gen_oci_volume(...) -> str (or
Tuple[str, str] if you prefer consistent tuple returns), leveraging the
already-imported Tuple type so mypy/type-checking passes.
---
Duplicate comments:
In `@ramalama/kube.py`:
- Around line 55-63: In ramalama/kube.py fix the OCI (non-host) branch so
mountPath uses the intended destination path instead of always MNT_DIR: inside
the else branch where mounts and volumes are built (references: self.artifact,
MNT_DIR, volume_name, src_model_path, dest_model_path, _gen_oci_volume) replace
the hardcoded mountPath with the same dest_model_path logic used in the
host-path branch (preserving the subPath behavior when not self.artifact) so
OCI-backed models mount to the correct container path and keep volumes populated
via self._gen_oci_volume(volume_name, src_model_path).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ebc5774 to
73a10d6
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
ramalama/kube.py (1)
55-63:⚠️ Potential issue | 🟠 MajorThe OCI branch still ignores the per-model destination path.
Line 61 always mounts to
MNT_DIR, while the hostPath branch mounts each model at its owndest_model_path. As soon as one ofmodel/model-draftis OCI-backed, that shared directory mount can hide the other model, so only one path remains visible tollama-server. This branch needs a per-volumemountPath/subPathderived fromdest_model_path; the new hostPath fixture will not catch this.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ramalama/kube.py` around lines 55 - 63, The OCI branch currently mounts all OCI models to the shared MNT_DIR which hides other models; update the block that builds mounts/volumes (the variables mounts, volumes) to compute a per-model mountPath/subPath from the model's dest_model_path instead of always using MNT_DIR: derive a safe relative subPath (e.g., strip the MNT_DIR prefix or compute relative path) and set mountPath to either dest_model_path or f"{MNT_DIR}{subPath}", then call _gen_oci_volume(volume_name, src_model_path) as before; use the existing self.artifact flag only to decide whether to include a subPath entry, but ensure the per-volume mountPath/subPath matches dest_model_path so each OCI-backed model gets its own mount rather than a shared directory.
🧹 Nitpick comments (1)
test/unit/data/test_kube/draft_model_hostpath.yaml (1)
24-25: Exercise the draft-model flag in this expected output too.This fixture still invokes
llama-serverwith only--model. Together with the paired input intest/unit/test_kube.py(Lines 76-89), the new case proves the draft file is mounted, but not that generated Kubernetes actually wires the draft model into the server. Please add--model-draft /mnt/models/model_draft.filein both places.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/unit/data/test_kube/draft_model_hostpath.yaml` around lines 24 - 25, The fixture invokes llama-server with only --model, so it doesn't assert the draft model is wired; update the expected container args for the process named or invocations using "llama-server" (the args array in the draft_model_hostpath.yaml fixture) to include the draft flag by adding "--model-draft", "/mnt/models/model_draft.file"; also update the corresponding test input in test/unit/test_kube.py that builds/compares this case (the test block around the existing llama-server args) to include the same "--model-draft /mnt/models/model_draft.file" so the test asserts the generated Kubernetes mounts the draft model into the server invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@ramalama/kube.py`:
- Around line 55-63: The OCI branch currently mounts all OCI models to the
shared MNT_DIR which hides other models; update the block that builds
mounts/volumes (the variables mounts, volumes) to compute a per-model
mountPath/subPath from the model's dest_model_path instead of always using
MNT_DIR: derive a safe relative subPath (e.g., strip the MNT_DIR prefix or
compute relative path) and set mountPath to either dest_model_path or
f"{MNT_DIR}{subPath}", then call _gen_oci_volume(volume_name, src_model_path) as
before; use the existing self.artifact flag only to decide whether to include a
subPath entry, but ensure the per-volume mountPath/subPath matches
dest_model_path so each OCI-backed model gets its own mount rather than a shared
directory.
---
Nitpick comments:
In `@test/unit/data/test_kube/draft_model_hostpath.yaml`:
- Around line 24-25: The fixture invokes llama-server with only --model, so it
doesn't assert the draft model is wired; update the expected container args for
the process named or invocations using "llama-server" (the args array in the
draft_model_hostpath.yaml fixture) to include the draft flag by adding
"--model-draft", "/mnt/models/model_draft.file"; also update the corresponding
test input in test/unit/test_kube.py that builds/compares this case (the test
block around the existing llama-server args) to include the same "--model-draft
/mnt/models/model_draft.file" so the test asserts the generated Kubernetes
mounts the draft model into the server invocation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 655557d8-394b-4c4b-a003-d43baf82cf42
📒 Files selected for processing (3)
ramalama/kube.pytest/unit/data/test_kube/draft_model_hostpath.yamltest/unit/data/test_kube/with_chat_template.yaml
|
/ok-to-test |
|
When a draft model is specified, I would expect to see |
73a10d6 to
afef4f7
Compare
rhatdan
left a comment
There was a problem hiding this comment.
Thanks @mkristian
LGTM other then one nit.
2421b44 to
e7cc56c
Compare
|
/ok-to-test |
|
I can not see the errors on Red Hat Konflux. How can I be of help ? |
| else: | ||
| raise Exception("Invalid generate option") | ||
|
|
||
| assert len(re.findall(r'/mnt/models/smollm-135', content)) == 2 |
There was a problem hiding this comment.
On big-endian systems (s390x), test_model resolves to stories-be:260k, so this would need to check for /mnt/models/stories260K-be.gguf.
There was a problem hiding this comment.
I pushed a fix, but can not test it. A little research said that the draft model using Q4 quants should work for both big and little endian systems.
There was a problem hiding this comment.
You want to use:
assert len(re.findall(r'/mnt/models/stories260K-be', content)) == 2
For reference, here one of the test errors showing the generated yaml:
test/e2e/test_serve.py:672: in test_serve_generation
assert len(re.findall(r'/mnt/models/stories-be', content)) == 2
E assert 0 == 2
E + where 0 = len([])
E + where [] = <function findall at 0x3ff9e4aea30>('/mnt/models/stories-be', '# Save the output of this file and use kubectl create -f to import\n# it into Kubernetes.\n#\n# Created with ramalama-0.20.0\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: test\n labels:\n app: test\nspec:\n replicas: 1\n selector:\n matchLabels:\n app: test\n template:\n metadata:\n labels:\n app: test\n spec:\n containers:\n - name: test\n image: quay.io/redhat-user-workloads/ramalama-tenant/ramalama@sha256:a371324628adff9cc3a74251e2926826131100495a1d211ddc2a1ac2d6167930\n command: ["llama-server"]\n args: [\'--host\', \'::\', \'--port\', \'1234\', \'--model\', \'/mnt/models/stories260K-be.gguf\', \'--no-warmup\', \'--alias\', \'taronaeo/tinyllamas-BE/stories260K-be.gguf\', \'--temp\', \'0.8\', \'--threads\', \'8\']\n\n securityContext:\n allowPrivilegeEscalation: false\n capabilities:\n drop:\n - CAP_CHOWN\n - CAP_FOWNER\n - CAP_FSETID\n - CAP_KILL\n - CAP_NET_BIND_SERVICE\n - CAP_SETFCAP\n - CAP_SETGID\n - CAP_SETPCAP\n - CAP_SETUID\n - CAP_SYS_CHROOT\n add:\n - CAP_DAC_OVERRIDE\n seLinuxOptions:\n type: spc_t\n ports:\n - containerPort: 1234\n volumeMounts:\n - mountPath: /mnt/models/stories260K-be.gguf\n name: model\n volumes:\n - hostPath:\n path: /tmp/.local/share/ramalama/store/huggingface/taronaeo/tinyllamas-BE/stories260K-be.gguf/blobs/sha256-276d6f9d9050e79670a113492331dc52b3568690b03f5e633521cbed020bfc1c\n name: model\n')
E + where <function findall at 0x3ff9e4aea30> = re.findall
------------------------------------------------- Captured stderr call -------------------------------------------------
Using cached huggingface://stories260K-be.gguf:latest ...
Downloading hf://ggml-org/gemma-3-270m-it-qat-GGUF:Q4_0 ...
Trying to pull hf://ggml-org/gemma-3-270m-it-qat-GGUF:Q4_0 ...
Downloading gemma-3-270m-it-qat-Q4_0.gguf
16c2e95 to
e6fd768
Compare
|
/ok-to-test |
|
/ok-to-test |
…iles Fixes most of containers#2642 Signed-off-by: Christian Meier <meier.kristian@gmail.com>
Signed-off-by: Christian Meier <meier.kristian@gmail.com>
fix draft-model unit tests Signed-off-by: Christian Meier <meier.kristian@gmail.com>
Signed-off-by: Christian Meier <meier.kristian@gmail.com>
|
Thanks @mkristian |
This fixes the model-draft:
Does not address #2225