
feat(moe): parallel execution prefetch queue for SSD expert streaming#8

Merged
solderzzc merged 66 commits into main from feature/papps-ssd-streaming
Apr 14, 2026

Conversation

@solderzzc
Member

No description provided.

DePasqualeOrg and others added 30 commits April 6, 2026 13:55
* Add doc comment verification script and CI step
* Discover doc verification targets dynamically and report all failures
Support multiple parallel tool calls and buffering for Llama 3

Llama 3 natively supports tool calling through an ipython environment, which
can emit multiple parallel tool invocations at once. Depending on the model
size and prompt, it generates either a JSON list of function objects or a
Python-style array of function calls.

- Sets `startTag` to `<|python_tag|>` to ensure `ToolCallProcessor`
  correctly buffers tool output without leaking it to the streaming UI.
- Upgrades `Llama3ToolCallParser` to parse multiple parallel tool calls
  from JSON array payloads `[{"name": ...}]` during `parseEOS`.
- Upgrades `PythonicToolCallParser` to extract multiple sequential
  pythonic function calls `[func1(), func2()]` via `parseEOS`.
- Refactors `PythonicToolCallParser` to use Swift 5.7+ `Regex` literals
  instead of legacy `NSRegularExpression`.
- Adds integration unit tests for both parsers to verify multi-call arrays.
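The pythonic-array case above can be illustrated with a small sketch. This is not the real `PythonicToolCallParser` from the repository; the function and its regex are illustrative, assuming only that each tool call looks like `name(args)` with no nested parentheses, as is typical for these payloads:

```swift
import Foundation

// Hedged sketch of multi-call extraction from a pythonic payload such as
// "[get_weather(city=\"Paris\"), get_time(zone=\"CET\")]".
// The names here are illustrative only; the real parser lives in the
// repository's tool-calling module.
func extractPythonicCalls(from payload: String) -> [(name: String, arguments: String)] {
    // One call: an identifier followed by an argument list without
    // nested parentheses (sufficient for flat tool-call arguments).
    let call = /([A-Za-z_][A-Za-z0-9_]*)\(([^()]*)\)/
    return payload.matches(of: call).map { match in
        (name: String(match.1), arguments: String(match.2))
    }
}
```

For example, `extractPythonicCalls(from: "[f(x=1), g()]")` yields two calls, `f` with arguments `x=1` and `g` with empty arguments, which is the shape `parseEOS` needs to emit one `ToolCall` per element.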
…models (ml-explore#142)

* Add Mistral3, Nemotron, and Qwen3.5 tool call integration test helpers
* Add MLXLMIntegrationTests
* Update documentation

* Use actual asserts in IntegrationTestHelpers.swift

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Avoid converting decoded JSON arguments through Any before constructing ToolCall.Function.

This fixes Swift 6 Sendable compilation errors in Llama3ToolCallParser during Release iOS builds.

Written by Codex.
Aegis-AI and others added 27 commits April 12, 2026 16:34
…MLXArrays to natively resolve 'scale' unpacking
* Add gemma 4 model (text, vision, MoE)
…ashes

This commit resolves the persistent garbled-output issues by:
1. Fixing LCache retrieval for shared KV RoPE logic to properly align positional phases.
2. Aligning the Per-Layer Embedding application and RoPE proportional parameters with vLLM references.
3. Ripping out manual lm_head.weight injection from Gemma4VL sanitize that caused unhandled key fatal errors.
4. Enhancing KVCache dimensions and routing handling for Gemma4 architecture.
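Point 3 above removes a manual `lm_head.weight` injection from the sanitize step. As a hedged sketch of the opposite, correct direction, a sanitize pass can simply drop checkpoint keys the module tree does not declare instead of injecting them; the dictionary value type and key names here are illustrative stand-ins, not the repository's actual `sanitize` signature:

```swift
// Hedged sketch: filter out redundant lm_head keys during weight
// sanitization. With tied embeddings the lm_head weight is derived
// from the embedding table, so leaving the key in place triggers
// "unhandled key" failures at load time.
func sanitize(weights: [String: Float]) -> [String: Float] {
    weights.filter { !$0.key.hasPrefix("lm_head.") }
}
```

The design choice is that sanitize should only ever narrow the weight set toward what the module tree declares; synthesizing keys there couples the loader to one checkpoint layout and fails hard on the next one.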
…utions and PLE conditioning scale preservation
* Add Gemma 4 text model support (E2B and E4B)

Port of gemma4.py and gemma4_text.py from mlx-lm. Adds support for
Gemma 4's text-only architecture including:
- Per-Layer Embeddings (PLE) with gated residual
- Shared KV cache across later layers
- Dual RoPE (proportional for full attention, default for sliding)
- ProportionalRoPE with partial_rotary_factor support
- Global head dimensions (512) for full-attention layers
- Double-wide MLP for KV-shared layers
- Logit softcapping
- LoRA support

Registers gemma4 and gemma4_text model types plus E2B/E4B 4-bit
model configurations.
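Of the features listed above, logit softcapping has a compact standard form: logits are squeezed smoothly into `(-cap, cap)` via `cap * tanh(logits / cap)`, as in earlier Gemma releases. A plain-Swift sketch on `[Float]` (the real model applies this to MLXArrays; this function is illustrative):

```swift
import Foundation

// Logit softcapping: capped = cap * tanh(logits / cap).
// Small logits pass through almost unchanged (tanh(x) ≈ x near 0),
// while large logits saturate smoothly at ±cap.
func softcap(_ logits: [Float], cap: Float) -> [Float] {
    logits.map { cap * tanh($0 / cap) }
}
```

For instance, `softcap([0.5], cap: 30)` is nearly `[0.5]`, while `softcap([1_000_000], cap: 30)` saturates at 30.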

* Fix Gemma 4 model IDs and EOS tokens

- Model IDs: use correct HuggingFace repo names (lowercase, no "-lm-")
  - gemma-4-E2B-it-lm-4bit → gemma-4-e2b-it-4bit
  - gemma-4-E4B-it-lm-4bit → gemma-4-e4b-it-4bit
- EOS token: <end_of_turn> (Gemma 3) → <turn|> (Gemma 4, token ID 106)

MLXArray.ones([1]) defaults to float32, which can cause dtype
promotion when multiplied with bfloat16/float16 model tensors.
Specify .float16 explicitly to avoid hidden AsType nodes.
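A minimal sketch of the pitfall described above, assuming mlx-swift's `ones` factory accepts an explicit dtype as the commit message implies (the exact parameter spelling may differ across mlx-swift versions):

```swift
import MLX

// ones([1]) defaults to float32, so multiplying it with float16 model
// tensors inserts an implicit AsType node and promotes the result.
let scale32 = MLXArray.ones([1])                   // float32 (default)

// Matching the model dtype up front avoids the hidden conversion.
let scale16 = MLXArray.ones([1], dtype: .float16)  // stays float16
```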

* Fix weight key mapping for language_model property

Add @ModuleInfo(key: "language_model") so the property matches the
snake_case key in checkpoint weight files. Without this, weight loading
fails with keyNotFound for the language_model subtree.

Reported-by: john-rocky (PR ml-explore#185 comment)

* Address review feedback: remove force unwraps, use shared ProportionalRoPE

- Make layerTypes non-optional in config (decode or derive from pattern)
- Replace vProj! force unwrap with if let binding
- Switch from local ProportionalRoPE to shared initializeRope() factory
- Remove 60-line local ProportionalRoPE class (now in RoPEUtils.swift)

---------

Co-authored-by: Stefan Geens <stefan.geens@gmail.com>
…erflow shattering on Apple Silicon without compromising base magnitudes
- see ml-explore#189
- download jinja files in all paths
- macros use fully qualified type names
…wift (MLXVLM)

Upstream ml-explore/mlx-swift-lm now ships native Gemma4 VLM support
with clean text/vision separation. Our custom Gemma4VL.swift is no
longer needed. SSD streaming, speculative decoding, and Load.swift
router patches are preserved.
…ppleParavirtCommandBuffer concurrency assertions on macOS runners
… prevent interleaving parameterized Metal invocations
…MLXTestingSuite trait to definitively fix Paravirt bounds assertion
…te using matmul instead of element-wise multiplication
solderzzc merged commit 8c4a4f2 into main on Apr 14, 2026
4 checks passed
solderzzc deleted the feature/papps-ssd-streaming branch on April 14, 2026 at 05:06