Merged
72 commits
1040e68
feat: add initial dflash implementation
0xClandestine Apr 21, 2026
e1ea48f
fix(dflash): load hiddenNorm weight + streaming + prefetch + asyncEval
0xClandestine Apr 21, 2026
7820436
feat: selective safetensors loader — skip expert weight data with SSD…
0xClandestine Apr 21, 2026
9b91b4d
feat: add timings (tok/s, token count, duration) to all API responses
0xClandestine Apr 21, 2026
d6fdef4
feat: add bench_35b.sh benchmark script
0xClandestine Apr 21, 2026
485a929
feat: add Qwen3Next SSD streaming + DFlash support
0xClandestine Apr 21, 2026
c1b90f1
feat: Gemma-4 QuantizedKVCache fix + Test 9 regression (mlx-swift-lm …
github-actions[bot] Apr 22, 2026
f007b3b
docs+fix: kv_bits README docs + address Copilot review on PR #73
github-actions[bot] Apr 22, 2026
ccccdeb
docs: expand Supported Models section to full architecture list
github-actions[bot] Apr 22, 2026
ed5f8f6
docs: remove GLM 5.1 from supported models (still on feature branch, …
github-actions[bot] Apr 22, 2026
e4a2036
Merge pull request #73 from SharpAI/fix/gemma4-quantizedkv-b440
solderzzc Apr 22, 2026
39015cf
make OpenAI streaming strict by default
jankaderabek Apr 22, 2026
64cbdfc
test: implement OpenCode e2e validation testing with gemma-4
github-actions[bot] Apr 22, 2026
762cfd3
fix: address Copilot review — defer heartbeat cleanup, tighten tests,…
github-actions[bot] Apr 22, 2026
73fcd44
fix(ci): guard grep/jq with || true to prevent set -e abort on no-mat…
github-actions[bot] Apr 22, 2026
743b1a1
Merge pull request #74 from jankaderabek/codex/openai-streaming-compat
solderzzc Apr 23, 2026
1005d3e
chore(agents): add review-github-pr workflow skill
github-actions[bot] Apr 23, 2026
975db48
chore(agents): document /opt/homebrew/bin/gh path in review-github-pr…
github-actions[bot] Apr 23, 2026
95303a5
fix(ssd-stream): prevent RAM explosion when --draft-model + --stream-…
github-actions[bot] Apr 23, 2026
8a04b2b
test(ssd-stream): add regression suite for Issue #72 SSD budget with …
github-actions[bot] Apr 23, 2026
9b0a31c
fix(ssd-stream): address Copilot review on PR #76
github-actions[bot] Apr 23, 2026
336c8a8
Merge pull request #76 from SharpAI/fix/issue-72-draft-model-ssd-ram
solderzzc Apr 23, 2026
5390216
fix(ssd-stream): prevent inference-time swap explosion with --draft-m…
github-actions[bot] Apr 23, 2026
f2ab918
refactor(dflash/kernels): branchless mask via metal::select + 2D kern…
0xClandestine Apr 23, 2026
464b959
feat(dflash): add MambaSnapshotCache + dflashUseTapeRollback protocol…
0xClandestine Apr 23, 2026
a2c8102
feat: add DFlashKernelBench micro-benchmark target
0xClandestine Apr 23, 2026
0d96a5e
feat(bench): add JSON result export to bench_35b.sh; add bench_coder_…
0xClandestine Apr 23, 2026
108f0c2
test: reorganize DFlash test suite into tests/DFlash/
0xClandestine Apr 23, 2026
dfd0935
fix(ssd-stream): auto-cap draft tokens to 1 when --stream-experts + -…
github-actions[bot] Apr 23, 2026
7a14a67
test(benchmark): add Test 10 — Issue #72 SSD + draft model RAM regres…
github-actions[bot] Apr 23, 2026
3f6bad5
ci: add ssd-draft-memory-guard job + vm_stat readings for Issue #72
github-actions[bot] Apr 23, 2026
bb29e36
docs: document --stream-experts + --draft-model auto-cap strategy (Is…
github-actions[bot] Apr 23, 2026
be8353f
fix(ci): repair YAML corruption in ci.yml (retention-days merged with…
github-actions[bot] Apr 23, 2026
c8b236d
ci: trigger run after YAML fix
github-actions[bot] Apr 23, 2026
7d150f9
refactor(Qwen3Next): move DFlashTargetModel conformance to SwiftLM ex…
0xClandestine Apr 23, 2026
58249c2
fix(ci): use bash variable for PID in ssd-draft-memory-guard
github-actions[bot] Apr 23, 2026
7b0bfd4
fix: address Copilot review feedback on PR #77
github-actions[bot] Apr 23, 2026
8385350
fix: allow custom model selection in benchmark test 10
github-actions[bot] Apr 23, 2026
c680a47
Merge remote-tracking branch 'upstream/main' into feat/add-dflash
0xClandestine Apr 23, 2026
b33801a
Merge pull request #77 from SharpAI/fix/issue-72-draft-model-ssd-ram
solderzzc Apr 23, 2026
a52bd07
fix: resolve DFlash protocol conformance and build blockers
github-actions[bot] Apr 23, 2026
2ea4e96
fix: address Copilot review on PR #78
0xClandestine Apr 23, 2026
602f940
fix(bench): increase server wait timeout to 3600s to allow large mode…
github-actions[bot] Apr 23, 2026
6f0c670
docs: add DFlash parameters to README CLI options list
github-actions[bot] Apr 23, 2026
7dcdaf4
chore: bump mlx-swift-lm submodule to b447
github-actions[bot] Apr 23, 2026
60d88e4
fix: restore DFlashRollbackCache protocol and clean dead extension
github-actions[bot] Apr 23, 2026
0360ea9
Merge remote-tracking branch 'origin/main' into pr-78
github-actions[bot] Apr 23, 2026
f629f63
test(dflash): fix submodule pin and add E2E tests
github-actions[bot] Apr 23, 2026
7e7ccd1
fix(benchmark): exit early on DFlash tests to avoid model prompt
github-actions[bot] Apr 23, 2026
fd84f80
chore: move dflash benchmark scripts to profiling dir
github-actions[bot] Apr 23, 2026
5553bf5
fix: disable prompt cache for MambaCache hybrid models (Qwen3Next)
github-actions[bot] Apr 23, 2026
2d537d6
fix: use SUITE_OPT env var to bypass menu in matrix sub-processes
github-actions[bot] Apr 23, 2026
0dba57a
fix: suppress interactive menu in sub-process invocations
github-actions[bot] Apr 23, 2026
b7dcd53
fix: remove stray banner echo outside SUITE_OPT guard
github-actions[bot] Apr 23, 2026
5581f38
fix: add 'Using speculative decoding' log line for CI test assertions
github-actions[bot] Apr 23, 2026
4c042a6
fix: add required log lines to DFlash draft model load path
github-actions[bot] Apr 24, 2026
069a75f
feat: add DFlashTargetModel conformance for Qwen3, Qwen3MoE, and Llama
0xClandestine Apr 24, 2026
9fc993c
fix(ci): skip omni test gracefully when RAM is insufficient
github-actions[bot] Apr 24, 2026
b224692
Revert "fix(ci): skip omni test gracefully when RAM is insufficient"
github-actions[bot] Apr 24, 2026
313fa91
feat: add DeepSeek V3 and Kimi Linear DFlash support (Option B)
0xClandestine Apr 24, 2026
0e79358
fix: resolve CI GPU timeouts on 7GB runners by fixing Memory limit sp…
github-actions[bot] Apr 24, 2026
13505e6
Merge 313fa91 from clandestine
github-actions[bot] Apr 24, 2026
d6bcf66
fix: correct weight key paths for DeepseekV3 and KimiLinear models
0xClandestine Apr 24, 2026
b5037f6
fix: strip language_model. prefix, remove stale expert keys, raise FD…
0xClandestine Apr 24, 2026
91e32af
fix: cap Metal command buffer size during swap-assisted inference to …
github-actions[bot] Apr 24, 2026
2707be9
fix: prevent Metal GPU Watchdog timeout on low-RAM CI runners
github-actions[bot] Apr 24, 2026
9533e45
feat: DeepSeek-V4 support via mlx-swift-lm b463
solderzzc Apr 24, 2026
0212b14
fix: README table shows physical RAM, not misleading virtual allocati…
solderzzc Apr 24, 2026
05d0b6c
fix: remove virtual allocation reference from DeepSeek key takeaways …
solderzzc Apr 24, 2026
65d74a9
Merge origin/main to resolve conflicts
github-actions[bot] Apr 24, 2026
29f3816
Merge pull request #78 from 0xClandestine/feat/add-dflash
solderzzc Apr 24, 2026
53b040d
fix(server): prompt-cache bleed fixes + perf table (#85)
github-actions[bot] Apr 26, 2026
216 changes: 216 additions & 0 deletions .agents/workflows/review-github-pr.md
@@ -0,0 +1,216 @@
---
description: Review a GitHub Issue or PR for SharpAI/SwiftLM — fetch, analyze, implement fixes, address review comments, and push back to the correct branch
---

# Review GitHub Issue / PR

This workflow guides end-to-end handling of a GitHub Issue or Pull Request for the
`SharpAI/SwiftLM` repository: from fetching context, through implementing or
reviewing code changes, to pushing a clean commit back to the correct fork branch.

---

## Prerequisites

- `gh` CLI path on macOS: **`/opt/homebrew/bin/gh`**
  ```bash
  export PATH="/opt/homebrew/bin:$PATH"
  which gh   # → /opt/homebrew/bin/gh
  ```
- `gh` must be authenticated (`gh auth status`)
- Working directory: `/Users/simba/workspace/mlx-server`
- Remote `fork` may need to be added if pushing to a contributor's fork:
  ```bash
  git remote add fork https://github.com/<contributor>/SwiftLM.git
  ```

---

## Steps

### 1. Fetch the Issue or PR

Determine whether the user supplied an **Issue number** or a **PR number**, then
pull the full context using `gh`:

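```bash
# For a PR
gh pr view <NUMBER> --repo SharpAI/SwiftLM \
  --json number,title,body,state,baseRefName,headRefName,headRepository,headRepositoryOwner,isCrossRepository,commits,files

# For an Issue
gh issue view <NUMBER> --repo SharpAI/SwiftLM \
  --json number,title,body,state,labels,comments
```

Note the **`headRepositoryOwner`** and **`isCrossRepository`** fields: if the owner's
`login` is not `SharpAI` (equivalently, `isCrossRepository` is `true`), the PR comes
from a fork. `headRepository` on its own typically reports only the repository name,
which is usually `SwiftLM` for forks as well. You must push back to the fork's branch
(see Step 6).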
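
The fork check can also be scripted instead of read by eye. The sketch below assumes the `isCrossRepository` JSON field of `gh pr view` (reported as `true` for fork-originated PRs) and the remote names `origin` and `fork` used throughout this workflow. `remote_for_pr` is a hypothetical helper name, not part of the repository.

```bash
# Hypothetical helper: map the value of gh's isCrossRepository field
# ("true" or "false") to the remote that Step 6 should push to.
remote_for_pr() {
  case "$1" in
    true)  echo "fork" ;;
    false) echo "origin" ;;
    *)     echo "unexpected isCrossRepository value: $1" >&2; return 1 ;;
  esac
}

# Typical use (needs network access and an authenticated gh):
#   cross=$(gh pr view <NUMBER> --repo SharpAI/SwiftLM \
#             --json isCrossRepository --jq '.isCrossRepository')
#   remote=$(remote_for_pr "$cross")
```

Keeping the decision in one small function means the same value can drive both the checkout in Step 3 and the push in Step 6.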

---

### 2. Understand the Scope

Read the PR/Issue body and associated comments carefully. Identify:

- **Category**: bug fix, feature, test improvement, CI/CD, documentation.
- **Files touched**: run `gh pr diff <NUMBER> --repo SharpAI/SwiftLM` or read
  the `files` field.
- **CI status**: check the latest run:
  ```bash
  gh run list --repo SharpAI/SwiftLM --branch <headRefName> --limit 3
  ```
- **Review comments**: if Copilot or a human left review comments, read
  them all before writing a single line of code:
  ```bash
  gh pr view <NUMBER> --repo SharpAI/SwiftLM --comments   # conversation + review summaries
  gh api repos/SharpAI/SwiftLM/pulls/<NUMBER>/comments    # inline (diff-level) comments
  ```

---

### 3. Check Out the Branch Locally

```bash
# If the PR is from SharpAI directly
git fetch origin
git checkout <headRefName>

# If the PR is from a fork
git remote add fork https://github.com/<forkOwner>/SwiftLM.git # once only
git fetch fork <headRefName>
git checkout -b <headRefName> fork/<headRefName>
```

Verify you are on the correct branch:
```bash
git status
git log --oneline -5
```
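
The `git remote add fork` line above is marked "once only" because running it a second time fails when the remote already exists. A minimal sketch of an idempotent variant follows; `ensure_remote` is a hypothetical helper name, and it relies only on `git remote get-url`, which exits non-zero for an unknown remote.

```bash
# Hypothetical helper: add a remote only when it is not configured yet,
# so re-running the checkout steps does not fail on an existing remote.
ensure_remote() {
  local name="$1" url="$2"
  if git remote get-url "$name" >/dev/null 2>&1; then
    echo "remote '$name' already configured"
  else
    git remote add "$name" "$url"
    echo "remote '$name' added"
  fi
}

# Usage, from inside the repository:
#   ensure_remote fork "https://github.com/<forkOwner>/SwiftLM.git"
```

Note that this sketch does not verify that an existing `fork` remote points at the expected URL; check with `git remote -v` when reviewing PRs from several contributors.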

---

### 4. Triage Review Comments (for PRs)

For each Copilot or human review comment:

1. **Classify** the severity:
   - 🔴 **Must fix**: correctness bugs, resource leaks, race conditions, broken CI.
   - 🟡 **Should fix**: test coverage gaps, false-pass logic, missing imports.
   - 🟢 **Optional**: style, wording, architecture refactors beyond the PR scope.

2. **Implement** all 🔴 and 🟡 items. For 🟢 items, document them as follow-up
   work in a code comment or GitHub comment but do not expand the PR scope.

3. **Key patterns learned from SwiftLM history**:
   - Shell scripts use `set -euo pipefail`, so every `grep`, `jq`, or pipeline that
     may produce no output **must** be guarded with `|| true` or placed inside an
     `if` condition to prevent silent script abort.
   - Heartbeat / background `Task` objects in Swift **must** be cancelled via
     `defer { task?.cancel() }` so all exit paths (including client disconnect)
     are covered, not just the happy path.
   - CORS-related shell tests must target the dedicated `--cors` server instance,
     not the main server started without the flag.
   - Concurrent-request tests must use `--parallel N` (N ≥ 2) to actually exercise
     parallel code paths.
   - When adding new Swift test files that use `Data` / `JSONSerialization`,
     always add `import Foundation`, because XCTest does not re-export it in all
     SPM environments.
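
The `set -euo pipefail` pattern is easiest to see in a small, self-contained script. The sketch below uses made-up log text and shows both guards named above: `|| true` and an `if` condition.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Sample input; the contents are made up for illustration.
log=$'Test 1: PASS\nTest 2: PASS'

# Unguarded, the next line would abort the script: grep exits 1 when nothing
# matches, and under `set -e` a failing command substitution in an
# assignment is fatal.
#   failures=$(printf '%s\n' "$log" | grep -c 'FAIL')

# Guarded with `|| true`: the count is still captured and the script continues.
failures=$(printf '%s\n' "$log" | grep -c 'FAIL' || true)
echo "failures: $failures"

# Alternative: run the command as an `if` condition, where a non-zero
# exit status selects the else branch instead of ending the script.
if printf '%s\n' "$log" | grep -q 'FAIL'; then
  verdict="at least one test failed"
else
  verdict="no failures found"
fi
echo "$verdict"
```

The `if` form is usually preferable when the exit status itself is the answer, while `|| true` fits cases where only the captured output matters.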

---

### 5. Verify Locally

Build the release binary and run the relevant test suites before pushing:

```bash
# Release build (the server tests below expect .build/release/SwiftLM)
swift build -c release

# Swift unit tests
swift test --filter SwiftLMTests

# Integration tests (server)
./tests/test-server.sh .build/release/SwiftLM 15413

# OpenCode / SDK compatibility test
./tests/test-opencode.sh .build/release/SwiftLM 15414
```

If CI previously failed, find the failing test number in the run's logs first, then reproduce that test locally:
```bash
gh run view <RUN_ID> --repo SharpAI/SwiftLM --log-failed 2>&1 | grep -E "FAIL|error|Test [0-9]+"
```

---

### 6. Commit and Push to the Correct Remote

> [!IMPORTANT]
> Always push to the **fork's branch** when updating a fork-originated PR.
> Pushing to `origin` (SharpAI) creates a new branch and does NOT update the PR.

```bash
git add <files>
git commit -m "<type>(<scope>): <summary>

<body: what changed and why>"

# PR from a fork → push to fork
git push fork <headRefName>:<headRefName>

# PR from SharpAI directly → push to origin
git push origin <headRefName>
```

Verify the PR was updated:
```bash
gh pr view <NUMBER> --repo SharpAI/SwiftLM --json commits --jq '.commits[].messageHeadline'
```
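
The `<type>(<scope>): <summary>` headline shape can be checked before committing. This is a sketch only: `is_conventional_headline` is a hypothetical helper, and the accepted type list is an assumption drawn from prefixes visible in this repository's commit history, not an enforced project rule.

```bash
# Hypothetical check for the "<type>(<scope>): <summary>" headline shape.
# The scope is optional, matching headlines such as "docs: ..." in the history.
is_conventional_headline() {
  local re='^(feat|fix|docs|test|ci|chore|refactor)(\([a-zA-Z0-9/_-]+\))?: .+'
  [[ "$1" =~ $re ]]
}

# Usage:
#   is_conventional_headline "fix(ssd-stream): guard draft budget" \
#     && echo "headline ok" || echo "headline does not match the template"
```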

---

### 7. Monitor CI

After pushing, monitor the triggered workflow:

```bash
# List recent runs on the branch
gh run list --repo SharpAI/SwiftLM --branch <headRefName> --limit 5

# Follow a run until it completes
gh run watch <RUN_ID> --repo SharpAI/SwiftLM

# Print the full log of a completed run
gh run view <RUN_ID> --repo SharpAI/SwiftLM --log

# Pull only failed steps
gh run view <RUN_ID> --repo SharpAI/SwiftLM --log-failed 2>&1 | grep -E "FAIL|error|exit code"
```

If tests fail, go back to Step 4. Iterate until CI is green.

---

### 8. Respond to Reviewers (Optional)

If a human or Copilot reviewer left inline comments that you have addressed,
leave a reply comment summarising what was changed and how each item was handled
(or why it was deferred):

```bash
gh pr comment <NUMBER> --repo SharpAI/SwiftLM \
--body "Addressed all 🔴/🟡 review comments in commit <SHA>:
- heartbeat leak: added defer cleanup in both streaming handlers
- import Foundation: added to ServerSSETests.swift
- CORS test: redirected to CORS_PORT server
- parallel test: dedicated --parallel 2 server on PORT+3
- set -e trap: guarded grep/jq pipelines with || true"
```

---

## Quick Reference

| Task | Command |
|------|---------|
| View PR | `gh pr view <N> --repo SharpAI/SwiftLM` |
| View PR diff | `gh pr diff <N> --repo SharpAI/SwiftLM` |
| View PR comments | `gh pr view <N> --repo SharpAI/SwiftLM --comments` |
| View Issue | `gh issue view <N> --repo SharpAI/SwiftLM` |
| List CI runs | `gh run list --repo SharpAI/SwiftLM --branch <branch>` |
| Failed CI logs | `gh run view <ID> --repo SharpAI/SwiftLM --log-failed` |
| Push to fork | `git push fork <branch>:<branch>` |
| Push to SharpAI | `git push origin <branch>` |
| Verify PR commits | `gh pr view <N> --repo SharpAI/SwiftLM --json commits --jq '.commits[].messageHeadline'` |