Skip to content

Commit 7f30880

Browse files
authored
Merge pull request #57 from FluffyAIcode/AgentMemory/v030-pr-e2-mac-m4-self-hosted-runner-8e7f
PR-E2 (ADR 0008 §6.5): self-hosted Mac M4 GitHub Actions integration workflow
2 parents 5481ffa + a0397c9 commit 7f30880

3 files changed

Lines changed: 362 additions & 0 deletions

File tree

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
1+
name: Auto-label needs-mac-m4
2+
3+
# Auto-applies the ``needs-mac-m4`` label to PRs that touch
4+
# verifier-dependent code paths so the integration workflow
5+
# (.github/workflows/integration.yaml) runs without contributors
6+
# remembering to apply the label by hand.
7+
#
8+
# A PR opts INTO Mac M4 review by editing files under any of:
9+
# inference_engine/ — runtime, scheduler, session, server, etc.
10+
# sdks/ — Python + TypeScript SDK
11+
# proto/ — protobuf wire contract
12+
# tests/integration/ — the integration suite itself
13+
# kv_cache_proposer/ — verifier + decoder
14+
#
15+
# A doc-only PR (touching only docs/, README.md, etc.) does NOT
16+
# trigger Mac M4 review, saving runner time.
17+
#
18+
# Once labelled, the integration workflow auto-fires; once a PR
19+
# lands without the label, the integration workflow auto-skips.
20+
21+
on:
22+
pull_request_target:
23+
types: [opened, synchronize, reopened]
24+
branches: [main]
25+
26+
permissions:
27+
pull-requests: write
28+
29+
jobs:
30+
label:
31+
runs-on: ubuntu-latest
32+
steps:
33+
- name: Label PRs touching verifier-dependent paths
34+
uses: actions/github-script@v7
35+
with:
36+
github-token: ${{ secrets.GITHUB_TOKEN }}
37+
script: |
38+
const pr = context.payload.pull_request;
39+
const labelName = "needs-mac-m4";
40+
41+
// Pull the diff file list. github-script gives us the
42+
// full octokit; pagination matters for >100-file PRs
43+
// but in practice the v0.3 PRs are well under that.
44+
const { data: files } = await github.rest.pulls.listFiles({
45+
owner: context.repo.owner,
46+
repo: context.repo.repo,
47+
pull_number: pr.number,
48+
per_page: 100,
49+
});
50+
51+
const triggers = [
52+
"inference_engine/",
53+
"sdks/",
54+
"proto/",
55+
"tests/integration/",
56+
"kv_cache_proposer/",
57+
];
58+
59+
const matched = files.some(f =>
60+
triggers.some(t => f.filename.startsWith(t))
61+
);
62+
63+
const hasLabel = pr.labels.some(l => l.name === labelName);
64+
65+
if (matched && !hasLabel) {
66+
core.info(`Adding ${labelName} (PR touches verifier-dependent paths).`);
67+
await github.rest.issues.addLabels({
68+
owner: context.repo.owner,
69+
repo: context.repo.repo,
70+
issue_number: pr.number,
71+
labels: [labelName],
72+
});
73+
} else if (!matched && hasLabel) {
74+
// PR was previously labelled; subsequent push removed
75+
// all verifier-dependent file edits. Drop the label
76+
// so the integration workflow doesn't burn runner
77+
// time on doc-only updates.
78+
core.info(`Removing ${labelName} (no verifier-dependent paths).`);
79+
await github.rest.issues.removeLabel({
80+
owner: context.repo.owner,
81+
repo: context.repo.repo,
82+
issue_number: pr.number,
83+
name: labelName,
84+
});
85+
} else {
86+
core.info(
87+
`No-op: matched=${matched} hasLabel=${hasLabel}.`
88+
);
89+
}

.github/workflows/integration.yaml

Lines changed: 136 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,136 @@
1+
name: Integration (Mac M4)
2+
3+
# Self-hosted runner workflow that runs the integration suite under
4+
# tests/integration/ against real Qwen3-0.6B on Apple Silicon.
5+
#
6+
# Trigger model:
7+
# - Pull-request events. Only fires when the PR carries the
8+
# ``needs-mac-m4`` label (auto-applied by .github/workflows/
9+
# auto-label-mac.yaml when a PR touches inference_engine/,
10+
# sdks/, proto/, or tests/integration/). PRs that don't touch
11+
# verifier-dependent code skip this gate entirely so the runner
12+
# pool isn't burned on doc-only or CI-only PRs.
13+
# - Manual workflow_dispatch for re-runs from the Actions UI.
14+
#
15+
# Runner requirements (self-hosted):
16+
# - macOS 14+ on Apple Silicon (M-series).
17+
# - Labels: [self-hosted, macOS, ARM64, kakeya-mac-m4].
18+
# - Pre-warmed HF cache containing Qwen/Qwen3-0.6B at
19+
# ~/.cache/huggingface/hub/ (avoids 10-minute first-run download).
20+
# - Python 3.12+ on PATH.
21+
# - At least 24 GB unified memory and ~50 GB free disk.
22+
#
23+
# See docs/ops/mac-m4-runner-setup.md for the one-time runner setup.
24+
25+
on:
26+
pull_request:
27+
# Only run on PR events for branches targeting main.
28+
types: [opened, synchronize, reopened, labeled]
29+
branches: [main]
30+
workflow_dispatch: {}
31+
32+
# Cancel superseded runs on the same PR — saves runner time when
33+
# the contributor pushes a new commit before the previous run
34+
# finishes.
35+
concurrency:
36+
group: integration-${{ github.ref }}
37+
cancel-in-progress: true
38+
39+
jobs:
40+
integration:
41+
name: pytest -m integration on Mac M4
42+
# Only fire on labeled PRs (this saves the runner pool from
43+
# doc-only / CI-only PRs that don't touch verifier-dependent
44+
# code). The auto-label workflow adds 'needs-mac-m4' on file
45+
# paths that warrant the GA gate.
46+
if: |
47+
github.event_name == 'workflow_dispatch' ||
48+
contains(github.event.pull_request.labels.*.name, 'needs-mac-m4')
49+
runs-on: [self-hosted, macOS, ARM64, kakeya-mac-m4]
50+
timeout-minutes: 90
51+
steps:
52+
- uses: actions/checkout@v4
53+
with:
54+
# Full history so the runner can compare against base for
55+
# any future rebase-based gating.
56+
fetch-depth: 0
57+
58+
- name: Verify host shape
59+
run: |
60+
echo "=== sysctl ==="
61+
sysctl -n hw.model || true
62+
sysctl -n hw.memsize || true
63+
sysctl -n machdep.cpu.brand_string || true
64+
echo "=== python ==="
65+
python3 --version
66+
python3 -c "import platform; print(platform.machine(), platform.platform())"
67+
68+
- name: Verify Qwen3-0.6B in HF cache
69+
run: |
70+
# Don't download here; the runner is expected to be
71+
# pre-warmed. If the model isn't cached the test loads
72+
# would hit HF and exceed the 90-min timeout. Surface a
73+
# clear error early.
74+
set -e
75+
MODEL_DIR="$HOME/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B"
76+
if [ ! -d "$MODEL_DIR" ]; then
77+
echo "::error::HF cache miss for Qwen/Qwen3-0.6B."
78+
echo "::error::Pre-warm the runner: python3 -c 'from transformers import AutoModelForCausalLM, AutoTokenizer; AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen3-0.6B\"); AutoTokenizer.from_pretrained(\"Qwen/Qwen3-0.6B\")'"
79+
exit 1
80+
fi
81+
echo "Found $MODEL_DIR"
82+
83+
- name: Install Python dependencies
84+
run: |
85+
# The runner is expected to have a long-lived venv.
86+
# If a per-run venv is preferred, swap to ``python3 -m venv .venv``.
87+
python3 -m pip install --upgrade pip
88+
python3 -m pip install -e .
89+
python3 -m pip install pytest pytest-asyncio pytest-timeout coverage
90+
91+
- name: Run integration suite
92+
env:
93+
PYTHONPATH: .:sdks/python
94+
# No HF download in tests; if we hit a cache miss it's a
95+
# bug or a stale runner.
96+
HF_HUB_OFFLINE: "1"
97+
run: |
98+
mkdir -p results/platform-tests
99+
stamp=$(date +%s)
100+
python3 -m pytest \
101+
-m integration \
102+
tests/integration/ \
103+
--junitxml="results/platform-tests/integration-mac-m4-${stamp}.junit.xml" \
104+
-v
105+
# Record the artifact path for the upload step below.
106+
echo "artifact_stamp=${stamp}" >> "$GITHUB_OUTPUT"
107+
id: pytest_run
108+
109+
- name: Upload JUnit + log artifacts
110+
if: always()
111+
uses: actions/upload-artifact@v4
112+
with:
113+
name: integration-mac-m4-${{ steps.pytest_run.outputs.artifact_stamp || github.run_id }}
114+
path: |
115+
results/platform-tests/integration-mac-m4-*.junit.xml
116+
retention-days: 30
117+
118+
- name: Surface failure summary
119+
if: failure()
120+
run: |
121+
# Tail the last few lines of the JUnit so the failure is
122+
# visible in the action log, not just inside the artifact.
123+
for f in results/platform-tests/integration-mac-m4-*.junit.xml; do
124+
echo "=== $f ==="
125+
python3 - "$f" <<'PY'
126+
import sys, xml.etree.ElementTree as ET
127+
r = ET.parse(sys.argv[1]).getroot()
128+
for tc in r.iter("testcase"):
129+
for child in tc:
130+
if child.tag in ("failure", "error"):
131+
print(f"[{child.tag.upper()}] {tc.get('classname')}::{tc.get('name')}")
132+
msg = (child.get("message") or "").splitlines()
133+
if msg:
134+
print(f" {msg[0][:180]}")
135+
PY
136+
done

docs/ops/mac-m4-runner-setup.md

Lines changed: 137 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,137 @@
1+
# Mac M4 self-hosted runner setup
2+
3+
This runner backs the **Integration (Mac M4)** GitHub Actions workflow
4+
(`.github/workflows/integration.yaml`). It runs `pytest -m integration`
5+
against real Qwen3-0.6B on every PR labelled `needs-mac-m4`
6+
(auto-applied by `.github/workflows/auto-label-mac.yaml` when a PR
7+
touches `inference_engine/`, `sdks/`, `proto/`, `tests/integration/`,
8+
or `kv_cache_proposer/`).
9+
10+
## Hardware requirements
11+
12+
| Resource | Minimum |
13+
| --- | --- |
14+
| Chip | Apple Silicon (M-series); M4 or newer recommended |
15+
| Unified memory | 24 GB (16 GB works for Qwen3-0.6B alone but no headroom for concurrent work) |
16+
| Free disk | ~50 GB (HF cache + venv + checkout history) |
17+
| Network | Reachable to github.com for runner registration; outbound to HF Hub for the one-time pre-warm |
18+
| OS | macOS 14 (Sonoma) or newer |
19+
20+
## One-time setup
21+
22+
### 1. Register the self-hosted runner
23+
24+
Follow [GitHub's docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners) to add a runner to the repository:
25+
26+
1. Repository → Settings → Actions → Runners → New self-hosted runner.
27+
2. Choose macOS / ARM64.
28+
3. Run the install + configure commands GitHub provides.
29+
4. **Important**: when prompted for labels, add `kakeya-mac-m4`
30+
in addition to the default `self-hosted, macOS, ARM64`. The
31+
workflow's `runs-on:` clause specifically requires that label.
32+
5. Run the runner as a launchd service (`./svc.sh install && ./svc.sh start`)
33+
so it survives reboots.
34+
35+
### 2. Pre-warm the HF cache
36+
37+
The integration workflow runs with `HF_HUB_OFFLINE=1` so it never
38+
hits HuggingFace at test time (avoids 90-min runs blocking on a 4 GB
39+
download). Pre-warm the cache once per runner:
40+
41+
```bash
42+
python3 -c "
43+
from transformers import AutoModelForCausalLM, AutoTokenizer
44+
AutoModelForCausalLM.from_pretrained('Qwen/Qwen3-0.6B')
45+
AutoTokenizer.from_pretrained('Qwen/Qwen3-0.6B')
46+
"
47+
```
48+
49+
The model lands at `~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/`.
50+
The workflow's "Verify Qwen3-0.6B in HF cache" step fails fast with
51+
a clear error if that directory is missing.
52+
53+
If a future test adds a new model id, update the pre-warm command
54+
(and the workflow's verify step) accordingly.
55+
56+
### 3. Install Python toolchain
57+
58+
The runner needs Python 3.12+. Use Homebrew or pyenv:
59+
60+
```bash
61+
brew install python@3.12
62+
# or:
63+
pyenv install 3.12.7
64+
pyenv global 3.12.7
65+
```
66+
67+
Confirm `python3 --version` returns 3.12.x and `python3 -c 'import platform; print(platform.machine())'` returns `arm64`.
68+
69+
### 4. (Optional) long-lived venv
70+
71+
The workflow currently does `pip install -e .` per run, which is
72+
~30 s on a warm pip cache. If you want to skip even that, create a
73+
venv at `~/kakeya-runner-venv` and add a step to the workflow that
74+
activates it before `pytest`. v0.3 keeps the per-run install for
75+
simplicity.
76+
77+
## Runtime expectations
78+
79+
| Phase | Wall time on M4 24 GB |
80+
| --- | --- |
81+
| Checkout + verify host | <5 s |
82+
| Verify HF cache | <1 s |
83+
| `pip install -e .` (warm pip) | 20-40 s |
84+
| `pytest -m integration` (80 tests, post-PR-N1..N4) | 60-120 s |
85+
| Artifact upload | <5 s |
86+
| **Total** | **~2-3 min** |
87+
88+
The 90-minute timeout in the workflow is a safety margin. A run
89+
that exceeds 5 min should be investigated — likely a model-load
90+
regression or a runaway test.
91+
92+
## Maintenance
93+
94+
### Cache hygiene
95+
96+
The runner's HF cache and pip cache grow over time. Recommend a
97+
monthly cron:
98+
99+
```bash
100+
# ~/clean-kakeya-runner.sh
101+
find ~/.cache/huggingface/hub -type d -mtime +60 -prune -name 'models--*' -exec rm -rf {} +
102+
python3 -m pip cache purge
103+
```
104+
105+
The Qwen3-0.6B cache is touched on every run, so `mtime +60` only
106+
prunes models added by future test additions that aren't currently
107+
exercised.
108+
109+
### Runner upgrades
110+
111+
GitHub publishes new runner versions ~monthly. Update via:
112+
113+
```bash
114+
cd ~/actions-runner
115+
./svc.sh stop
116+
./config.sh remove --token <repo-config-token>
117+
# download the new tarball per GitHub UI instructions
118+
./config.sh --url https://github.com/<owner>/<repo> --token <new-token>
119+
./svc.sh install && ./svc.sh start
120+
```
121+
122+
### Failure triage
123+
124+
Workflow failures are visible at `Actions → Integration (Mac M4)`. The "Surface failure summary" step inlines the test names + first-line error messages so triage doesn't require downloading the JUnit XML.
125+
126+
If the runner itself is offline (queue depth grows, no jobs pick up), check on the Mac:
127+
128+
```bash
129+
cd ~/actions-runner
130+
sudo ./svc.sh status
131+
tail -200 ~/Library/Logs/actions-runner/Runner_*.log
132+
```
133+
134+
Common causes:
135+
- macOS auto-update rebooted the host; service didn't auto-start (rare with `launchd` but possible).
136+
- HF cache was purged; the verify step fails. Re-warm.
137+
- Disk full from accumulated pip downloads; clear cache.

0 commit comments

Comments
 (0)