This runner backs the Integration (Mac M4) GitHub Actions workflow
(.github/workflows/integration.yaml). It runs pytest -m integration
against real Qwen3-0.6B on every PR labelled needs-mac-m4
(auto-applied by .github/workflows/auto-label-mac.yaml when a PR
touches inference_engine/, sdks/, proto/, tests/integration/,
or kv_cache_proposer/).
| Resource | Minimum |
|---|---|
| Chip | Apple Silicon (M-series); M4 or newer recommended |
| Unified memory | 24 GB (16 GB works for Qwen3-0.6B alone but no headroom for concurrent work) |
| Free disk | ~50 GB (HF cache + venv + checkout history) |
| Network | Reachable to github.com for runner registration; outbound to HF Hub for the one-time pre-warm |
| OS | macOS 14 (Sonoma) or newer |
Follow GitHub's docs to add a runner to the repository:
- Repository → Settings → Actions → Runners → New self-hosted runner.
- Choose macOS / ARM64.
- Run the install + configure commands GitHub provides.
- Important: when prompted for labels, add
kakeya-mac-m4in addition to the defaultself-hosted, macOS, ARM64. The workflow'sruns-on:clause specifically requires that label. - Run the runner as a launchd service (
./svc.sh install && ./svc.sh start) so it survives reboots.
The integration workflow runs with HF_HUB_OFFLINE=1 so it never
hits HuggingFace at test time (avoids 90-min runs blocking on a 4 GB
download). Pre-warm the cache once per runner:
python3 -c "
from transformers import AutoModelForCausalLM, AutoTokenizer
AutoModelForCausalLM.from_pretrained('Qwen/Qwen3-0.6B')
AutoTokenizer.from_pretrained('Qwen/Qwen3-0.6B')
"The model lands at ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/.
The workflow's "Verify Qwen3-0.6B in HF cache" step fails fast with
a clear error if that directory is missing.
If a future test adds a new model id, update the pre-warm command (and the workflow's verify step) accordingly.
The runner needs Python 3.12+. Use Homebrew or pyenv:
brew install python@3.12
# or:
pyenv install 3.12.7
pyenv global 3.12.7Confirm python3 --version returns 3.12.x and python3 -c 'import platform; print(platform.machine())' returns arm64.
The workflow currently does pip install -e . per run, which is
~30 s on a warm pip cache. If you want to skip even that, create a
venv at ~/kakeya-runner-venv and add a step to the workflow that
activates it before pytest. v0.3 keeps the per-run install for
simplicity.
| Phase | Wall time on M4 24 GB |
|---|---|
| Checkout + verify host | <5 s |
| Verify HF cache | <1 s |
pip install -e . (warm pip) |
20-40 s |
pytest -m integration (80 tests, post-PR-N1..N4) |
60-120 s |
| Artifact upload | <5 s |
| Total | ~2-3 min |
The 90-minute timeout in the workflow is a safety margin. A run that exceeds 5 min should be investigated — likely a model-load regression or a runaway test.
The runner's HF cache and pip cache grow over time. Recommend a monthly cron:
# ~/clean-kakeya-runner.sh
find ~/.cache/huggingface/hub -type d -mtime +60 -prune -name 'models--*' -exec rm -rf {} +
python3 -m pip cache purgeThe Qwen3-0.6B cache is touched on every run, so mtime +60 only
prunes models added by future test additions that aren't currently
exercised.
GitHub publishes new runner versions ~monthly. Update via:
cd ~/actions-runner
./svc.sh stop
./config.sh remove --token <repo-config-token>
# download the new tarball per GitHub UI instructions
./config.sh --url https://github.com/<owner>/<repo> --token <new-token>
./svc.sh install && ./svc.sh startWorkflow failures are visible at Actions → Integration (Mac M4). The "Surface failure summary" step inlines the test names + first-line error messages so triage doesn't require downloading the JUnit XML.
If the runner itself is offline (queue depth grows, no jobs pick up), check on the Mac:
cd ~/actions-runner
sudo ./svc.sh status
tail -200 ~/Library/Logs/actions-runner/Runner_*.logCommon causes:
- macOS auto-update rebooted the host; service didn't auto-start (rare with
launchdbut possible). - HF cache was purged; the verify step fails. Re-warm.
- Disk full from accumulated pip downloads; clear cache.
The same runner also serves the Mac bridge
(.github/workflows/mac-bridge.yaml): pushes to mac-bridge/**
branches execute an allowlisted preset (see
inference_engine/bridge/manifest.py) and commit logs/results back to
the request branch. Full protocol + security model:
docs/design/mac-bridge-cloud-agent-access.md.
One-click setup: on the Mac, from the repo root —
bash scripts/mac_bridge/setup_mac.sh (add
--runner-token <TOKEN> --repo-url <URL> on a fresh machine to install
and register the Actions runner too). The script is idempotent and ends
with a bridge self-test.
Operator setup beyond the standard runner install:
- Model locations (used by the
k3-*harness presets) are read from the environment / repo Actions variables, never from the request manifest. Set repo variables (Settings → Secrets and variables → Actions → Variables) when the on-disk layout differs from the defaults:KAKEYA_MAC_VERIFIER_PATH— MLX 4-bit Gemma-4 verifier directoryKAKEYA_MAC_DRAFTER_ID— DFlash drafter HF id or local pathKAKEYA_MAC_FTHETA_DIR— trained f_θ checkpoint directory
- Bridge runs are serialized (
concurrency: mac-bridge) and capped at 150 minutes; cancel stuck runs from the Actions UI. - K3 acceptance reports produced by bridge runs are validated by the evidence gate on this machine; a non-conforming report fails the bridge run (exit ≠ 0) by design.