readme: document findings — Ninja rationale, Metal verification, warning interpretation

apocryphx · claude · apocryphx · commit f62eb3caf1f4 · 2026-04-15T12:19:13.000-07:00
Adds information that was produced during the Apple-platform build investigation
but hadn't made it into the repo:

- Full text of the -G Xcode failure on CMake 4.x + iOS/tvOS/visionOS, with
  verification date, so future readers can search for the exact error string
- Note that the Xcode-specific -- -quiet build flag was dropped alongside the
  generator switch, and that -DCMAKE_XCODE_ATTRIBUTE_* args were kept as
  harmless Ninja no-ops
- Concrete commands to verify Metal shader embedding (nm | grep ggml_metallib)
- Explanation of the "Unknown CPU architecture" CMake warning — it's the x86_64
  CPU backend falling back to generic kernels, not a Metal fallback
- Why the smoke-test recipe uses llama-bench (llama-cli is coupled to
  LLAMA_BUILD_SERVER=ON in this upstream) and what good output looks like,
  with reference throughput numbers from the 2026-04-15 verification
- Post-rebase spot-check commands for future upstream syncs
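
The nm-based Metal check described above could be wrapped for scripted use along these lines. This is only a sketch: the helper name `has_embedded_metallib` is hypothetical and not part of this commit or the repo. It assumes the two symbol names shown in the README (`_ggml_metallib_start`, `_ggml_metallib_end`) and reads `nm` output from stdin so it stays independent of any particular slice path.

```shell
#!/bin/sh
# Hypothetical helper (not in the repo): reads `nm` output on stdin and
# succeeds only if BOTH embedded-metallib boundary symbols are present.
has_embedded_metallib() {
  syms=$(cat)   # buffer stdin so it can be searched twice
  printf '%s\n' "$syms" | grep -q '_ggml_metallib_start' &&
    printf '%s\n' "$syms" | grep -q '_ggml_metallib_end'
}

# Usage, with the slice path taken from the README:
#   nm build-apple/llama.xcframework/ios-arm64/llama.framework/llama \
#     | has_embedded_metallib && echo "metallib embedded"
```

Requiring both symbols, rather than any single grep hit, guards against a partial match being mistaken for a complete embedded library.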

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/README.md b/README.md
@@ -6,12 +6,25 @@ The goal of this fork is narrow: keep a shippable xcframework building on a mode
 
 ## What this fork changes
 
-Two commits on top of upstream:
+Three commits on top of upstream (the third is this README):
 
-1. **`cmake_minimum_required` bump** (`CMakeLists.txt`, `ggml/CMakeLists.txt`) — widen the accepted version range to `3.5...4.2` so CMake 4.x stops warning about removed policies.
-2. **`build-xcframework.sh` → Ninja generator** — the upstream script uses `-G Xcode`, which fails on CMake 4.x when cross-compiling for iOS/tvOS/visionOS (`The C compiler identification is unknown`). Switching to `-G Ninja` resolves it. `combine_static_libraries` call sites updated to drop the `Release-<sdk>/` subpath that Ninja (single-config) doesn't produce.
+1. **`cmake_minimum_required` bump** (`CMakeLists.txt`, `ggml/CMakeLists.txt`) — widens the accepted version range to `3.5...4.2` so CMake 4.x stops warning about removed policies. Upstream still pins to `3.14...3.28`.
+2. **`build-xcframework.sh` → Ninja generator** — see "Why Ninja" below. All 7 `cmake -B` invocations in the script now use `-G Ninja` instead of `-G Xcode`. The Xcode-only `-- -quiet` build argument was dropped. `combine_static_libraries` call sites now pass `.` as the `release_dir` because Ninja is single-config and emits archives directly under `src/`, not `src/Release-<sdk>/`.
+3. **Fork-focused README** — this file; upstream README moved to [README.upstream.md](README.upstream.md).
 
-Everything else is vanilla upstream.
+No C/C++/Objective-C source has been touched: no APIs were added, removed, or renamed, and no ggml backend was modified. Given the same inputs, the library behaves identically to upstream `b8802`.
+
+### Why Ninja
+
+On CMake 4.x with Xcode 26, the Xcode generator fails when cross-compiling to iOS/tvOS/visionOS SDKs:
+
+```
+-- The C compiler identification is unknown
+CMake Error at ggml/src/ggml-cpu/CMakeLists.txt:57 (target_compile_features):
+  target_compile_features no known features for C compiler "" version .
+```
+
+The failure reproduces against `upstream/master`, verified 2026-04-15. Ninja bypasses it entirely because it does not rely on Xcode's toolchain detection for cross-SDK builds. The resulting xcframework is equivalent — the Xcode-specific `-DCMAKE_XCODE_ATTRIBUTE_*` arguments in `COMMON_CMAKE_ARGS` are harmless no-ops under Ninja, so they were left alone rather than stripped.
 
 ## Building the xcframework
 
@@ -33,7 +46,25 @@ Output: `build-apple/llama.xcframework/` containing 7 slices:
 
 Mac Catalyst is **not** in the xcframework — CMake's cross-compile flags conflict when combining both Catalyst architectures in a single configure step. See [APPLE-PLATFORMS-BUILD.md](APPLE-PLATFORMS-BUILD.md) for the manual lipo workflow.
 
-Every slice links `Metal.framework` and `Accelerate.framework`, and embeds the full Metal shader library (110 MSL kernels) via `GGML_METAL_EMBED_LIBRARY=ON`. No external `.metallib` file is required at runtime.
+Every slice links `Metal.framework` and `Accelerate.framework`, and embeds the full Metal shader library (110 MSL kernels) via `GGML_METAL_EMBED_LIBRARY=ON`. No external `.metallib` file is required at runtime. You can verify this on any slice:
+
+```bash
+nm build-apple/llama.xcframework/ios-arm64/llama.framework/llama \
+  | grep ggml_metallib
+# 000000000032cc20 S _ggml_metallib_start
+# 00000000003bf6d3 S _ggml_metallib_end
+```
+
+### Expected build-time warnings
+
+During configuration of simulator/multi-arch slices you will see:
+
+```
+CMake Warning at ggml/src/ggml-cpu/CMakeLists.txt:558 (message):
+  Unknown CPU architecture.  Falling back to generic implementations.
+```
+
+This is **not** a Metal fallback. It fires only when x86_64 is part of the architecture list (iOS sim, macOS, visionOS sim, tvOS sim) and means the x86_64 **CPU backend** slice uses generic scalar kernels instead of AVX/AVX2. The arm64 CPU backend and the Metal backend are unaffected. For shipping on Apple Silicon devices this warning is cosmetic — no one runs production inference on an x86_64 simulator.
 
 ### Requirements
 
@@ -45,7 +76,7 @@ Last verified: 2026-04-15 against upstream tag `b8802` with Xcode 26.4, CMake 4.
 
 ## Verifying Metal works
 
-A quick smoke test using `llama-bench` against a host macOS build:
+The xcframework is a library — it doesn't ship a runnable binary. To prove Metal is functional end-to-end against the same source the xcframework was built from, do a parallel host-macOS build of `llama-bench`:
 
 ```bash
 cmake -B build-host -G Ninja \
@@ -56,7 +87,18 @@ cmake --build build-host --target llama-bench -j
 ./build-host/bin/llama-bench -m <model>.gguf -p 64 -n 32 -ngl 99
 ```
 
-Look for `ggml_metal_library_init: using embedded metal library` and the `MTL,BLAS` backend column. The same code path runs in the xcframework slices.
+> **Note:** `llama-cli` is only built when `LLAMA_BUILD_SERVER=ON` in this upstream (`tools/CMakeLists.txt`). `llama-bench` is always available and is a more informative smoke test anyway — it prints tokens/sec per backend.
+
+Look for:
+
+- `ggml_metal_library_init: using embedded metal library` — the embedded metallib loaded, not a disk `.metallib`.
+- `GPU family: MTLGPUFamilyApple*` — real Apple Silicon GPU detected.
+- Backend column `MTL,BLAS` — Metal is the compute backend.
+- `tg` (token generation) rates in the hundreds of t/s on a small model; a CPU-only run would be roughly 10× slower.
+
+Reference numbers from the 2026-04-15 verification (SmolLM2-135M-Instruct Q4_K_M on an M-series Mac): `pp64 ≈ 8098 t/s`, `tg32 ≈ 403 t/s`, backend `MTL,BLAS`, family `MTLGPUFamilyApple9`.
+
+The xcframework slices contain identical Metal backend code — same `_ggml_metallib_start`/`_end` symbols, same 110 kernels — so a working host Metal build is a reliable proxy for the framework slices.
 
 ## Syncing with upstream
 
@@ -67,7 +109,16 @@ git rebase "$LATEST_TAG"
 ./build-xcframework.sh     # re-verify
 ```
 
-The fork's two commits rebase cleanly onto upstream tags. `ggml/CMakeLists.txt` occasionally conflicts when upstream moves code near `cmake_minimum_required`; resolve by keeping both the version bump and whatever upstream added.
+The fork's commits rebase cleanly onto upstream tags with one known pinch point: `ggml/CMakeLists.txt` conflicts whenever upstream adds code near `cmake_minimum_required` (e.g. the CMP0194 policy block added around tag `b8802`). Resolve by keeping **both** the fork's version-range bump and whatever upstream added adjacent to it.
+
+After rebasing, run `./build-xcframework.sh` and spot-check one slice before force-pushing:
+
+```bash
+lipo -info build-apple/llama.xcframework/ios-arm64_x86_64-simulator/llama.framework/llama
+# Architectures in the fat file: ... are: x86_64 arm64
+nm build-apple/llama.xcframework/ios-arm64/llama.framework/llama | grep ggml_metallib
+# Expect _ggml_metallib_start and _ggml_metallib_end symbols.
+```
 
 ## Further reading