Commit dc6a2cf

docs: add README "Choosing the right classifier" section
Closes the documentation gap for issue #86 (does the CUDA jar fall back to CPU?) and the 32-bit Android tail of #121 (armeabi-v7a not published).

The new section enumerates the three published classifiers (default CPU, cuda13-linux-x86-64, opencl-android-aarch64), their backends, target platforms, and runtime requirements. It explicitly states that the CUDA JAR is CUDA-only at runtime (it dlopens libcudart.so.13/libcublas.so.13 and has no automatic CPU fallback) and that Android armeabi-v7a is not shipped as a released artifact.

Updates docs/history/49be664_open_issues.md to mark #86 as FIXED-AS-DOCUMENTED and #121 as FIXED (64-bit) with the 32-bit limitation now documented.

https://claude.ai/code/session_01R3jVWHsB3zymwAQtj8GT43
1 parent d0d1591 commit dc6a2cf
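The failure mode described in the commit message (the CUDA JAR needs `libcudart.so.13` and `libcublas.so.13` on the host and has no CPU fallback) lends itself to a pre-flight check. The following is a minimal sketch, assuming a hypothetical helper class `CudaRuntimeProbe` that is not part of the library. It only looks for the two file names in a list of directories, which approximates but does not replicate the dynamic linker's search: `ld.so.cache` and RPATH entries are ignored, and the default directories are guesses at common install locations.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of the library: heuristic check for the CUDA 13 runtime. */
public class CudaRuntimeProbe {

    /** Library file names the commit message says the CUDA JAR links against. */
    static final List<String> REQUIRED = Arrays.asList("libcudart.so.13", "libcublas.so.13");

    /** Returns the required library names that are absent from all of the given directories. */
    static List<String> missingLibraries(List<Path> searchDirs) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            boolean found = false;
            for (Path dir : searchDirs) {
                if (Files.exists(dir.resolve(name))) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Assumed search locations: LD_LIBRARY_PATH entries plus a few common install directories. */
    static List<Path> defaultSearchDirs() {
        List<Path> dirs = new ArrayList<>();
        String ldPath = System.getenv("LD_LIBRARY_PATH");
        if (ldPath != null) {
            for (String entry : ldPath.split(File.pathSeparator)) {
                if (!entry.isEmpty()) {
                    dirs.add(Paths.get(entry));
                }
            }
        }
        dirs.add(Paths.get("/usr/local/cuda/lib64"));
        dirs.add(Paths.get("/usr/lib/x86_64-linux-gnu"));
        dirs.add(Paths.get("/usr/lib64"));
        return dirs;
    }

    public static void main(String[] args) {
        List<String> missing = missingLibraries(defaultSearchDirs());
        if (missing.isEmpty()) {
            System.out.println("CUDA 13 runtime libraries found; the CUDA JAR may load.");
        } else {
            System.out.println("Missing " + missing + "; use the default CPU JAR on this host.");
        }
    }
}
```

A check like this can only inform packaging or deployment decisions. Because the classifier is fixed at build time, it cannot switch an already-built application from the CUDA JAR to the CPU JAR.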

2 files changed

Lines changed: 66 additions & 10 deletions

README.md

Lines changed: 54 additions & 0 deletions
@@ -136,6 +136,60 @@ We support CPU inference for the following platforms out of the box:

 If any of these match your platform, you can include the Maven dependency and get started.

+### Choosing the right classifier
+
+The Maven coordinate `net.ladenthin:llama` publishes one default JAR (CPU-only)
+plus two optional GPU/accelerator JARs selected via a Maven `<classifier>`.
+Pick at most one; they are mutually exclusive.
+
+| Classifier | Backend | Target platform | Runtime requirement |
+|---|---|---|---|
+| _(none)_ | CPU | Linux x86-64 / aarch64, macOS x86-64 / aarch64, Windows x86-64, Android aarch64 (CPU) | None beyond a JDK 8+ JVM |
+| `cuda13-linux-x86-64` | CUDA 13 | Linux x86-64 with NVIDIA GPU | NVIDIA driver + CUDA 13 runtime libraries (`libcudart.so.13`, `libcublas.so.13`) installed on the host. The shared library is dynamically linked against them and will fail to `dlopen` if they are absent; there is no automatic fallback to CPU. |
+| `opencl-android-aarch64` | OpenCL (Adreno) | Android aarch64 with Qualcomm Adreno GPU | A device-supplied OpenCL ICD (`libOpenCL.so`). Devices without an ICD (e.g. most non-Snapdragon Android hardware) must use the default CPU JAR. |
+
+```xml
+<!-- CPU (default) -->
+<dependency>
+    <groupId>net.ladenthin</groupId>
+    <artifactId>llama</artifactId>
+    <version>5.0.1</version>
+</dependency>
+
+<!-- CUDA on Linux x86-64 (requires CUDA 13 runtime on the host) -->
+<dependency>
+    <groupId>net.ladenthin</groupId>
+    <artifactId>llama</artifactId>
+    <version>5.0.1</version>
+    <classifier>cuda13-linux-x86-64</classifier>
+</dependency>
+
+<!-- OpenCL/Adreno on Android (requires device-provided OpenCL ICD) -->
+<dependency>
+    <groupId>net.ladenthin</groupId>
+    <artifactId>llama</artifactId>
+    <version>5.0.1</version>
+    <classifier>opencl-android-aarch64</classifier>
+</dependency>
+```
+
+> [!IMPORTANT]
+> The CUDA JAR is **CUDA-only at runtime**. On a CPU-only host (no NVIDIA
+> driver or no CUDA 13 runtime libraries installed) the JVM will fail at
+> native-library load time with `UnsatisfiedLinkError` caused by an
+> underlying `dlopen` failure on `libcudart.so.13`. If you want to ship a
+> single artifact that works on both CPU and CUDA hosts, depend on the
+> default (CPU) JAR; users who want GPU acceleration must compile locally
+> with `-DGGML_CUDA=ON` (see [Setup required](#setup-required)).
+
+> [!NOTE]
+> Android `armeabi-v7a` (32-bit ARM) is **not** published. Only 64-bit
+> `aarch64` Android binaries are shipped, both as the CPU-only default JAR
+> and as `opencl-android-aarch64`. 32-bit Android devices are unsupported
+> by the released artifacts; building from source via the
+> `.github/dockcross/dockcross-android-arm` toolchain is possible but not
+> wired into CI.
+
 ### Setup required

 If none of the above listed platforms matches yours, currently you have to compile the library yourself (also if you

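The classifier table in the README diff above amounts to a small lookup from target platform and desired backend to a Maven classifier. A minimal sketch of that lookup follows, assuming a hypothetical `ClassifierChooser` class and enums that exist only for illustration (the library ships no such API). An empty `Optional` stands for the default CPU JAR, which has no classifier.

```java
import java.util.Optional;

/** Hypothetical helper, not part of the library: maps a target to a published classifier. */
public class ClassifierChooser {

    enum Os { LINUX, MACOS, WINDOWS, ANDROID }
    enum Arch { X86_64, AARCH64 }
    enum Backend { CPU, CUDA, OPENCL }

    /**
     * Returns the Maven classifier for the requested backend, or Optional.empty()
     * for the default CPU JAR. Throws if the README table lists no artifact
     * for the combination.
     */
    static Optional<String> classifierFor(Os os, Arch arch, Backend backend) {
        switch (backend) {
            case CPU:
                // Default JAR: Linux/macOS on x86-64 and aarch64, Windows x86-64, Android aarch64.
                if (arch == Arch.X86_64 ? os != Os.ANDROID : os != Os.WINDOWS) {
                    return Optional.empty();
                }
                break;
            case CUDA:
                if (os == Os.LINUX && arch == Arch.X86_64) {
                    return Optional.of("cuda13-linux-x86-64");
                }
                break;
            case OPENCL:
                if (os == Os.ANDROID && arch == Arch.AARCH64) {
                    return Optional.of("opencl-android-aarch64");
                }
                break;
        }
        throw new IllegalArgumentException(
                "No published artifact for " + backend + " on " + os + "/" + arch);
    }

    public static void main(String[] args) {
        System.out.println(classifierFor(Os.LINUX, Arch.X86_64, Backend.CUDA).orElse("(default JAR)"));
        System.out.println(classifierFor(Os.MACOS, Arch.AARCH64, Backend.CPU).orElse("(default JAR)"));
    }
}
```

Throwing for unlisted combinations, rather than silently returning the CPU JAR, mirrors the README's point that nothing falls back automatically.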
docs/history/49be664_open_issues.md

Lines changed: 12 additions & 10 deletions
@@ -29,8 +29,9 @@ After a second-pass analysis of every `LIKELY FIXED` and `PARTIALLY FIXED` issue
   CI; manual macOS-host builds use the same Android-aware CMake logic.
 - #86 (CUDA jar / CPU fallback): the CUDA jar **requires** `libcudart.so.13` at
   runtime; there is no automatic dynamic fallback to CPU within one jar. Users
-  must pick the `cpu` vs `cuda13-linux-x86-64` classifier. Verdict stays
-  PARTIALLY FIXED (documentation gap).
+  must pick the `cpu` vs `cuda13-linux-x86-64` classifier. Now documented in
+  the README "Choosing the right classifier" section &#x2192; verdict
+  FIXED-AS-DOCUMENTED.

 - **Confirmable with one targeted JUnit test (no model retraining, no platform
   reproduction):** all four JUnit tests below landed on `master` via PR #185
@@ -67,11 +68,12 @@ After a second-pass analysis of every `LIKELY FIXED` and `PARTIALLY FIXED` issue
 All five depend on architecture/runtime emulation defects or platform-specific
 CRT behaviour that no amount of source-tree inspection can resolve.

-Bottom line: out of 9 `LIKELY/PARTIALLY FIXED` issues, **4 are now FIXED via
-JUnit regression tests merged in PR #185** (#80, #95, #98, #102), **3 stay
-PARTIALLY FIXED pending Java-side enhancements** (typed image API #103/#34,
-32-bit Android tail of #121, CUDA-jar documentation #86), and **0 require
-platform reproduction**.
+Bottom line: out of 9 `LIKELY/PARTIALLY FIXED` issues, **4 are FIXED via JUnit
+regression tests merged in PR #185** (#80, #95, #98, #102), **#86 and the
+32-bit Android tail of #121 are FIXED-AS-DOCUMENTED via the README "Choosing
+the right classifier" section**, **2 stay PARTIALLY FIXED pending Java-side
+enhancements** (typed image API #103/#34), and **0 require platform
+reproduction**.

 ---

@@ -455,7 +457,7 @@ cache / context between iterations.
 Asks whether the CUDA-classified JAR supports CPU fallback when no GPU is
 present, and requests example code / dependencies for an auto-fallback setup.

-**Status in fork:** PARTIALLY FIXED. The CUDA classifier `cuda13-linux-x86-64` is built via `.github/build_cuda_linux.sh` (see `CLAUDE.md` "Upgrading CUDA Version" section), and the CUDA jar contains a CUDA-enabled `libjllama.so` that gracefully falls back to CPU when no GPU is present (upstream `ggml-cuda` returns 0 devices, then CPU backend is used). Commit `91b4ae1 Always build and publish CUDA artifacts` confirms the dual-artifact strategy. Next steps: add Javadoc / README guidance documenting the fallback.
+**Status in fork:** FIXED-AS-DOCUMENTED. The CUDA classifier `cuda13-linux-x86-64` is built via `.github/build_cuda_linux.sh` (see `CLAUDE.md` "Upgrading CUDA Version" section), and the dual-artifact strategy is documented in the README "Choosing the right classifier" section, which explicitly states that the CUDA JAR is CUDA-only at runtime (requires `libcudart.so.13` / `libcublas.so.13` on the host) and does not auto-fall back to CPU. CPU users must pick the default classifier.

 **Deep-dive analysis:** This is a documentation gap, not a code defect. Behaviorally: the CUDA-built `libjllama.so` dynamically links against `libcudart.so.13` and `libcublas.so.13`. On a CPU-only host these libraries may be absent, in which case the shared object **fails to dlopen**, not "falls back to CPU". So the answer to the original question depends on whether the user's host has the CUDA runtime libs installed. Confirmable next step (no model inference required): on a CPU-only Linux box with no CUDA, run `LD_DEBUG=libs java -cp ... net.ladenthin.llama.LlamaModel`; if dlopen of `libcudart.so.13` fails, the CUDA jar **cannot** load. **Path to definitive verdict:** either (a) build a single jar with both CUDA-conditional code paths and runtime `dlopen` of CUDA libs (similar to onnxruntime-gpu), or (b) document that users must pick `cpu` vs `cuda13-linux-x86-64` classifiers explicitly. The current `91b4ae1` strategy is (b). Verdict for the original question: the CUDA jar is **CUDA-only at runtime**; CPU users must pick the default classifier. Update to FIXED-AS-DOCUMENTED once a README note is added.

@@ -713,7 +715,7 @@ Feature request: add multimodal input support (referencing
 |---|---|---|---|
 | 124 | FIXED | Continuous version bumps; pinned to b9284 | `CLAUDE.md:11`, `git log` upgrade commits |
 | 123 | FIXED | b9284 includes Qwen3-VL; mtmd linked | `CMakeLists.txt:255`, `CLAUDE.md:11` |
-| 121 | PARTIALLY FIXED → FIXED (64-bit) | aarch64 path consistent between CI build and loader; no 32-bit publish | `publish.yml:133`, `OSInfo.java:256-259,350` |
+| 121 | FIXED (64-bit) | aarch64 path consistent between CI build and loader; 32-bit `armeabi-v7a` limitation documented in README "Choosing the right classifier" | `publish.yml:133`, `OSInfo.java:256-259,350`, `README.md` |
 | 120 | FIXED | Architecture support comes from b9284 | `CLAUDE.md:11` |
 | 119 | FIXED | Per-release bump cadence to b9284 | `git log --oneline` Upgrade commits |
 | 117 | NEEDS INVESTIGATION | Upstream backend-device crash; reproduce | `b9284` is current; reproduce on emulator |
@@ -734,7 +736,7 @@ Feature request: add multimodal input support (referencing
 | 89 | NOT APPLICABLE | Hand-port `server.hpp` removed | upstream server compiled directly |
 | 88 | FIXED | `chatComplete` accepts OAI messages JSON | `LlamaModel.java:215-238` |
 | 87 | FIXED | `setCachePrompt` + per-slot KV semantics | `InferenceParameters.java:116` |
-| 86 | PARTIALLY FIXED | CUDA jar is CUDA-runtime-required; user must pick classifier | `.github/build_cuda_linux.sh`, commit `91b4ae1` |
+| 86 | FIXED-AS-DOCUMENTED | CUDA jar is CUDA-runtime-required; user must pick classifier. README "Choosing the right classifier" documents this. | `.github/build_cuda_linux.sh`, commit `91b4ae1`, `README.md` |
 | 85 | NEEDS INVESTIGATION | Rosetta-2 emulation defect; arm64 builds ship | `Mac/aarch64/` artifact |
 | 84 | FIXED | `rerank()` API + RerankingModelTest | `LlamaModel.java:170,187` |
 | 83 | NEEDS INVESTIGATION | Fresh Windows artifact; reproduce | `compat/ggml_x86_compat.c` |
