docs: add README "Choosing the right classifier" section
Closes the documentation gap for issue #86 (does the CUDA jar fall back to
CPU?) and the 32-bit Android tail of #121 (armeabi-v7a not published).
The new section enumerates the three published classifiers (default CPU,
cuda13-linux-x86-64, opencl-android-aarch64), their backends, target
platforms, and runtime requirements. It explicitly states that the CUDA
JAR is CUDA-only at runtime — it dlopens libcudart.so.13/libcublas.so.13
and has no automatic CPU fallback — and that Android armeabi-v7a is not
shipped as a released artifact.
Updates docs/history/49be664_open_issues.md to mark #86 as
FIXED-AS-DOCUMENTED and #121 as FIXED (64-bit) with the 32-bit limitation
now documented.
https://claude.ai/code/session_01R3jVWHsB3zymwAQtj8GT43
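
The classifier choice described in this commit message can be read as a small decision table. The sketch below encodes it in Java purely for illustration. `ClassifierChooser`, the `Target` enum, and the two boolean inputs are hypothetical names that are not part of the library, and detecting the platform or the installed runtimes is left out of scope. In practice the classifier is fixed in the build file, so this is a way to reason about the choice rather than a runtime fallback mechanism.

```java
/** Hypothetical helper (not part of the library) encoding the published classifiers. */
final class ClassifierChooser {

    /** Hypothetical coarse platform labels; how a build detects them is not shown here. */
    enum Target { LINUX_X86_64, ANDROID_AARCH64, ANDROID_ARMEABI_V7A, OTHER_DESKTOP }

    /**
     * Returns the Maven classifier to use, or the empty string for the default CPU JAR.
     * Throws for 32-bit Android, which has no released artifact.
     */
    static String choose(Target target, boolean cuda13RuntimeInstalled, boolean adrenoOpenClIcdPresent) {
        if (target == Target.ANDROID_ARMEABI_V7A) {
            // Per the commit message: armeabi-v7a is not shipped as a released artifact.
            throw new UnsupportedOperationException("armeabi-v7a is not published; build from source");
        }
        if (target == Target.LINUX_X86_64 && cuda13RuntimeInstalled) {
            // CUDA-only at runtime: only safe when libcudart.so.13/libcublas.so.13 exist on the host.
            return "cuda13-linux-x86-64";
        }
        if (target == Target.ANDROID_AARCH64 && adrenoOpenClIcdPresent) {
            return "opencl-android-aarch64";
        }
        // Everything else uses the default CPU JAR (no <classifier> element).
        return "";
    }

    public static void main(String[] args) {
        System.out.println("[" + choose(Target.LINUX_X86_64, true, false) + "]");
        System.out.println("[" + choose(Target.LINUX_X86_64, false, false) + "]");
        System.out.println("[" + choose(Target.ANDROID_AARCH64, false, true) + "]");
    }
}
```

An empty return value here only means "omit the `<classifier>` element"; it does not by itself guarantee that the default JAR ships a native library for every platform labelled `OTHER_DESKTOP`.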
|_(none)_| CPU | Linux x86-64 / aarch64, macOS x86-64 / aarch64, Windows x86-64, Android aarch64 (CPU) | None beyond a JDK 8+ JVM |
+|`cuda13-linux-x86-64`| CUDA 13 | Linux x86-64 with NVIDIA GPU | NVIDIA driver + CUDA 13 runtime libraries (`libcudart.so.13`, `libcublas.so.13`) installed on the host. The shared library is dynamically linked against them and will fail to `dlopen` if they are absent; there is no automatic fallback to CPU. |
+|`opencl-android-aarch64`| OpenCL (Adreno) | Android aarch64 with Qualcomm Adreno GPU | A device-supplied OpenCL ICD (`libOpenCL.so`). Devices without an ICD (e.g. most non-Snapdragon Android hardware) must use the default CPU JAR. |
+
+```xml
+<!-- CPU (default) -->
+<dependency>
+    <groupId>net.ladenthin</groupId>
+    <artifactId>llama</artifactId>
+    <version>5.0.1</version>
+</dependency>
+
+<!-- CUDA on Linux x86-64 (requires CUDA 13 runtime on the host) -->
+<dependency>
+    <groupId>net.ladenthin</groupId>
+    <artifactId>llama</artifactId>
+    <version>5.0.1</version>
+    <classifier>cuda13-linux-x86-64</classifier>
+</dependency>
+
+<!-- OpenCL/Adreno on Android (requires device-provided OpenCL ICD) -->
+<dependency>
+    <groupId>net.ladenthin</groupId>
+    <artifactId>llama</artifactId>
+    <version>5.0.1</version>
+    <classifier>opencl-android-aarch64</classifier>
+</dependency>
+```
+
+> [!IMPORTANT]
+> The CUDA JAR is **CUDA-only at runtime**. On a CPU-only host (no NVIDIA
+> driver or no CUDA 13 runtime libraries installed) the JVM will fail at
+> native-library load time with `UnsatisfiedLinkError` caused by an
+> underlying `dlopen` failure on `libcudart.so.13`. If you want to ship a
+> single artifact that works on both CPU and CUDA hosts, depend on the
+> default (CPU) JAR; users who want GPU acceleration must compile locally
+> with `-DGGML_CUDA=ON` (see [Setup required](#setup-required)).
+
+> [!NOTE]
+> Android `armeabi-v7a` (32-bit ARM) is **not** published. Only 64-bit
+> `aarch64` Android binaries are shipped, both as the CPU-only default JAR
+> and as `opencl-android-aarch64`. 32-bit Android devices are unsupported
+> by the released artifacts; building from source via the
+> `.github/dockcross/dockcross-android-arm` toolchain is possible but not
+> wired into CI.
+
### Setup required

If none of the above listed platforms matches yours, currently you have to compile the library yourself (also if you
-PARTIALLY FIXED pending Java-side enhancements** (typed image API #103/#34,
-32-bit Android tail of #121, CUDA-jar documentation #86), and **0 require
-platform reproduction**.
+Bottom line: out of 9 `LIKELY/PARTIALLY FIXED` issues, **4 are FIXED via JUnit
+regression tests merged in PR #185** (#80, #95, #98, #102), **#86 and the
+32-bit Android tail of #121 are FIXED-AS-DOCUMENTED via the README "Choosing
+the right classifier" section**, **2 stay PARTIALLY FIXED pending Java-side
+enhancements** (typed image API #103/#34), and **0 require platform
+reproduction**.

---

@@ -455,7 +457,7 @@ cache / context between iterations.
Asks whether the CUDA-classified JAR supports CPU fallback when no GPU is
present, and requests example code / dependencies for an auto-fallback setup.

-**Status in fork:** PARTIALLY FIXED. The CUDA classifier `cuda13-linux-x86-64` is built via `.github/build_cuda_linux.sh` (see `CLAUDE.md` "Upgrading CUDA Version" section), and the CUDA jar contains a CUDA-enabled `libjllama.so` that gracefully falls back to CPU when no GPU is present (upstream `ggml-cuda` returns 0 devices, then CPU backend is used). Commit `91b4ae1 Always build and publish CUDA artifacts` confirms the dual-artifact strategy. Next steps: add Javadoc / README guidance documenting the fallback.
+**Status in fork:** FIXED-AS-DOCUMENTED. The CUDA classifier `cuda13-linux-x86-64` is built via `.github/build_cuda_linux.sh` (see `CLAUDE.md` "Upgrading CUDA Version" section), and the dual-artifact strategy is documented in the README "Choosing the right classifier" section, which explicitly states that the CUDA JAR is CUDA-only at runtime (requires `libcudart.so.13` / `libcublas.so.13` on the host) and does not auto-fall back to CPU. CPU users must pick the default classifier.

**Deep-dive analysis:** This is a documentation gap, not a code defect. Behaviorally: the CUDA-built `libjllama.so` dynamically links against `libcudart.so.13` and `libcublas.so.13`. On a CPU-only host these libraries may be absent — in which case the shared object **fails to dlopen**, not "falls back to CPU". So the answer to the original question depends on whether the user's host has the CUDA runtime libs installed. Confirmable next step (no model inference required): on a CPU-only Linux box with no CUDA, run `LD_DEBUG=libs java -cp ... net.ladenthin.llama.LlamaModel`; if dlopen of `libcudart.so.13` fails, the CUDA jar **cannot** load. **Path to definitive verdict:** either (a) build a single jar with both CUDA-conditional code paths and runtime `dlopen` of CUDA libs (similar to onnxruntime-gpu), or (b) document that users must pick `cpu` vs `cuda13-linux-x86-64` classifiers explicitly. The current `91b4ae1` strategy is (b). Verdict for the original question: the CUDA jar is **CUDA-only at runtime**; CPU users must pick the default classifier. Update to FIXED-AS-DOCUMENTED once a README note is added.
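
As a rough complement to the `LD_DEBUG=libs` check described above, a preflight probe could look for the two CUDA runtime libraries before any native load is attempted. The following is a minimal sketch under stated assumptions: `CudaRuntimeProbe` is a hypothetical class that is not part of the library, and it only scans the directories listed in `LD_LIBRARY_PATH`. The dynamic loader also consults other locations (for example the `ld.so` cache and default system directories), so a "missing" result here is a hint, not proof that `dlopen` would fail.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical preflight probe (not part of the library). */
final class CudaRuntimeProbe {

    /** Library names taken from the analysis above. */
    static final List<String> REQUIRED = Arrays.asList("libcudart.so.13", "libcublas.so.13");

    /**
     * Returns the required library names that are not found as regular files
     * in any directory of the given path-separator-delimited search path.
     * A null or empty search path reports every library as missing.
     */
    static List<String> missing(String searchPath, List<String> required) {
        List<String> notFound = new ArrayList<>();
        for (String lib : required) {
            boolean found = false;
            if (searchPath != null) {
                for (String dir : searchPath.split(File.pathSeparator)) {
                    // isRegularFile follows symlinks, which is how versioned .so names are usually installed.
                    if (!dir.isEmpty() && Files.isRegularFile(Paths.get(dir, lib))) {
                        found = true;
                        break;
                    }
                }
            }
            if (!found) {
                notFound.add(lib);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // Assumption: only LD_LIBRARY_PATH is inspected; the real loader searches more places.
        List<String> notFound = missing(System.getenv("LD_LIBRARY_PATH"), REQUIRED);
        if (notFound.isEmpty()) {
            System.out.println("CUDA 13 runtime libraries found on LD_LIBRARY_PATH");
        } else {
            System.out.println("Not found on LD_LIBRARY_PATH: " + notFound);
        }
    }
}
```

Because the probe is heuristic, the `LD_DEBUG=libs` run remains the more reliable way to confirm what the loader actually resolves.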
-| 121 | PARTIALLY FIXED → FIXED (64-bit) | aarch64 path consistent between CI build and loader; no 32-bit publish | `publish.yml:133`, `OSInfo.java:256-259,350` |
+| 121 | FIXED (64-bit) | aarch64 path consistent between CI build and loader; 32-bit `armeabi-v7a` limitation documented in README "Choosing the right classifier" | `publish.yml:133`, `OSInfo.java:256-259,350`, `README.md` |
| 120 | FIXED | Architecture support comes from b9284 | `CLAUDE.md:11` |
-| 86 | PARTIALLY FIXED | CUDA jar is CUDA-runtime-required; user must pick classifier | `.github/build_cuda_linux.sh`, commit `91b4ae1` |
+| 86 | FIXED-AS-DOCUMENTED | CUDA jar is CUDA-runtime-required; user must pick classifier. README "Choosing the right classifier" documents this. | `.github/build_cuda_linux.sh`, commit `91b4ae1`, `README.md` |