Misc. bug: llama update on Windows may replace CUDA build with Vulkan/CPU binary #24744

Description

@Carbaz

Name and Version

llama version
b9631-6e14286ed

llama-server --version
version: 9692 (f3e1828)
built with Clang 20.1.8 for Windows x86_64

llama-cli --version
version: 9692 (f3e1828)
built with Clang 20.1.8 for Windows x86_64

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

Other (Please specify in the next section)

Command line

llama update

Problem description & steps to reproduce

When installing llama.cpp on Windows from a CUDA release zip, the resulting llama.exe includes a llama update subcommand. Looking at the source in app/llama.cpp, on Windows this simply runs:

irm https://llama.app/install.ps1 | iex

However, install.ps1 probes only for Vulkan and CPU features, unlike install.sh:

curl -fsSL https://llama.app/install.sh

which also has a probe_cuda() function.

As a result, running llama update on Windows appears to replace a CUDA build with a Vulkan or CPU binary, silently losing CUDA acceleration.
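To make the expected behavior concrete, here is a minimal sketch of the selection logic I would expect an updater to follow. It is written in Python purely for illustration and is not taken from install.ps1 or install.sh. The names BACKEND_PRIORITY, select_backend, and is_downgrade are hypothetical, and the preference order (CUDA, then Vulkan, then CPU) is my assumption.

```python
from __future__ import annotations

# Hypothetical preference order; an assumption, not read from either install script.
BACKEND_PRIORITY = ["cuda", "vulkan", "cpu"]


def select_backend(available: set[str]) -> str:
    """Return the most preferred backend that the probes reported as usable."""
    for backend in BACKEND_PRIORITY:
        if backend in available:
            return backend
    # Assumption: a CPU build always works as a last resort.
    return "cpu"


def is_downgrade(current: str, selected: str) -> bool:
    """True when an update would move to a less preferred backend than the installed one."""
    return BACKEND_PRIORITY.index(selected) > BACKEND_PRIORITY.index(current)


if __name__ == "__main__":
    # A probe set without CUDA, which is what install.ps1 appears to produce today.
    chosen = select_backend({"vulkan", "cpu"})
    if is_downgrade("cuda", chosen):
        print(f"warning: update would replace cuda with {chosen}")
```

With a check like is_downgrade, the updater could at least warn or ask for confirmation before replacing a CUDA build, even if a Windows CUDA probe is not added right away.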

First Bad Commit

5a46b46 added the update command; I am not sure of the dates of the two scripts (install.ps1 and install.sh).

Relevant log output

llama update
Version: b9631
Probing Vulkan...
Downloading vulkan-probe.exe...
Downloading unzstd.exe...
Downloading featcode.exe...
Found: bmi2
Found: avxvnni
Found: avx512vl
Found: avx512cd
Found: avx512dq
Found: avx512vnni
Found: avx512vbmi
Found: avx512bf16
Downloading llama.exe...
Installation completed successfully

Please run the following command to start it:

  llama.exe serve
