Model: Qwen3.6-35B-A3B-Q4_K_M.gguf
Steps to reproduce
1. Build llama-server with CUDA support.
2. Run: ./llama-server -m Qwen3.6-35B-A3B-Q4_K_M.gguf --port 1234 --host 0.0.0.0 --ctx-size 2048
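For step 1, a minimal build sketch following the standard llama.cpp CMake path (GGML_CUDA=ON is the current upstream flag; adjust paths and options to your setup):

```shell
# Fetch the source and build only the server binary with CUDA enabled.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j --target llama-server
```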
Expected behavior
Server starts and responds to API requests
Actual behavior
Model loads successfully
Server hangs during warmup phase
All API requests return HTTP 503 "Loading model"
Server process eventually crashes or exits
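The 503 behavior above can be observed by polling the server from a client. The sketch below is a hypothetical helper (not part of llama.cpp) that retries a probe command until it succeeds; in the real repro the probe would be something like `curl -sf http://localhost:1234/health`, which fails while the server still answers 503 "Loading model". A stub that always fails stands in for the stuck server here:

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or
# the attempt budget runs out. Returns 0 on success, 1 on timeout.
wait_ready() {
  attempts=$1; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep 0.1
  done
  return 1
}

# Real usage against the server in this report would be:
#   wait_ready 50 curl -sf http://localhost:1234/health
# Stub probe that always fails, like the server stuck in warmup:
wait_ready 3 false && echo ready || echo "still loading"
```

With the server hanging in warmup, the helper never sees a success and reports "still loading" indefinitely, matching the behavior described above.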
Key warnings from logs
Environment
Notes