|
| 1 | +# 2026-04-25 RunPod 4x RTX PRO 6000 Q2 Attempts |
| 2 | + |
| 3 | +## Goal |
| 4 | + |
| 5 | +Try the `unsloth/Kimi-K2.6-GGUF:UD-Q2_K_XL` path on a non-AMD node that still meets the smallest practical aggregate VRAM target. |
| 6 | + |
| 7 | +## Why This Topology |
| 8 | + |
| 9 | +- Each `RTX PRO 6000 Blackwell Server Edition` has `96 GB` VRAM. |
| 10 | +- `4x` gives `384 GB` aggregate VRAM, which exceeds the `340 GB` GGUF artifact size by `44 GB` and keeps the topology in the requested even-GPU pattern. |
| 11 | +- llama.cpp supports multi-GPU model sharding through `--split-mode` and `--tensor-split`, so GGUF is a valid multi-GPU path. |
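
The sizing argument in the bullets above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, the figures are the ones quoted in this log, and it counts model weights only. KV cache and compute buffers also have to fit in the remaining headroom, which was never measured here because the pods did not start.

```python
# Figures quoted in this log; values are treated as plain GB numbers.
GPU_VRAM_GB = 96    # per RTX PRO 6000 Blackwell Server Edition
GPU_COUNT = 4
ARTIFACT_GB = 340   # UD-Q2_K_XL GGUF artifact size


def vram_headroom_gb(gpu_count, per_gpu_gb, artifact_gb):
    """Aggregate VRAM left after the weights (ignores KV cache and buffers)."""
    return gpu_count * per_gpu_gb - artifact_gb


def min_even_gpu_count(per_gpu_gb, artifact_gb):
    """Smallest even GPU count whose aggregate VRAM covers the artifact."""
    count = 2
    while count * per_gpu_gb < artifact_gb:
        count += 2
    return count


if __name__ == "__main__":
    print(vram_headroom_gb(GPU_COUNT, GPU_VRAM_GB, ARTIFACT_GB))
    print(min_even_gpu_count(GPU_VRAM_GB, ARTIFACT_GB))
```

With an even `1,1,1,1` split, the weights alone would occupy roughly `85 GB` of each `96 GB` card, so the per-GPU margin is tight even at a small context length.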
| 12 | + |
| 13 | +Sources: |
| 14 | + |
| 15 | +- https://github.com/ggml-org/llama.cpp |
| 16 | +- https://github.com/ggml-org/llama.cpp/discussions/6046 |
| 17 | +- https://github.com/ggml-org/llama.cpp/discussions/11784 |
| 18 | + |
| 19 | +## Launch Configuration |
| 20 | + |
| 21 | +| Field | Value | |
| 22 | +| --- | --- | |
| 23 | +| Cloud | `COMMUNITY` | |
| 24 | +| Image | `runpod/pytorch:1.0.3-cu1281-torch291-ubuntu2404` | |
| 25 | +| Model | `unsloth/Kimi-K2.6-GGUF:UD-Q2_K_XL` | |
| 26 | +| Context length | `2048` | |
| 27 | +| GPU layers | `999` | |
| 28 | +| Split mode | `layer` | |
| 29 | +| Tensor split | `1,1,1,1` | |
| 30 | +| Volume | `500 GB` | |
| 31 | +| Port | `30000/http` | |
| 32 | + |
| 33 | +The launcher embedded `scripts/serve/start_llamacpp_kimi_k26_gguf_cuda.sh` directly into the pod startup command. |
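
The contents of the embedded script are not reproduced in this log. As a rough illustration only, the table above could map onto a `llama-server` invocation along these lines. The helper name `build_llama_server_argv` and the `--host 0.0.0.0` binding are assumptions, not taken from the script; the flags themselves are standard llama.cpp options.

```python
import shlex


def build_llama_server_argv(model, ctx, gpu_layers, split_mode, tensor_split, port):
    """Build an argv list for llama-server from the launch table values.

    Hypothetical helper: the real startup script may differ.
    """
    return [
        "llama-server",
        "--hf-repo", model,                  # <user>/<repo>:<quant> form
        "--ctx-size", str(ctx),
        "--n-gpu-layers", str(gpu_layers),   # 999 = offload every layer
        "--split-mode", split_mode,
        "--tensor-split", ",".join(str(p) for p in tensor_split),
        "--host", "0.0.0.0",                 # assumption: listen on all interfaces
        "--port", str(port),
    ]


if __name__ == "__main__":
    argv = build_llama_server_argv(
        "unsloth/Kimi-K2.6-GGUF:UD-Q2_K_XL", 2048, 999, "layer", [1, 1, 1, 1], 30000
    )
    print(shlex.join(argv))
```

Building the command as a list and quoting it with `shlex.join` avoids shell-escaping mistakes when the command is embedded in a pod startup string.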
| 34 | + |
| 35 | +## Attempt 1 |
| 36 | + |
| 37 | +| Field | Value | |
| 38 | +| --- | --- | |
| 39 | +| Pod name | `kimi-k26-gguf-q2-rtxpro6000x4-20260425-1` | |
| 40 | +| Pod ID | `eleak5xoojla2a` | |
| 41 | +| Cost | `$6.76/hr` | |
| 42 | +| Machine ID | `9ti6j8484pn1` | |
| 43 | + |
| 44 | +Observed behavior: |
| 45 | + |
| 46 | +- Pod allocated |
| 47 | +- `desiredStatus: RUNNING` |
| 48 | +- `uptimeSeconds: 0` |
| 49 | +- `publicIp: null` |
| 50 | +- SSH attempts kept returning `pod not ready` |
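
The observations above amount to a readiness predicate that never became true. A minimal sketch follows, assuming the pod status has been flattened into a dict with the three field names shown in this log; the actual RunPod response shape may nest these fields differently, and `is_runtime_live` is a hypothetical name.

```python
def is_runtime_live(status):
    """Return True only when the pod looks like a started, reachable runtime.

    `status` is assumed to be a flat dict using the field names from this log.
    """
    return (
        status.get("desiredStatus") == "RUNNING"
        and (status.get("uptimeSeconds") or 0) > 0
        and status.get("publicIp") is not None
    )


if __name__ == "__main__":
    attempt_1 = {"desiredStatus": "RUNNING", "uptimeSeconds": 0, "publicIp": None}
    print(is_runtime_live(attempt_1))
```

`desiredStatus` alone is not enough: it records what was requested, not what the host actually achieved, which is why `uptimeSeconds` is part of the check.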
| 51 | + |
| 52 | +## Attempt 2 |
| 53 | + |
| 54 | +| Field | Value | |
| 55 | +| --- | --- | |
| 56 | +| Pod name | `kimi-k26-gguf-q2-rtxpro6000x4-20260425-2` | |
| 57 | +| Pod ID | `qm93vevzo0cz1j` | |
| 58 | +| Cost | `$6.76/hr` | |
| 59 | +| Machine ID | `9ti6j8484pn1` | |
| 60 | + |
| 61 | +Observed behavior: |
| 62 | + |
| 63 | +- Pod allocated again onto the same machine ID as attempt 1 |
| 64 | +- RunPod later exposed SSH metadata: `107.150.186.62:13340` |
| 65 | +- Direct SSH attempts still returned `Connection refused` |
| 66 | +- `uptimeSeconds` remained `0` |
| 67 | +- No benchmark was possible because the runtime never became reachable |
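
The `Connection refused` result can be reproduced without an SSH client by probing the port at the TCP level. The sketch below uses only the Python standard library; `tcp_port_open` is a hypothetical helper name, and the 5-second timeout is an arbitrary choice.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address taken from the SSH metadata recorded for attempt 2.
    print(tcp_port_open("107.150.186.62", 13340))
```

Because the probe never reaches the SSH handshake, a `False` result points at the host or its port mapping rather than at credentials.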
| 68 | + |
| 69 | +## Interpretation |
| 70 | + |
| 71 | +This is another provider readiness failure. The second attempt is especially useful because it shows that even after RunPod exposed SSH metadata, the host still was not accepting TCP connections. |
| 72 | + |
| 73 | +That means: |
| 74 | + |
| 75 | +- the issue is not the GGUF format |
| 76 | +- the issue is not the multi-GPU split configuration |
| 77 | +- the issue is not SSH key auth, because `Connection refused` is a TCP-level rejection that happens before any authentication |
| 78 | +- the issue is the allocated RunPod host failing to transition into a live runtime |
| 79 | + |
| 80 | +## Cleanup |
| 81 | + |
| 82 | +Both pods were deleted after the failed readiness windows. |