Commit eda72d3

Log cheapest RunPod search
1 parent ca400e7 commit eda72d3

2 files changed

Lines changed: 128 additions & 0 deletions

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
# 2026-04-25 RunPod Cheapest Search for Kimi K2.6 GGUF Q2

## Goal

Find the cheapest currently available RunPod topology that can plausibly run `unsloth/Kimi-K2.6-GGUF:UD-Q2_K_XL`, then start it and benchmark if the runtime becomes reachable.

## Selection Rule

Use even-GPU shapes only, and prefer the lowest-cost topology that clears the rough aggregate-VRAM floor for the `340 GB` GGUF artifact.
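
A minimal sketch of the selection rule, assuming nominal per-GPU memory figures taken from vendor specs (these capacities are not stated in this log, and the helper names are hypothetical). It only reports headroom against the raw artifact size and ignores KV cache, runtime overhead, and any CPU offload.

```python
# Nominal per-GPU memory in GB. ASSUMPTION: vendor spec figures, not from the log.
NOMINAL_VRAM_GB = {
    "A40": 48,
    "RTX A6000": 48,
    "MI300X": 192,
    "H100 80GB": 80,
    "H100 NVL": 94,
    "H200": 141,
    "RTX PRO 6000": 96,
}

ARTIFACT_GB = 340  # GGUF artifact size quoted in the log


def aggregate_vram_gb(gpu: str, count: int) -> int:
    """Total nominal VRAM for a shape; rejects odd GPU counts per the selection rule."""
    if count <= 0 or count % 2 != 0:
        raise ValueError("selection rule allows even GPU counts only")
    return NOMINAL_VRAM_GB[gpu] * count


def headroom_gb(gpu: str, count: int, artifact_gb: int = ARTIFACT_GB) -> int:
    """Nominal VRAM left after loading the artifact (negative means it does not fit)."""
    return aggregate_vram_gb(gpu, count) - artifact_gb


if __name__ == "__main__":
    candidates = [
        ("A40", 8),
        ("RTX A6000", 8),
        ("MI300X", 2),
        ("H100 80GB", 4),
        ("H100 NVL", 4),
        ("H200", 4),
        ("RTX PRO 6000", 4),
    ]
    for gpu, count in candidates:
        print(f"{count}x {gpu}: headroom {headroom_gb(gpu, count)} GB")
```

With these nominal figures, `4x H100 80GB` comes out 20 GB under the artifact size; the log does not say how its rough floor treated that case.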

## Live Cheap-First Search

### 1. 8x A40

- Status in inventory: available, low stock
- Result: `HTTP 500 There are no instances currently available`
- Outcome: could not allocate

### 2. 8x RTX A6000

- Status in inventory: advertised in several regions, but without a strong stock signal
- Result: `HTTP 500 There are no instances currently available`
- Outcome: could not allocate

### 3. 2x MI300X

| Field | Value |
| --- | --- |
| Pod ID | `k9p5qwst0txevv` |
| Cost | `$3.98/hr` |
| Machine ID | `j03rnq2tcsxu` |
| Result | allocated |

Observed behavior:

- `desiredStatus: RUNNING`
- `uptimeSeconds: 0`
- no public routing
- SSH never reached `ready`

Outcome: cheapest allocatable option, but dead host.
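
The "dead host" call above can be expressed as a small classifier over the two pod fields the log records (`desiredStatus`, `uptimeSeconds`) plus an SSH readiness flag. This is a sketch only: the function name and the returned labels are hypothetical, and the pod dict is assumed to be whatever the pod-status query returned.

```python
def classify_pod(pod: dict, ssh_ready: bool) -> str:
    """Label a pod from its reported status fields and an SSH readiness flag.

    Labels are illustrative, not RunPod terminology.
    """
    desired = pod.get("desiredStatus")
    uptime = pod.get("uptimeSeconds") or 0

    if desired != "RUNNING":
        return "not-running-requested"
    if uptime > 0 and ssh_ready:
        return "ready"
    if uptime > 0:
        return "booted-but-unreachable"
    # Requested RUNNING, but the container never accumulated uptime.
    return "dead-host"


if __name__ == "__main__":
    observed = {"desiredStatus": "RUNNING", "uptimeSeconds": 0}
    print(classify_pod(observed, ssh_ready=False))
```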

### 4. 4x H100 80GB

- Status in inventory: high-level stock existed, but not in the REST-allowed regions that were tested
- Result: `HTTP 500 There are no instances currently available`
- Outcome: could not allocate

### 5. 4x H100 NVL

| Field | Value |
| --- | --- |
| Pod ID | `j6q4iu80tj922e` |
| Cost | `$10.36/hr` |
| Machine ID | `o7h99o28jtin` |
| Result | allocated |

Observed behavior:

- new machine ID, unlike the recycled RTX PRO 6000 community host
- SSH metadata appeared at `38.143.35.131:12908`
- direct SSH still returned `Connection refused`
- `uptimeSeconds` remained `0`

Outcome: allocates, but still dead before runtime.
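
One way to separate "port published but nothing listening" from "no route at all" is a plain TCP probe of the advertised SSH endpoint. The sketch below uses only the Python standard library; `probe_tcp` is a hypothetical helper, not part of any RunPod tooling, and the log does not say how its own SSH check was made.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Try a TCP connect and report how it ended.

    Returns "open", "refused", "timeout", or "error".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on this port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout, e.g. filtered or unrouted.
        return "timeout"
    except OSError:
        # Any other socket-level failure (unreachable network, DNS, ...).
        return "error"
```

The `Connection refused` results recorded above correspond to the `"refused"` branch: the address was routable, but no SSH daemon was accepting connections on the advertised port.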

### 6. 4x H200

- Status in inventory: low stock in some regions
- Result: `HTTP 500 There are no instances currently available`
- Outcome: could not allocate

### 7. 4x RTX PRO 6000 Secure

| Field | Value |
| --- | --- |
| Pod ID | `c3j21r1pd9wpa2` |
| Cost | `$7.56/hr` |
| Machine ID | `67fbuhb2qnz1` |
| Datacenter | `EUR-IS-1` |
| Result | allocated |

Observed behavior:

- first Secure RTX PRO 6000 host tested, so this was a different pool than the recycled dead community host
- SSH metadata appeared at `157.157.221.30:52123`
- direct SSH still returned `Connection refused`
- `uptimeSeconds` remained `0`

Outcome: cheapest allocatable NVIDIA path found today, but still dead before runtime.

## Practical Conclusion

As of 2026-04-25, the cheapest allocatable RunPod topology found for this model was:

- `2x MI300X` at `$3.98/hr`

The cheapest allocatable NVIDIA topology found was:

- `4x RTX PRO 6000 Secure` at `$7.56/hr`

None of the three allocated pods (`2x MI300X`, `4x H100 NVL`, `4x RTX PRO 6000 Secure`) became reachable enough to run inference, so no successful benchmark could be produced from the cheap-first search.
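
The conclusion can be re-derived from the recorded attempts. The sketch below encodes the seven results from this log as data; the `Attempt` type and `cheapest` helper are hypothetical names introduced for illustration, and the vendor labels are added here rather than taken from the log.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Attempt:
    shape: str
    vendor: str
    cost_per_hr: Optional[float]  # None when allocation failed
    reachable: bool               # True only if the runtime came up


# Results as recorded in this log.
ATTEMPTS = [
    Attempt("8x A40", "NVIDIA", None, False),
    Attempt("8x RTX A6000", "NVIDIA", None, False),
    Attempt("2x MI300X", "AMD", 3.98, False),
    Attempt("4x H100 80GB", "NVIDIA", None, False),
    Attempt("4x H100 NVL", "NVIDIA", 10.36, False),
    Attempt("4x H200", "NVIDIA", None, False),
    Attempt("4x RTX PRO 6000 Secure", "NVIDIA", 7.56, False),
]


def cheapest(attempts: List[Attempt], vendor: Optional[str] = None) -> Optional[Attempt]:
    """Cheapest attempt that allocated, optionally restricted to one vendor."""
    pool = [
        a for a in attempts
        if a.cost_per_hr is not None and (vendor is None or a.vendor == vendor)
    ]
    return min(pool, key=lambda a: a.cost_per_hr, default=None)


if __name__ == "__main__":
    print(cheapest(ATTEMPTS))
    print(cheapest(ATTEMPTS, vendor="NVIDIA"))
    print(any(a.reachable for a in ATTEMPTS))
```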

## Cleanup

Every failed pod from this search was deleted after its readiness window.
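
A readiness window followed by cleanup could look roughly like the loop below. This is a sketch under assumptions: the window and poll interval are placeholder values (the log does not state the ones used), and `is_ready` and `delete_pod` are caller-supplied callables standing in for whatever readiness check and deletion call the workflow actually uses.

```python
import time
from typing import Callable


def run_readiness_window(
    is_ready: Callable[[], bool],
    delete_pod: Callable[[], None],
    window_s: float = 600.0,    # placeholder, not the value used in this search
    interval_s: float = 15.0,   # placeholder poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until ready or until the window expires; delete the pod on expiry.

    Returns True if the pod became ready, False if it was deleted.
    """
    deadline = clock() + window_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            break
        sleep(interval_s)
    # Window expired without readiness: stop paying for the pod.
    delete_pod()
    return False
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting.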

experiments/002-kimi-k26-gguf-q2/README.md

Lines changed: 22 additions & 0 deletions
@@ -92,6 +92,28 @@ Outcome:

These attempts confirm that the current RunPod `4x RTX PRO 6000` allocator is also returning a non-ready host for this workflow.

## 2026-04-25 Cheapest-Available Search

An explicit cheapest-first search was run on 2026-04-25 against the live RunPod inventory for the `UD-Q2_K_XL` GGUF path. The target requirement was the smallest practical even-GPU topology with enough aggregate VRAM to plausibly serve the 340 GB artifact.

Observed order from the live market:

| Candidate | Result |
| --- | --- |
| `8x A40` | cheapest likely fit, but no allocatable instances |
| `8x RTX A6000` | no allocatable instances |
| `2x MI300X` | allocated at `$3.98/hr`, but host never transitioned into a live runtime |
| `4x H100 80GB` | no allocatable instances in REST-allowed H100 regions |
| `4x H100 NVL` | allocated at `$10.36/hr`, but host never transitioned into a live runtime |
| `4x H200` | no allocatable instances |
| `4x RTX PRO 6000 Secure` | allocated at `$7.56/hr`, new host, but still never transitioned into a live runtime |

Conclusion:

- The cheapest allocatable shape today was `2x MI300X` at `$3.98/hr`.
- The cheapest allocatable NVIDIA shape today was `4x RTX PRO 6000 Secure` at `$7.56/hr`.
- None of the allocatable shapes actually became reachable enough to run inference, so there is still no successful cheap RunPod baseline for this model on this date.

## Launch Runbook

Capacity probe with a disposable pod volume:
