
Commit ca400e7

Retry RTX PRO 6000 GGUF deploy

1 parent eb58a69

3 files changed

Lines changed: 47 additions & 4 deletions


experiments/002-kimi-k26-gguf-q2/2026-04-25-runpod-rtxpro6000x4-q2-attempts.md

Lines changed: 23 additions & 1 deletion
@@ -66,9 +66,31 @@ Observed behavior:
 - `uptimeSeconds` remained `0`
 - No benchmark was possible because the runtime never became reachable
 
+## Attempt 3
+
+| Field | Value |
+| --- | --- |
+| Pod name | `kimi-k26-gguf-q2-rtxpro6000x4-20260425-3` |
+| Pod ID | `2qlu7nwua4ndd8` |
+| Cost | `$6.76/hr` |
+| Requested datacenter | `CA-MTL-3` |
+| Machine ID | `9ti6j8484pn1` |
+
+Observed behavior:
+
+- Pod allocated again onto the same machine ID as attempts 1 and 2
+- RunPod later exposed SSH metadata: `107.150.186.62:13262`
+- Direct SSH attempts still returned `Connection refused`
+- `uptimeSeconds` remained `0`
+- No benchmark was possible because the runtime never became reachable
+
+## Launcher Fix
+
+During the third retry, the pod launcher was corrected to pass runtime overrides such as `HF_MODEL`, `CONTEXT_LENGTH`, `SPLIT_MODE`, and `TENSOR_SPLIT` into the pod environment. That means later GGUF retries now accurately reflect the requested serving topology instead of silently falling back to startup-script defaults.
+
 ## Interpretation
 
-This is another provider readiness failure. The second attempt is especially useful because it shows that even after RunPod exposed SSH metadata, the host still was not accepting TCP connections.
+This is another provider readiness failure. The second and third attempts are especially useful because they show that even after RunPod exposed SSH metadata, the host still was not accepting TCP connections.
 
 That means:
 
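The attempt notes separate two readiness signals: RunPod reporting a positive `uptimeSeconds`, and the exposed SSH endpoint actually accepting TCP connections. A minimal sketch of checking both, assuming bash built with `/dev/tcp` support and coreutils `timeout`. `probe_tcp` and `pod_ready` are hypothetical helpers, not part of the repository, and the caller is assumed to have already fetched `uptimeSeconds` and the SSH host/port from RunPod.

```shell
#!/usr/bin/env bash
# Sketch of the two readiness signals recorded in the attempt logs.
# Assumptions (not from the repo): bash with /dev/tcp redirections enabled,
# coreutils `timeout`, and a caller that already has the pod's
# uptimeSeconds value plus the SSH host/port that RunPod reported.

# Hypothetical helper: succeeds if HOST accepts a TCP connection on PORT
# within SECS seconds (default 5). A child bash opens the socket on fd 3
# and closes it on exit; connection errors are silenced.
probe_tcp() {
  local host="$1" port="$2" secs="${3:-5}"
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Hypothetical helper: succeeds only if the runtime reports a positive
# uptime AND the SSH port accepts TCP. In every attempt uptimeSeconds
# stayed 0; in attempts 2 and 3 the exposed SSH port also refused.
pod_ready() {
  local uptime="$1" host="$2" port="$3"
  # Treat missing or non-numeric values (e.g. "null") as 0.
  [[ "$uptime" =~ ^[0-9]+$ ]] || uptime=0
  # Force base 10 so values with leading zeros are not parsed as octal.
  (( 10#$uptime > 0 )) || return 1
  probe_tcp "$host" "$port"
}

# Hypothetical usage with the values recorded for attempt 3:
#   pod_ready 0 107.150.186.62 13262 || echo "pod not ready"
```

A refused connection fails immediately, while a filtered port only fails when `timeout` expires, so the timeout bounds how long each probe can block.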

experiments/002-kimi-k26-gguf-q2/README.md

Lines changed: 5 additions & 3 deletions
@@ -62,7 +62,7 @@ This failure mode points to RunPod host readiness, not model fit. The deployment
 
 ## 2026-04-25 4x RTX PRO 6000 Attempts
 
-Two `4x RTX PRO 6000 Blackwell Server Edition` attempts were made on 2026-04-25 for the same `UD-Q2_K_XL` GGUF using llama.cpp CUDA with explicit multi-GPU sharding enabled.
+Three `4x RTX PRO 6000 Blackwell Server Edition` attempts were made on 2026-04-25 for the same `UD-Q2_K_XL` GGUF using llama.cpp CUDA with explicit multi-GPU sharding enabled.
 
 Configuration:
 
@@ -83,10 +83,12 @@ Outcome:
 
 - First pod: `eleak5xoojla2a`
 - Second pod: `qm93vevzo0cz1j`
-- Both landed on the same machine family: `9ti6j8484pn1`
+- Third pod: `2qlu7nwua4ndd8`
+- All three landed on the same machine family: `9ti6j8484pn1`
 - First attempt never exposed reachable SSH
 - Second attempt later exposed SSH metadata (`107.150.186.62:13340`) but direct SSH still returned `Connection refused`
-- In both cases `uptimeSeconds` remained `0`, so the runtime never transitioned into a usable state
+- Third attempt later exposed SSH metadata (`107.150.186.62:13262`) but direct SSH still returned `Connection refused`
+- In all three cases `uptimeSeconds` remained `0`, so the runtime never transitioned into a usable state
 
 These attempts confirm that the current RunPod `4x RTX PRO 6000` allocator is also returning a non-ready host for this workflow.
 
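The README describes llama.cpp CUDA with explicit multi-GPU sharding, and the launcher now forwards `SPLIT_MODE`, `TENSOR_SPLIT`, `CONTEXT_LENGTH`, `GPU_LAYERS`, and `PARALLEL`. The startup script that consumes those variables is not part of this commit, so the mapping below is a hypothetical sketch. It translates the variables into real `llama-server` flags (`--split-mode`, `--tensor-split`, `--ctx-size`, `--n-gpu-layers`, `--parallel`), and every default value in it is an illustrative assumption, not the repository's setting.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from the forwarded overrides to llama-server flags.
# The repo's actual startup script is not shown in this commit; all
# defaults below are illustrative assumptions.
build_llama_args() {
  LLAMA_ARGS=(
    --host "${HOST:-0.0.0.0}"
    --port "${PORT:-8080}"
    --ctx-size "${CONTEXT_LENGTH:-8192}"
    --n-gpu-layers "${GPU_LAYERS:-999}"
    --parallel "${PARALLEL:-1}"
    # llama.cpp accepts none, layer, or row here.
    --split-mode "${SPLIT_MODE:-layer}"
  )
  # Only pass an explicit per-GPU ratio when one was requested.
  if [[ -n "${TENSOR_SPLIT:-}" ]]; then
    LLAMA_ARGS+=(--tensor-split "$TENSOR_SPLIT")
  fi
}

# Hypothetical usage for an even four-GPU split (model path is a placeholder):
#   SPLIT_MODE=layer TENSOR_SPLIT=1,1,1,1 build_llama_args
#   llama-server --model /path/to/model.gguf "${LLAMA_ARGS[@]}"
```

`--split-mode layer` distributes whole layers across the GPUs, and `--tensor-split` sets the proportion of the model placed on each device, so `1,1,1,1` requests an even split across four cards.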

scripts/runpod/create_kimi_k26_gguf_pod_rest.sh

Lines changed: 19 additions & 0 deletions
@@ -72,6 +72,25 @@ if [[ -n "${HF_TOKEN:-}" ]]; then
   pod_env="$(jq --arg token "$HF_TOKEN" '. + {HF_TOKEN: $token}' <<<"$pod_env")"
 fi
 
+for key in \
+  HF_MODEL \
+  HOST \
+  PORT \
+  CONTEXT_LENGTH \
+  GPU_LAYERS \
+  PARALLEL \
+  SPLIT_MODE \
+  TENSOR_SPLIT \
+  LLAMA_CPP_DIR \
+  CUDA_ARCHITECTURES \
+  BUILD_THREADS \
+  ROCR_VISIBLE_DEVICES
+do
+  if [[ -n "${!key:-}" ]]; then
+    pod_env="$(jq --arg key "$key" --arg value "${!key}" '. + {($key): $value}' <<<"$pod_env")"
+  fi
+done
+
 payload="$(
   jq -n \
     --arg name "$NAME" \
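
Because the new loop forwards a variable only when it is set and non-empty, a dry run of the same `${!key:-}` indirect-expansion check shows which overrides the current shell would pass into `pod_env`. A minimal sketch: `list_overrides` and `LAUNCHER_OVERRIDE_KEYS` are hypothetical names, the key list is copied from the launcher loop, and the helper prints `KEY=VALUE` lines instead of merging them with `jq`.

```shell
#!/usr/bin/env bash
# Keys copied from the launcher's override loop (HF_TOKEN is handled
# separately by the launcher and is deliberately not listed here).
LAUNCHER_OVERRIDE_KEYS=(
  HF_MODEL HOST PORT CONTEXT_LENGTH GPU_LAYERS PARALLEL
  SPLIT_MODE TENSOR_SPLIT LLAMA_CPP_DIR CUDA_ARCHITECTURES
  BUILD_THREADS ROCR_VISIBLE_DEVICES
)

# Hypothetical dry-run helper: prints KEY=VALUE for every named variable
# that is set and non-empty, using the same indirect expansion as the loop.
list_overrides() {
  local key
  for key in "$@"; do
    if [[ -n "${!key:-}" ]]; then
      printf '%s=%s\n' "$key" "${!key}"
    fi
  done
}

# Hypothetical usage before launching a pod:
#   SPLIT_MODE=layer TENSOR_SPLIT=1,1,1,1 list_overrides "${LAUNCHER_OVERRIDE_KEYS[@]}"
```

Empty variables are skipped on purpose, mirroring the launcher, so an accidentally blank override falls back to the startup-script default instead of being sent as an empty string.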
