Bug Description
Calling sample_async() on a sampling client created from any LoRA checkpoint never returns; the call is cancelled by the 60s timeout. Sampling from the base model works normally (~2.4s response time).
Environment
- Tinker SDK: 0.18.2 (latest from PyPI)
- Base model: Qwen/Qwen3-4B-Instruct-2507
- LoRA rank: 16
- 5 training runs, 31 checkpoints total, all show "Never" expiry in console
Reproduction
from tinker import ServiceClient, ModelInput, SamplingParams
import asyncio

async def test():
    sc = ServiceClient()
    # Base model: works, ~2.4s
    sampler = await sc.create_sampling_client_async(base_model="Qwen/Qwen3-4B-Instruct-2507")
    # Any LoRA checkpoint: hangs permanently, 60s timeout
    sampler = await sc.create_sampling_client_async(
        model_path="tinker://0ac02541-f609-540f-860e-0bf885b7292e:train:0/sampler_weights/dpo_final"
    )
    # create_sampling_client_async() returns OK (0.5s)
    # but sample_async() never returns; it times out after 60s with CancelledError
Tested checkpoints (all hang)
- tinker://0ac02541-f609-540f-860e-0bf885b7292e:train:0/sampler_weights/dpo_final (created 23 min ago, 136 MB)
- tinker://a735527d-a478-59b1-b510-e3b0211f4989:train:0/sampler_weights/dpo_final (created 2 weeks ago, 136 MB)
- All show "Never" expiry in the Tinker console
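For anyone trying to confirm the same split between base-model and checkpoint sampling, here is a minimal sketch of a timing probe. It uses only asyncio and does not call Tinker: `probe`, `fake_base_sample`, and `fake_lora_sample` are hypothetical names, and the two fake coroutines are stand-ins that would need to be replaced with the real sampling calls.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def probe(label: str, call: Callable[[], Awaitable[object]], timeout_s: float) -> dict:
    """Time one awaitable call and record whether it finished or timed out."""
    start = time.perf_counter()
    try:
        await asyncio.wait_for(call(), timeout=timeout_s)
        status = "ok"
    except asyncio.TimeoutError:
        status = "timeout"
    elapsed = round(time.perf_counter() - start, 2)
    return {"label": label, "status": status, "seconds": elapsed}


# Hypothetical stand-ins: swap these for the real sampling calls.
async def fake_base_sample() -> str:
    await asyncio.sleep(0.01)  # simulates a request that is answered quickly
    return "tokens"


async def fake_lora_sample() -> None:
    await asyncio.Event().wait()  # the event is never set, so this never returns


async def main() -> list:
    return [
        await probe("base", fake_base_sample, timeout_s=0.2),
        await probe("lora", fake_lora_sample, timeout_s=0.2),
    ]


results = asyncio.run(main())
for row in results:
    print(row)
```

Running each checkpoint path through the same probe gives one row per path, which makes it easy to show that every LoRA path times out while the base model does not.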
Call stack at timeout
sample_async() → _sample_async_impl() → _APIFuture.result_async()
→ asyncio.wait_for(future, timeout=60) → CancelledError
No server response is returned — not a 400/404/500 error. The request appears to be queued server-side and never processed.
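The call stack above ends in a cancellation rather than an HTTP error, which is what you would expect if the awaited future is simply never resolved. A minimal sketch of that behaviour in plain asyncio follows. It is not the SDK's code: `wait_with_timeout` is a hypothetical helper, and the bare future stands in for a request the server never answers.

```python
import asyncio


async def wait_with_timeout(timeout_s: float) -> str:
    """Await a future that never resolves and report how the wait ended."""
    loop = asyncio.get_running_loop()
    never_done = loop.create_future()  # stand-in for a request that gets no reply
    try:
        await asyncio.wait_for(never_done, timeout=timeout_s)
    except asyncio.TimeoutError:
        # On timeout, wait_for cancels the inner future before raising.
        return f"timeout; inner future cancelled={never_done.cancelled()}"
    return "completed"


result = asyncio.run(wait_with_timeout(0.05))
print(result)
```

In standard asyncio the caller of wait_for sees a timeout while the inner future is the one that gets cancelled. How the SDK surfaces this to the caller is not something this sketch can confirm.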
Related Issues
- GitHub Issue #234 ("Stuck when sampling") was closed on 2026-03-27 by @YujiaBao with the comment "Closing for now — if still experiencing, please reopen." That issue described a similar permanent hang (7200s) with sample_async() in a CUA RL workflow. If a fix was shipped for it, it does not appear to have resolved this issue.
Impact
All inference via LoRA checkpoints is blocked. DPO/SFT training works fine and checkpoints save successfully, but sampling from trained checkpoints is completely unusable.