Summary
A Qwen/Qwen3.5-35B-A3B-Base LoRA checkpoint saved via save_weights_for_sampler(...) cannot be sampled through the OpenAI-compat endpoint (/oai/api/v1/completions and /oai/api/v1/chat/completions) — both return HTTP 404 {"detail": "Invalid session"}. The same checkpoint samples cleanly through the Python SDK (ServiceClient.create_sampling_client(model_path=...).sample(...)), and a Llama-3.1-70B checkpoint owned by the same org / accessed with the same API key works fine through the OAI-compat endpoint. This looks like a Tinker-side issue on the OAI-compat router, not a weights or save-path issue.
Affected checkpoint
- Path: tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280
- Base model: Qwen/Qwen3.5-35B-A3B-Base (VL MoE)
- Created: 2026-05-20
- LoRA rank: 64
- Saved via training_client.save_weights_for_sampler(name=...) from the Tinker SDK
Working control checkpoint (same org, same API key, same endpoint)
- Path: tinker://47f6bf3f-1319-56ec-91c8-4b5f4860f652:train:0/sampler_weights/step_000280
- Base model: meta-llama/Llama-3.1-70B
- Created: ~one month earlier than the Qwen checkpoint above
- Saved via the same code path (save_weights_for_sampler)
Reproduction
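The four HTTP calls in steps 1-4 were made from the same machine, within seconds of each other, using the same API key. The SDK call in step 5 was made about 30 seconds before step 2, with the same key.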
1. GET /v1/models — Qwen checkpoint is listed correctly
curl -s "$BASE/v1/models" -H "Authorization: Bearer $TINKER_API_KEY"
Response includes:
{
"id": "tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280",
"object": "model",
"created": 1779276501,
"owned_by": "tml:organization_user:0fd650fc-****-****-****-cdbde33f1d5b",
"base_model": "Qwen/Qwen3.5-35B-A3B-Base"
}
2. POST /v1/completions against the Qwen checkpoint → 404
curl -X POST "$BASE/v1/completions" \
-H "Authorization: Bearer $TINKER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280",
"prompt": "ping",
"max_tokens": 16,
"temperature": 0.7
}'
Response:
{"detail": "Invalid session"}
Status: HTTP 404.
3. POST /v1/chat/completions against the Qwen checkpoint → 404 (same response)
curl -X POST "$BASE/v1/chat/completions" \
-H "Authorization: Bearer $TINKER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280",
"messages": [{"role":"user","content":"ping"}],
"max_tokens": 16
}'
Response: {"detail": "Invalid session"}, HTTP 404.
4. POST /v1/completions against the Llama control checkpoint → 200 OK
Same key, same endpoint, same request shape, only the model field changed:
curl -X POST "$BASE/v1/completions" \
-H "Authorization: Bearer $TINKER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "tinker://47f6bf3f-1319-56ec-91c8-4b5f4860f652:train:0/sampler_weights/step_000280",
"prompt": "ping",
"max_tokens": 16,
"temperature": 0.7
}'
Response (truncated for readability):
{
"id": "c0d1eb73890acb858a9f94da00a16f4c:sample:...",
"object": "text_completion",
"model": "tinker://47f6bf3f-1319-56ec-91c8-4b5f4860f652:train:0/sampler_weights/step_000280",
"choices": [{"index": 0, "text": "ouin.circ_r - Pingouin\nChi-Square Goodness of", "finish_reason": "length", "logprobs": null}],
"usage": {"prompt_tokens": 2, "completion_tokens": 16, "total_tokens": 18}
}
Status: HTTP 200.
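For anyone who wants to rerun steps 2 and 4 as a single A/B comparison, here is a minimal Python sketch using only the standard library. It is not part of the original repro. The helper names (build_completion_request, send, compare) are hypothetical, and base is assumed to be the same value as $BASE in the curl commands above.

```python
import json
import urllib.error
import urllib.request

QWEN = "tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280"
LLAMA = "tinker://47f6bf3f-1319-56ec-91c8-4b5f4860f652:train:0/sampler_weights/step_000280"


def build_completion_request(base, api_key, model):
    """Build the same POST request as the curl calls in steps 2 and 4."""
    payload = {"model": model, "prompt": "ping", "max_tokens": 16, "temperature": 0.7}
    return urllib.request.Request(
        f"{base}/v1/completions",  # assumes base matches $BASE from the curl calls
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Return (status, body_text). urllib raises on 4xx/5xx, so catch that case."""
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


def compare(base, api_key):
    """Send the identical request to both checkpoints and print each status."""
    for label, model in (("qwen", QWEN), ("llama", LLAMA)):
        status, body = send(build_completion_request(base, api_key, model))
        print(label, status, body[:200])
```

Calling compare(os.environ["BASE"], os.environ["TINKER_API_KEY"]) (after import os) issues both requests back to back, so the only variable between the two calls is the model field.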
5. Python SDK against the same Qwen checkpoint → works
import tinker
from tinker import types
from tinker_cookbook.tokenizer_utils import get_tokenizer
from tinker_cookbook import renderers

service_client = tinker.ServiceClient()
sampling_client = service_client.create_sampling_client(
    model_path="tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280"
)
tokenizer = get_tokenizer("Qwen/Qwen3.5-35B-A3B-Base")
renderer = renderers.get_renderer("qwen3", tokenizer)
model_input = renderer.build_generation_prompt([{"role": "user", "content": "ping"}])
stop = renderer.get_stop_sequences()
result = sampling_client.sample(
    model_input,
    sampling_params=types.SamplingParams(max_tokens=16, temperature=0.7, top_p=0.9, stop=stop),
    num_samples=1,
).result()
print(tokenizer.decode(result.sequences[0].tokens, skip_special_tokens=True))
# -> "🏓 Ping!" (6 tokens, stop_reason=stop, EOT at position 5)
The SDK call was made ~30 seconds before reproduction step 2 above, so the OAI-compat 404 is not "SDK session not yet warm" — even with an active SDK session against the same checkpoint, the OAI router returns Invalid session. SDK and OAI sessions appear to be independent.
Expected vs actual
- Expected: POST /v1/completions and POST /v1/chat/completions against a checkpoint that appears in /v1/models should return a normal OpenAI-compat completion, matching the behavior we see for the Llama-3.1-70B checkpoint.
- Actual: both endpoints consistently return HTTP 404 {"detail": "Invalid session"} for the Qwen3.5-35B-A3B-Base (MoE) checkpoint.
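The expectation above amounts to an invariant: every checkpoint listed by /v1/models should also be samplable. A rough sketch of that check follows, with a stand-in probe that simply replays the two status codes observed in this report. find_unsamplable and OBSERVED are hypothetical names, and a real probe would POST a short completion request and return its HTTP status.

```python
from typing import Callable, Dict, Iterable

QWEN = "tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280"
LLAMA = "tinker://47f6bf3f-1319-56ec-91c8-4b5f4860f652:train:0/sampler_weights/step_000280"


def find_unsamplable(model_ids: Iterable[str], probe: Callable[[str], int]) -> Dict[str, int]:
    """Return {model_id: status} for every listed model whose probe is not HTTP 200."""
    failures = {}
    for model_id in model_ids:
        status = probe(model_id)
        if status != 200:
            failures[model_id] = status
    return failures


# Stand-in data: the statuses observed in steps 2 and 4 of this report.
OBSERVED = {QWEN: 404, LLAMA: 200}

# The probe here is a dictionary lookup, not a network call.
failures = find_unsamplable(OBSERVED, OBSERVED.__getitem__)
for model_id, status in failures.items():
    print(status, model_id)
```

With the observed statuses, only the Qwen checkpoint is reported, which is the listed-but-unsamplable case this issue describes.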
Environment
- Tinker SDK: latest installed (Python 3.14 venv, with the pydantic.v1 compatibility warning)
- Endpoint base: https://tinker.thinkingmachines.dev/services/tinker-prod/oai/api/v1
- Date of repro: 2026-05-21