OAI-compat endpoint returns 404 "Invalid session" for Qwen3.5-35B-A3B (MoE) checkpoints

## Summary

A `Qwen/Qwen3.5-35B-A3B-Base` LoRA checkpoint saved via `save_weights_for_sampler(...)` cannot be sampled through the OpenAI-compat endpoint: both `/oai/api/v1/completions` and `/oai/api/v1/chat/completions` return `HTTP 404 {"detail": "Invalid session"}`. The same checkpoint samples cleanly through the Python SDK (`ServiceClient.create_sampling_client(model_path=...).sample(...)`), and a Llama-3.1-70B checkpoint owned by the same org and accessed with the same API key works through the OAI-compat endpoint. This points to a Tinker-side issue in the OAI-compat router rather than a problem with the weights or the save path.

## Affected checkpoint

- Path: `tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280`
- Base model: `Qwen/Qwen3.5-35B-A3B-Base` (VL MoE)
- Created: 2026-05-20
- LoRA rank: 64
- Saved via `training_client.save_weights_for_sampler(name=...)` from the Tinker SDK

## Working control checkpoint (same org, same API key, same endpoint)

- Path: `tinker://47f6bf3f-1319-56ec-91c8-4b5f4860f652:train:0/sampler_weights/step_000280`
- Base model: `meta-llama/Llama-3.1-70B`
- Created: ~one month earlier than the Qwen checkpoint above
- Saved via the same code path (`save_weights_for_sampler`)
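
To rule out a malformed path, the two checkpoint paths can be compared structurally. The sketch below only splits on the `tinker://` prefix and the first `/`; it assigns no meaning to the `:`-separated fields, and the helper name `split_checkpoint_path` is made up for this issue, not part of the Tinker SDK.

```python
def split_checkpoint_path(path: str) -> tuple[str, str]:
    """Split a tinker:// path into (authority, remainder).

    The authority is whatever sits between 'tinker://' and the first '/'.
    No meaning is assigned to its ':'-separated fields.
    """
    prefix = "tinker://"
    if not path.startswith(prefix):
        raise ValueError(f"not a tinker:// path: {path!r}")
    authority, _, remainder = path[len(prefix):].partition("/")
    return authority, remainder


QWEN = "tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280"
LLAMA = "tinker://47f6bf3f-1319-56ec-91c8-4b5f4860f652:train:0/sampler_weights/step_000280"

qwen_auth, qwen_rest = split_checkpoint_path(QWEN)
llama_auth, llama_rest = split_checkpoint_path(LLAMA)

# Drop the leading UUID-like field and compare everything that follows it.
same_suffix = qwen_auth.split(":", 1)[1] == llama_auth.split(":", 1)[1]
same_remainder = qwen_rest == llama_rest
print(same_suffix, same_remainder)  # -> True True
```

Apart from the leading UUID-like field, the two paths are identical, so the path format itself is unlikely to be what the router rejects.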

## Reproduction

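The four HTTP calls (steps 1 to 4) were made from the same machine, within seconds of each other, using the same API key. `$BASE` here is the Environment endpoint base minus its trailing `/v1`, so `$BASE/v1/completions` is the `/oai/api/v1/completions` route named in the Summary.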

### 1. `GET /v1/models` — Qwen checkpoint is listed correctly

```bash
curl -s "$BASE/v1/models" -H "Authorization: Bearer $TINKER_API_KEY"
```

Response includes:

```json
{
  "id": "tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280",
  "object": "model",
  "created": 1779276501,
  "owned_by": "tml:organization_user:0fd650fc-****-****-****-cdbde33f1d5b",
  "base_model": "Qwen/Qwen3.5-35B-A3B-Base"
}
```
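
For scripted checks, the listing can be searched for the checkpoint id. This is a minimal sketch that assumes the response uses the OpenAI-style envelope `{"object": "list", "data": [...]}`; only the single entry above is confirmed by this report, and `find_model` is a helper name chosen for illustration.

```python
from typing import Optional

QWEN_ID = "tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280"


def find_model(models_response: dict, model_id: str) -> Optional[dict]:
    """Return the entry whose 'id' equals model_id, or None if it is absent."""
    # Assumption: entries live under a top-level "data" list, as in the OpenAI API.
    for entry in models_response.get("data", []):
        if entry.get("id") == model_id:
            return entry
    return None


# Sample built from the entry quoted above; the envelope around it is assumed.
sample = {
    "object": "list",
    "data": [
        {
            "id": QWEN_ID,
            "object": "model",
            "created": 1779276501,
            "base_model": "Qwen/Qwen3.5-35B-A3B-Base",
        }
    ],
}

entry = find_model(sample, QWEN_ID)
print(entry["base_model"] if entry else "checkpoint not listed")
```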

### 2. `POST /v1/completions` against the Qwen checkpoint → 404

```bash
curl -X POST "$BASE/v1/completions" \
  -H "Authorization: Bearer $TINKER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280",
    "prompt": "ping",
    "max_tokens": 16,
    "temperature": 0.7
  }'
```

Response:

```json
{"detail": "Invalid session"}
```

Status: `HTTP 404`.

### 3. `POST /v1/chat/completions` against the Qwen checkpoint → 404 (same response)

```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $TINKER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280",
    "messages": [{"role":"user","content":"ping"}],
    "max_tokens": 16
  }'
```

Response: `{"detail": "Invalid session"}`, `HTTP 404`.
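
The two failing requests can also be constructed in Python, which makes it easy to confirm that the route and body match the curl commands above. This sketch uses only the standard library and builds the requests without sending them. `build_request` is a hypothetical helper, and the base URL is the one listed under Environment (already ending in `/v1`).

```python
import json
import os
import urllib.request

# Endpoint base as listed under Environment.
BASE = "https://tinker.thinkingmachines.dev/services/tinker-prod/oai/api/v1"
QWEN = "tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280"


def build_request(base: str, api_key: str, route: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST with a bearer token."""
    return urllib.request.Request(
        url=f"{base.rstrip('/')}{route}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Falls back to a dummy value so the requests can be built without a real key.
api_key = os.environ.get("TINKER_API_KEY", "test-key")

completion_req = build_request(
    BASE, api_key, "/completions",
    {"model": QWEN, "prompt": "ping", "max_tokens": 16, "temperature": 0.7},
)
chat_req = build_request(
    BASE, api_key, "/chat/completions",
    {"model": QWEN, "messages": [{"role": "user", "content": "ping"}], "max_tokens": 16},
)
print(completion_req.full_url)
print(chat_req.full_url)
```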

### 4. `POST /v1/completions` against the Llama control checkpoint → 200 OK

Same key, same endpoint, same request shape, only the `model` field changed:

```bash
curl -X POST "$BASE/v1/completions" \
  -H "Authorization: Bearer $TINKER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinker://47f6bf3f-1319-56ec-91c8-4b5f4860f652:train:0/sampler_weights/step_000280",
    "prompt": "ping",
    "max_tokens": 16,
    "temperature": 0.7
  }'
```

Response (truncated for readability):

```json
{
  "id": "c0d1eb73890acb858a9f94da00a16f4c:sample:...",
  "object": "text_completion",
  "model": "tinker://47f6bf3f-1319-56ec-91c8-4b5f4860f652:train:0/sampler_weights/step_000280",
  "choices": [{"index": 0, "text": "ouin.circ_r - Pingouin\nChi-Square Goodness of", "finish_reason": "length", "logprobs": null}],
  "usage": {"prompt_tokens": 2, "completion_tokens": 16, "total_tokens": 18}
}
```

Status: `HTTP 200`.
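
To run the Qwen and Llama requests side by side from one script, it helps to treat an HTTP error as a result rather than an exception. The sketch below is one way to do that with `urllib`. The `probe` name and its `opener` parameter are illustrative choices, the latter added so the function can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable


def probe(
    req: urllib.request.Request,
    opener: Callable = urllib.request.urlopen,
) -> tuple[int, str]:
    """Send req and return (status, body), including for 4xx/5xx responses."""
    try:
        with opener(req) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx; the error object still carries the body.
        return err.code, err.read().decode("utf-8")
```

Calling `probe` once per checkpoint, with request bodies that differ only in `model`, should reproduce the `404` versus `200` split shown in steps 2 and 4.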

### 5. Python SDK against the same Qwen checkpoint → works

```python
import tinker
from tinker import types
from tinker_cookbook.tokenizer_utils import get_tokenizer
from tinker_cookbook import renderers

service_client = tinker.ServiceClient()
sampling_client = service_client.create_sampling_client(
    model_path="tinker://60e46cd5-35d0-586f-8e68-53fafb9163d8:train:0/sampler_weights/step_000280"
)

tokenizer = get_tokenizer("Qwen/Qwen3.5-35B-A3B-Base")
renderer = renderers.get_renderer("qwen3", tokenizer)
model_input = renderer.build_generation_prompt([{"role": "user", "content": "ping"}])
stop = renderer.get_stop_sequences()

result = sampling_client.sample(
    model_input,
    sampling_params=types.SamplingParams(max_tokens=16, temperature=0.7, top_p=0.9, stop=stop),
    num_samples=1,
).result()

print(tokenizer.decode(result.sequences[0].tokens, skip_special_tokens=True))
# -> "🏓 Ping!"  (6 tokens, stop_reason=stop, EOT at position 5)
```

The SDK call was made ~30 seconds before reproduction step 2, so the OAI-compat 404 cannot be explained by the checkpoint not yet being loaded or "warm": even with an active SDK session against the same checkpoint, the OAI router returns `Invalid session`. SDK and OAI sessions appear to be independent.
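
A warm-up explanation can be tested further by repeating the OAI request over a longer window and recording each status code. The helper below is a sketch, not something that was run for this report: `poll_status` is a hypothetical name, `send` is any zero-argument callable that performs one request and returns its HTTP status, and the attempt count and delay are arbitrary defaults.

```python
import time
from typing import Callable


def poll_status(
    send: Callable[[], int],
    attempts: int = 5,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[int]:
    """Call send() `attempts` times, pausing delay_s between calls."""
    statuses = []
    for i in range(attempts):
        statuses.append(send())
        if i < attempts - 1:  # no pause needed after the final attempt
            sleep(delay_s)
    return statuses
```

If every entry in the returned list is `404`, a transient cold start becomes a less plausible cause than a routing or session-lookup problem.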

## Expected vs actual

- **Expected:** `POST /v1/completions` and `POST /v1/chat/completions` against a checkpoint that appears in `/v1/models` return a normal OpenAI-compat completion, matching the behavior of the Llama-3.1-70B checkpoint.
- **Actual:** Returns `HTTP 404 {"detail": "Invalid session"}` consistently for the Qwen3.5-35B-A3B-Base (MoE) checkpoint.
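
The expected and actual outcomes can be captured as a small check, which may be useful as a regression test once the router is fixed. This is a sketch: the labels `ok`, `invalid_session`, and `other` are names invented here, and only the `404` body quoted in this report is matched.

```python
import json


def classify(status: int, body: str) -> str:
    """Map an OAI-compat response onto the outcomes described in this issue."""
    if status == 200:
        return "ok"
    try:
        detail = json.loads(body).get("detail")
    except (ValueError, AttributeError):
        # Body was not JSON, or was JSON but not an object.
        detail = None
    if status == 404 and detail == "Invalid session":
        return "invalid_session"
    return "other"


print(classify(404, '{"detail": "Invalid session"}'))  # -> invalid_session
print(classify(200, '{"object": "text_completion"}'))  # -> ok
```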

## Environment

- Tinker SDK: latest installed (Python 3.14 venv, with the `pydantic.v1` compatibility warning)
- Endpoint base: `https://tinker.thinkingmachines.dev/services/tinker-prod/oai/api/v1`
- Date of repro: 2026-05-21
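
Since "latest installed" does not pin a version, exact versions can be collected with the standard library. The distribution names below (`tinker`, `tinker_cookbook`, `pydantic`) are assumed to match the import names used in step 5; any that cannot be resolved are reported as not installed rather than raising.

```python
import platform
from importlib import metadata


def describe_environment(packages=("tinker", "tinker_cookbook", "pydantic")) -> dict:
    """Collect the Python version, platform string, and installed package versions."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```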

