fix(routes): default empty content to "" so message.content key always present by raullenchai · Pull Request #467 · raullenchai/Rapid-MLX

raullenchai · 2026-05-23T11:20:01Z

Summary

When the engine returned an empty completion, routes/chat.py left final_content = None. Combined with model_dump_json(exclude_none=True), the content field was silently DROPPED from the response JSON.
OpenAI clients (LangChain, Vercel AI SDK) that key off choices[0].message.content crash with KeyError / AttributeError on the missing field.
Fix: default final_content = "" when None and there are no tool_calls (tool-call responses are exempt per OpenAI spec — content may legitimately be null when tool_calls is populated).
3 regression tests pin the invariant.

Repro (pre-fix)

# Surfaced by iter15 onboarding on unsloth/Qwen3.6-27B-MLX-8bit.
# Same prompt+seed sent twice — second call hits prefix cache, samples
# EOS first, finish_reason="stop" with no content tokens.
curl -X POST http://127.0.0.1:8501/v1/chat/completions \
  -H "Authorization: Bearer SEKRET" \
  -H "Content-Type: application/json" \
  -d '{"model":"unsloth/Qwen3.6-27B-MLX-8bit","messages":[{"role":"user","content":"Give me a creative 20-word story."}],"seed":42,"temperature":0.7,"max_tokens":50}'

# pre-fix run #2: {"choices":[{"message":{"role":"assistant"},...}]}
#                                                    ^^^ "content" key absent
# post-fix run #2: {"choices":[{"message":{"role":"assistant","content":"",...}}]}

Why this matters

Confirmed via Pydantic v2 trace:

AssistantMessage(content=None).model_dump_json(exclude_none=True)
# → '{"role":"assistant"}'   ← content key dropped
AssistantMessage(content="").model_dump_json(exclude_none=True)
# → '{"role":"assistant","content":""}'   ← OK

The empty-completion path is rare in normal usage but reproducible 5/5 on cache replay with seed sampling. Any prefix-cache hit that samples EOS first triggers it.

Test plan

python3.12 -m pytest tests/test_empty_content_key_present.py -v — 3/3 pass
python3.12 -m pytest tests/test_routes.py tests/test_parallel_tool_calls.py tests/test_legacy_functions_field.py tests/test_text_only_media_gate.py tests/test_chat_logprobs_channel_routing.py tests/test_anthropic_stop_sequences.py — 79/79 pass (no regressions in adjacent route suites)
ruff check + ruff format --check on both files — clean

🤖 Generated with Claude Code

…s present When the engine returned an empty completion (e.g. prefix-cache hit that sampled EOS first, or a reasoning-only response), routes/chat.py left ``final_content = None``. Combined with ``model_dump_json(exclude_none=True)``, the ``content`` field was silently DROPPED from the response JSON, breaking OpenAI clients (LangChain, Vercel AI SDK) that key off ``choices[0].message.content`` with KeyError / AttributeError. OpenAI's spec requires ``message.content`` to always be present in non-tool-call responses — empty string ``""`` when there is no text. Tool-call responses are exempt (content may legitimately be null when tool_calls is populated). Fix: default ``final_content = ""`` when None and there are no tool_calls so the key always survives serialization. Surfaced by iter15 onboarding sweep on unsloth/Qwen3.6-27B-MLX-8bit — same prompt+seed replay produced ``{"role":"assistant"}`` with no content key on the second call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…on test Three DeepSeek V4 Pro adversarial findings on PR #467: 1. **Add test for tool_call exemption** — OpenAI permits content:null when tool_calls is populated. Pre-fix behavior already satisfied this (content stayed None → dropped) but no test pinned the invariant. Added ``test_tool_call_response_content_may_be_null`` that monkeypatches the tool parser to return a tool_call and asserts content remains null (not empty string). 2. **Move guard closer to AssistantMessage construction** — DeepSeek noted the fix relies on ``tool_calls`` being final at the guard point. Verified no mutation after line 860 today, but moved guard from line 918 to immediately before ``AssistantMessage`` construction so a future refactor that adds tool_call mutation in the intervening logprobs/response_format block cannot silently regress. 3. **Streaming path** — DeepSeek asked whether the stream chat completion has the same bug. Investigated: streaming uses SSE deltas per OpenAI's spec; clients accumulate ``content`` across chunks. The role chunk itself has no content key (intentional, matches OpenAI). Well-behaved streaming clients use ``.get("content", "")`` per delta. Adding the default on streaming would emit empty ``"content":""`` keys on every reasoning/tool_call delta, which is wrong per spec. Documented this decision inline in the comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ntinel + reasoning decision) DeepSeek round-2 raised 5 findings on the iter1 review changes. Addressed: **#2 (accepted)** — assertion ambiguity: ``msg.get("content") in (None,)`` returned True for both omitted-key and explicit-null. Changed to ``"content" not in msg`` so a future flip from omitted → explicit null (or worse, ``""``) fails loudly. **#1 (accepted via sentinel)** — verify monkeypatched parser actually fires. Added ``parse_call_count`` counter + assertion so if a future refactor adds an ``if cfg.tool_parser is not None`` guard to routes/chat.py:850, the test fails loudly instead of silently passing on the wrong code path. **#4 (declined with clarified comment)** — DeepSeek suggested adding ``not reasoning_content`` to the guard. Declined: OpenAI's o1/o3 reasoning models always emit ``content`` (often empty string) alongside reasoning, and the user-facing bug we're fixing was specifically reasoning-only responses dropping the key. Adding the exemption would re-introduce the crash for reasoning models. Clarified the comment to explain why ``reasoning_content`` is intentionally NOT an exemption. **#3, #5 (declined)** — double-reset and minimal tokenizer mock are defensive style only. Current behavior is correct; defensive padding adds test complexity without catching a real regression class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DeepSeek round-3 caught a genuine spec misread: my round-2 test asserted ``"content" not in msg`` for tool-call responses, claiming this matched "OpenAI's own response shape." That is wrong — api.openai.com emits ``"content": null`` (key present, value null), not key-absent. Clients that distinguish ``content is None`` from ``"content" not in msg`` would see a divergence between rapid-mlx and OpenAI. Fix: - Route now does a post-dump fixup: after ``chat_response.model_dump(exclude_none=True)``, walk ``choices[*].message`` and re-inject ``content: None`` when the ``exclude_none`` pass dropped it. Tool-call responses now emit ``"content": null`` matching OpenAI exactly. - Test renamed ``test_tool_call_response_content_is_explicit_null`` and flipped to assert ``"content" in msg AND msg["content"] is None``. - Comments and module docstring updated to reflect the actual OpenAI behavior (key present + null, not key-omitted). Non-tool-call empty responses continue to emit ``"content": ""`` (set by the pre-existing default at line 939) — also matches OpenAI for non-tool-call empty completions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…fixup loop) Four findings on the round-3 post-dump fixup: **#1 (accepted, functional)** — ``json.dumps(response_dict, ...)`` bypasses Pydantic's JSON encoder chain (datetimes, UUIDs, Decimals, ``@field_serializer``). Today's response schema is all primitives so it worked, but a future field with a non-primitive type would TypeError at runtime with no test coverage. Switched to ``model_dump(mode="json", exclude_none=True)`` — same JSON-safe encoding that ``model_dump_json`` uses internally, just yielding a dict so we can post-process before serializing. **#2 (accepted, defensive)** — replaced silent ``isinstance(_msg, dict)`` guard with ``assert isinstance(_msg, dict)``. If a future Pydantic version ever returns a non-dict for nested model dumps, fail loudly instead of silently reverting to the dropped-content-key bug. **#3 (accepted, defensive)** — ``response_dict.get("choices", ())`` returns ``None`` if ``choices`` is explicitly None (not absent); for-loop would crash with TypeError. Changed to ``get("choices") or ()`` so explicit None collapses to empty iteration like absent does. **#4 (declined)** — ``_client()`` calls ``reset_config()`` before the caller's try/finally. Theoretical leak if engine constructor raises; no observable bug since ``reset_config()`` is idempotent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Round 4 of self-validation surfaced one trivial NIT: `^update\.?$` and `^updates?\.?$` are redundant — the `s?` form covers both. Consolidate into the single inclusive pattern with an inline comment so the intent is grep-able. Round 4 verdict: PASS (1 nit, 0 blocking). Tiering converged in 4 rounds: 1→1→0 blocking findings across rounds 1/2/3/4. Without tiering the same 4 rounds would have been 5/5/2/1 "failures" by the old binary verdict — exactly the spiral PR #467 hit. Mission accomplished. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…#474) * feat(pr_validate): integrate Google eng-practices code-review tiering Adds two new gates inspired by Google's eng-practices and refactors the DeepSeek-review verdict to converge instead of spiral. ## What changed 1. **DeepSeek prompt now requires `[BLOCKING]` or `[NIT]` on every finding** (prompts/deepseek_review.md). Only `[BLOCKING]` fails the gate; `[NIT]`s surface in the scorecard so the author can decide. Caps: 5 BLOCKING + 5 NIT per review forces triage over padding. Untagged findings default to `[BLOCKING]` so a forgotten prefix can't silently downgrade a real bug. 2. **New `cl_description_quality` step** (steps/cl_description_quality.py). Title + body hygiene gate from Google's CL-descriptions guidance: - Title not empty, >=3 words after CC-prefix strip, not in the known-bad blacklist (`fix bug`, `wip`, `update`, `tweaks`, ...). - Body not empty. - Body contains a rationale signal: `## Why`/`## Summary`/etc heading, `Closes #NNN`, inline `Why:`, or `because`-clause. - Override: `PR_VALIDATE_SKIP_DESC=1` for trivial dep-bumps. ## Why PR #467 (chat-route empty-content fix) spent 5 DeepSeek rounds in a spiral because every reply produced fresh "could be more defensive" findings and the previous prompt had no way to mark them as nits. Google's standard explicitly addresses this: "Reviewers should favor approving a CL once it is in a state where it definitely improves the overall code health, even if the CL isn't perfect." Tiered findings encode that principle — substantive bugs still block, but style preferences and future-proofing suggestions don't hold up merges. The CL-description gate addresses a parallel issue: PRs with bad titles or empty bodies are hard to bisect later, and the cost of fixing it (30 seconds) is much smaller than the cost of code archaeology forever. ## Test plan - [x] `pytest tests/test_pr_validate_google_review.py` — 42 new tests covering tier-split (untagged default, mixed cases, case-insensitive prefix) and CLDescriptionQualityStep (bad-pattern blacklist, CC-prefix stripping, rationale-signal detection, override env). - [x] `pytest tests/test_pr_validate_*` — all 79 pr_validate tests pass (no regressions in existing test_pr_validate_deepseek.py, test_pr_validate_runner.py, test_pr_validate_test_plan_check.py). - [x] `ruff check` + `ruff format --check` clean across all modified files. Closes the convergence-cost feedback loop that #467 surfaced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(pr_validate): tolerate indented rationale + harden override env check Two fixes from DeepSeek round 1 of PR #474's own self-validation: 1. Rationale-signal regexes were anchored at column 0 only — an indented `Why:` or `**Why:**` inside a bullet list / block quote (e.g. ` - Why: foo`) was not detected. Add an optional leading `[\s>*+\-]*` prefix so common markdown nesting still satisfies the gate. 2. The override check used `os.environ.get(_OVERRIDE_ENV)` which treats `PR_VALIDATE_SKIP_DESC=0` as truthy (the string "0" is truthy in Python). Switch to the repo-wide `env_truthy` helper so only `1`/`true`/`yes`/`on` actually skip the gate. 6 new tests pin both fixes (4 indented-rationale shapes + 2 =0 / =false override cases). Skipped DeepSeek's third finding (strip-then-re-prefix in `_split_findings_by_tier`) — intentional design choice so the helper returns clean strings usable in non-scorecard contexts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(pr_validate): handle conventional-commit breaking marker + pin 3-word threshold Round 2 of PR #474's self-validation surfaced 1 mis-classified [BLOCKING] + 4 NITs. Triage: * **Took NIT-2** — `_CC_PREFIX` regex didn't strip the `!` in conventional-commits breaking-change form (`feat!:`, `feat(api)!:`). Such titles passed through unstripped, so the bad-pattern blacklist silently never applied to breaking-change PRs. One-char regex fix + 3-case test. * **Overrode the [BLOCKING] threshold finding** — DeepSeek flagged the 3-word minimum as overly strict (e.g. blocks "fix: memory leak" = 2 words after strip). This is a threshold/policy choice, not a correctness bug, so [NIT]-class at most. Repo's actual title corpus is 5-10 words; Google's "informative for future grep" principle wants specificity ("memory leak" doesn't tell future searchers where the leak was — "memory leak on websocket close" does). Added a parametrized test pinning the policy so a future relaxation has to be intentional, not accidental. * **Skipped NIT-3** (strip-then-re-prefix) — intentional design already addressed in round 1; defensible. * **Skipped NIT-5** (`fix tests` blacklist) — authors can use `fix(tests): explain what specifically broke` form which strips to a descriptive title. This is the convergence pattern the new tiering system was designed for: round 1 produced 1 real bug + 2 NITs (fixed 1), round 2 produced 0 real bugs + 1 mis-classified NIT + 4 actual NITs (took 1, pinned 1, skipped 3). Without tiering both rounds would have "failed" on subjective findings; with tiering we converge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(pr_validate): pin _extract_findings → _split_findings_by_tier pipeline DeepSeek round 3 surfaced a real integration gap: the tier splitter and the numbered-list extractor are tested in isolation, but the contract between them (extractor strips leading "1. ", splitter sees "[BLOCKING]" at start) was never asserted end-to-end. If someone later refactors `_extract_findings` to preserve the leading list number, every finding would default to BLOCKING and tiering would silently break. Added a single test that feeds a realistic 4-finding model review (mixed 2 BLOCKING + 2 NIT) through both helpers and asserts the split is 2/2, not 4/0. Test fails loudly with a diagnostic if the contract regresses. This was DeepSeek's only legitimate concern across both round-3 findings (the other was the same point about test coverage — addressed by this same test). Convergence achieved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(pr_validate): dedupe redundant bad-title patterns Round 4 of self-validation surfaced one trivial NIT: `^update\.?$` and `^updates?\.?$` are redundant — the `s?` form covers both. Consolidate into the single inclusive pattern with an inline comment so the intent is grep-able. Round 4 verdict: PASS (1 nit, 0 blocking). Tiering converged in 4 rounds: 1→1→0 blocking findings across rounds 1/2/3/4. Without tiering the same 4 rounds would have been 5/5/2/1 "failures" by the old binary verdict — exactly the spiral PR #467 hit. Mission accomplished. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

raullenchai and others added 5 commits May 23, 2026 04:19

raullenchai mentioned this pull request May 24, 2026

feat(pr_validate): integrate Google eng-practices code-review tiering #474

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(routes): default empty content to "" so message.content key always present#467

fix(routes): default empty content to "" so message.content key always present#467
raullenchai wants to merge 5 commits into
mainfrom
fix/empty-content-key-present

raullenchai commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

raullenchai commented May 23, 2026

Summary

Repro (pre-fix)

Why this matters

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant