fix(deriver): add explicit JSON schema example to minimal prompt #662
mrlufepines wants to merge 1 commit into
Some json-mode capable LLM providers (notably Alibaba Qwen via DashScope) emit observations as bare strings inside the "explicit" array instead of objects with a "content" field, when the prompt doesn't show the schema explicitly. This violates PromptRepresentation and the deriver discards the response. Add a literal schema example at the end of the minimal deriver prompt showing the object-with-content shape, plus an explicit empty-result example. This pushes the LLM toward the right output shape regardless of provider quirks. Costs ~80 tokens per call. Verified end-to-end with Qwen 3.6 Plus on Alibaba DashScope.
Walkthrough
The minimal deriver prompt template is extended with explicit JSON-instruction text that mandates structured output: an "explicit" array of objects containing "content" strings, and defines the empty-result format as {"explicit": []}.
Add schema-aware JSON output instruction to minimal deriver prompt
Target repo: plastic-labs/honcho
Suggested branch name: fix/deriver-prompt-json-schema
Problem
The minimal deriver prompt does not specify the exact JSON schema expected in the output.
When a model returns a bare list of strings when asked for JSON (e.g.
["fact one", "fact two"]), the deriver fails to parse the response into a PromptRepresentation because it expects a list of objects with a content field
([{"content": "..."}]), not bare strings.
This manifests as a validation error during the PromptRepresentation parse step, causing
the task to fail and no observations to be saved.
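The failure mode can be reproduced with a small stand-in. This sketch assumes nothing about the real PromptRepresentation beyond the shape described above (an "explicit" list of objects with a string "content" field); parse_explicit is a hypothetical helper written for illustration, and the real class may validate differently.

```python
import json


def parse_explicit(raw: str) -> list[str]:
    """Hypothetical stand-in for the PromptRepresentation parse step.

    Accepts {"explicit": [{"content": "..."}]} and rejects bare strings.
    """
    data = json.loads(raw)
    items = data.get("explicit") if isinstance(data, dict) else None
    if not isinstance(items, list):
        raise ValueError('expected an object with an "explicit" array')
    contents = []
    for index, item in enumerate(items):
        # Each entry must be an object carrying a string "content" field.
        if not isinstance(item, dict) or not isinstance(item.get("content"), str):
            raise ValueError(
                f'explicit[{index}] must be an object with a "content" string'
            )
        contents.append(item["content"])
    return contents


good = '{"explicit": [{"content": "fact one"}, {"content": "fact two"}]}'
bad = '{"explicit": ["fact one", "fact two"]}'

print(parse_explicit(good))
try:
    parse_explicit(bad)
except ValueError as exc:
    print(f"rejected: {exc}")
```

The bare-string response is well-formed JSON, so only a shape check like this catches it.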
Root cause
The prompt says "Return a JSON object" with a generic description but does not include a
concrete schema with CORRECT and WRONG examples. Models that default to the simplest
possible JSON representation of a list return bare strings. Without an explicit
counter-example, models cannot self-correct.
Fix
Updates minimal_deriver_prompt in src/deriver/prompts.py to include:
A literal schema example: {"explicit": [{"content": "one atomic fact"}, {"content": "another atomic fact"}]}
A rule that each item in "explicit" MUST be an object with a "content" string field, not a bare string.
The empty-result form {"explicit": []} when no observations are found.
The word "JSON" is now present in the prompt body, which also satisfies the Alibaba Qwen
json-keyword guardrail (complementing the clients.py patch) without requiring the
clients-side injection.
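As a rough illustration of the change, the schema instruction can be pictured as a fixed suffix appended to the prompt body. The constant name, the build_prompt helper, and the exact wording below are assumptions for this sketch; the actual text in src/deriver/prompts.py may differ.

```python
# Hypothetical suffix text; the wording in src/deriver/prompts.py may differ.
JSON_SCHEMA_INSTRUCTION = """
Return JSON in exactly this shape:
{"explicit": [{"content": "one atomic fact"}, {"content": "another atomic fact"}]}
Each item in "explicit" MUST be an object with a "content" string field, not a bare string.
If there are no observations, return {"explicit": []}.
""".strip()


def build_prompt(base_prompt: str) -> str:
    """Append the schema instruction to an existing prompt body (illustrative only)."""
    return f"{base_prompt.rstrip()}\n\n{JSON_SCHEMA_INSTRUCTION}"


prompt = build_prompt("Extract atomic facts about Alice from the messages below.")
print(prompt)
```

Placing the example at the end of the prompt keeps it close to where the model starts generating, and the literal word "JSON" appears in the body as a side effect.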
Files touched
src/deriver/prompts.py
Test plan
Call minimal_deriver_prompt(peer_id="Alice", messages="...") and confirm the output contains the literal schema example including {"content": ...}.
Confirm the response now uses {"content": "..."} objects.
Confirm PromptRepresentation.parse_raw(response) succeeds on the new output format.
Confirm estimate_minimal_deriver_prompt_tokens() still returns a plausible integer (cache invalidation check after prompt length change).
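The first check in the test plan could be automated along these lines. This is a sketch only: fake_minimal_deriver_prompt is a stub standing in for the real function (which is not imported here), and REQUIRED_SNIPPETS and missing_snippets are hypothetical names chosen for illustration.

```python
from typing import Callable

# Substrings the rendered prompt is expected to contain (assumed list).
REQUIRED_SNIPPETS = ('{"content":', '{"explicit": []}', "JSON")


def missing_snippets(render_prompt: Callable[..., str]) -> list[str]:
    """Return the required snippets that are absent from the rendered prompt."""
    rendered = render_prompt(peer_id="Alice", messages="...")
    return [snippet for snippet in REQUIRED_SNIPPETS if snippet not in rendered]


def fake_minimal_deriver_prompt(peer_id: str, messages: str) -> str:
    """Stub standing in for the real prompt function in this sketch."""
    return (
        f"Extract facts about {peer_id} from:\n{messages}\n"
        'Return JSON: {"explicit": [{"content": "one atomic fact"}]}\n'
        'Return {"explicit": []} when no observations are found.'
    )


print(missing_snippets(fake_minimal_deriver_prompt))
```

In a real test the stub would be replaced by the imported minimal_deriver_prompt, with an assertion that the returned list is empty.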
Notes
… cleo ensure-patches until merged.
… prompt specifically, complementing the clients.py fix (honcho-01) which covers all call sites generically.
… observations extracted.