Skip to content

Commit 7289cfe

Browse files
fix(compiler): enable JSON mode + harden plan handling (closes #71) (#75)
* fix(compiler): enable JSON mode + harden plan handling (closes #71) Concept generation can fail in two ways when the LLM emits malformed JSON for the concepts plan: (a) the parser raises and the function returns with zero concepts written, (b) a structurally-wrong shape (nested list, bare strings) slips past the list fallback and crashes each per-concept task at `concept.get("title")`. Both paths previously exited via `[OK]`, leaving users to discover an empty `wiki/concepts/` on their own. - Pass `response_format={"type": "json_object"}` on the four LLM calls whose prompt requests a JSON object (summary, plan, concept create, concept update). Constrains the decoder so providers that support json mode — OpenAI, DeepSeek, Qwen, Kimi, GLM, MiniMax, Doubao — can no longer return prose. The existing "Return ONLY valid JSON" prompt language already satisfies the DeepSeek/Qwen "must mention json" requirement. - Add `_filter_concept_items` to drop non-dict entries from the plan before they reach `_gen_create` / `_gen_update`. Logs the dropped count and the offending types so the cause is diagnosable. - Include the first 500 chars of `plan_raw` in the parse-failure WARNING (was DEBUG-only). Print a stdout `[WARN]` line on both "plan unparseable" and "planned N, wrote K" so a silent regression no longer masquerades as `[OK]`. * fix(compiler): close two crash paths from #71 review Code review on PR #75 surfaced two function-aborting bugs that the original defenses did not cover. Both shift the same silent-loss class one step upstream — `_compile_concepts` raises before any concept task runs, the v1 summary is never written on the short-doc path, and the new `[WARN] planned vs written` line never fires. - `_filter_concept_items` also requires a non-empty string `name`. Without this, dicts that omit the `name` key (JSON mode constrains syntax, not schema) reach the `planned_slugs` set comprehension at line 1014 and raise `KeyError: 'name'`. - New `_filter_related_slugs` mirrors the same guard for the `related` list, dropping non-strings. The previous code passed `parsed.get("related", [])` straight into `_sanitize_concept_name`, which calls `unicodedata.normalize("NFKC", name)` and raises `TypeError` on any non-`str` entry. Verified end-to-end against the original screenwriter EPUB on deepseek-v4-flash (no regression) and via direct unit-style calls that feed every mishape into both helpers and observe the expected drops + WARN messages. * fix(compiler): address remaining #75 review findings Three observable-failure-mode improvements from the PR #75 review. The fourth finding (%r expanding Chinese 3-6×) was refuted — Python 3 repr() preserves printable Unicode 1:1, only control chars are escaped. - Detect ``finish_reason == "length"`` in ``_llm_call`` / ``_llm_call_async`` and emit both a logger.warning and a stdout [WARN] line. Truncation was previously silent: ``json_repair`` would salvage the prefix and parsing succeeded with a smaller-than-intended plan, with no signal anything was cut off. - Bump the concepts-plan ``max_tokens`` from 1024 to 2048. ``json_object`` mode adds quoting/escaping overhead, and the cap was tight even for modest plans; the truncation detector above guards against the 2048 case too. - Re-add a full DEBUG log of the raw plan on parse failure. The earlier diff replaced ``logger.debug("Raw: %s", plan_raw)`` with a 500-char preview embedded in the WARNING, which strictly reduced what DEBUG users could recover when the actual bad JSON lived past char 500. Now both: 500-char preview at WARNING, full payload at DEBUG. - Change "check warnings above" to "see log (stderr)" and include the failing exception type names inline in the partial-failure [WARN] line. ``logger.warning`` lands on stderr (Python default); users capturing only stdout previously got a WARN that pointed at logs they couldn't see — now the stdout line is self-contained. Verified end-to-end against the screenwriter EPUB on deepseek-v4-flash (no regression). Unit-tested ``_warn_if_truncated`` across stop/length/malformed-response inputs. * fix(compiler): close three silent-loss paths surfaced by max-effort review The previous round of #75 added a partial-failure [WARN] line but missed three paths where concept generation still silently produces zero or broken pages while [OK] prints normally: - ``_filter_concept_items`` could strip an LLM plan down to nothing (e.g. ``{"create":[{"foo":"bar"}, "x"]}`` — both rejected). The early-return at ``if not create_items and not update_items and not related_items`` then fired with no stdout signal, indistinguishable from "LLM legitimately had nothing to add". Now compares pre-filter vs post-filter totals and emits a ``[WARN]`` when filtering wiped a non-empty plan. - ``parsed.get("content", raw)`` only used the default when the ``"content"`` key was absent; under ``response_format=json_object`` the LLM can legally return ``{"content": null}`` (refusal / content-policy hit). ``None`` then propagated into ``pending_writes`` and crashed ``strip_ghost_wikilinks(None, ...)`` mid-batch. Switched to ``parsed.get("content") or raw`` so null/empty collapses to the raw fallback. - An empty or whitespace-only ``content`` sailed through ``pending_writes`` as a successful concept and got committed as a blank Markdown page with no signal. New ``_require_nonempty_content`` raises ``ValueError`` after parsing/fallback; the gather loop's existing ``failure_types`` collector catches it and the partial-failure ``[WARN]`` surfaces it. Verified ``_require_nonempty_content`` across normal / null / empty / whitespace / non-str inputs. Existing tests (70) pass with no regression. * style(compiler): trim narrative comments from #75 series Strip multi-paragraph docstrings down to one-line summaries, remove inline comments that re-stated what the code does, and drop all issue/PR cross-references — those belong in commit messages and PR descriptions, not in code that has to age. Kept the WHYs a cold reader actually needs: the DeepSeek/Qwen prompt requirement on _JSON_RESPONSE_FORMAT, and the .get("content") or raw note about JSON-null distinction.
1 parent 5e81196 commit 7289cfe

1 file changed

Lines changed: 129 additions & 13 deletions

File tree

openkb/agent/compiler.py

Lines changed: 129 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,10 @@
3737
# Prompt templates
3838
# ---------------------------------------------------------------------------
3939

40+
# DeepSeek/Qwen require the prompt itself to mention "json" when this kwarg
41+
# is set; the templates below already do.
42+
_JSON_RESPONSE_FORMAT = {"type": "json_object"}
43+
4044
_SYSTEM_TEMPLATE = """\
4145
You are OpenKB's wiki compilation agent for a personal knowledge base.
4246
@@ -258,6 +262,7 @@ def _llm_call(model: str, messages: list[dict], step_name: str, **kwargs) -> str
258262

259263
response = litellm.completion(model=model, messages=messages, **kwargs)
260264
content = response.choices[0].message.content or ""
265+
_warn_if_truncated(response, step_name, kwargs.get("max_tokens"))
261266

262267
spinner.stop(_format_usage(time.time() - t0, response.usage))
263268
logger.debug("LLM response [%s]:\n%s", step_name, content[:500] + ("..." if len(content) > 500 else ""))
@@ -274,6 +279,7 @@ async def _llm_call_async(model: str, messages: list[dict], step_name: str, **kw
274279

275280
response = await litellm.acompletion(model=model, messages=messages, **kwargs)
276281
content = response.choices[0].message.content or ""
282+
_warn_if_truncated(response, step_name, kwargs.get("max_tokens"))
277283

278284
elapsed = time.time() - t0
279285
sys.stdout.write(f" {step_name}... {_format_usage(elapsed, response.usage)}\n")
@@ -282,6 +288,25 @@ async def _llm_call_async(model: str, messages: list[dict], step_name: str, **kw
282288
return content.strip()
283289

284290

291+
def _warn_if_truncated(response, step_name: str, max_tokens: int | None) -> None:
292+
"""Emit a warning when the LLM hit the max_tokens cap.
293+
294+
``json_repair`` will silently salvage the truncated prefix, so without
295+
this the caller can't tell a short response from a cut-off one.
296+
"""
297+
try:
298+
finish_reason = response.choices[0].finish_reason
299+
except (AttributeError, IndexError):
300+
return
301+
if finish_reason != "length":
302+
return
303+
cap = f" (max_tokens={max_tokens})" if max_tokens else ""
304+
logger.warning("LLM [%s] hit length limit%s — output may be truncated.",
305+
step_name, cap)
306+
sys.stdout.write(f" [WARN] {step_name} hit length limit{cap} — output may be truncated.\n")
307+
sys.stdout.flush()
308+
309+
285310
def _parse_json(text: str) -> list | dict:
286311
"""Parse JSON from LLM response, handling fences, prose, and malformed JSON."""
287312
from json_repair import repair_json
@@ -297,6 +322,49 @@ def _parse_json(text: str) -> list | dict:
297322
return result
298323

299324

325+
def _filter_concept_items(items: list, label: str) -> list[dict]:
326+
"""Keep only dicts that carry a non-empty ``name``; warn about anything else."""
327+
if not isinstance(items, list):
328+
logger.warning("concepts plan: %s was %s, expected list — dropping",
329+
label, type(items).__name__)
330+
return []
331+
valid = [c for c in items if isinstance(c, dict) and isinstance(c.get("name"), str) and c["name"].strip()]
332+
if len(valid) < len(items):
333+
reasons: list[str] = []
334+
for c in items:
335+
if not isinstance(c, dict):
336+
reasons.append(type(c).__name__)
337+
elif not isinstance(c.get("name"), str) or not c["name"].strip():
338+
reasons.append("dict-missing-name")
339+
logger.warning(
340+
"concepts plan: dropped %d malformed %s item(s) (reasons: %s)",
341+
len(items) - len(valid), label, ", ".join(sorted(set(reasons))),
342+
)
343+
return valid
344+
345+
346+
def _require_nonempty_content(content, name: str) -> None:
347+
"""Raise if a concept body is missing or whitespace-only."""
348+
if not isinstance(content, str) or not content.strip():
349+
raise ValueError(f"LLM returned empty content for concept {name!r}")
350+
351+
352+
def _filter_related_slugs(items: list) -> list[str]:
353+
"""Keep only non-empty string slugs; warn about anything else."""
354+
if not isinstance(items, list):
355+
logger.warning("concepts plan: related was %s, expected list — dropping",
356+
type(items).__name__)
357+
return []
358+
valid = [s for s in items if isinstance(s, str) and s.strip()]
359+
if len(valid) < len(items):
360+
bad_types = sorted({type(s).__name__ for s in items if not (isinstance(s, str) and s.strip())})
361+
logger.warning(
362+
"concepts plan: dropped %d malformed related item(s) (types: %s)",
363+
len(items) - len(valid), ", ".join(bad_types),
364+
)
365+
return valid
366+
367+
300368
# ---------------------------------------------------------------------------
301369
# File I/O helpers
302370
# ---------------------------------------------------------------------------
@@ -911,7 +979,7 @@ async def _compile_concepts(
911979
{"role": "user", "content": _CONCEPTS_PLAN_USER.format(
912980
concept_briefs=concept_briefs,
913981
)},
914-
], "concepts-plan", max_tokens=1024)
982+
], "concepts-plan", max_tokens=2048, response_format=_JSON_RESPONSE_FORMAT)
915983

916984
def _write_v1_summary_stripped() -> None:
917985
"""Fallback writer for the v1 summary on early-return paths.
@@ -935,27 +1003,54 @@ def _write_v1_summary_stripped() -> None:
9351003
try:
9361004
parsed = _parse_json(plan_raw)
9371005
except (json.JSONDecodeError, ValueError) as exc:
938-
logger.warning("Failed to parse concepts plan: %s", exc)
939-
logger.debug("Raw: %s", plan_raw)
1006+
preview = plan_raw[:500] + ("..." if len(plan_raw) > 500 else "")
1007+
logger.warning(
1008+
"Failed to parse concepts plan: %s. Raw output (first 500 chars): %r",
1009+
exc, preview,
1010+
)
1011+
logger.debug("Concepts plan raw output (full, %d chars): %s",
1012+
len(plan_raw), plan_raw)
1013+
sys.stdout.write(
1014+
f" [WARN] concepts plan unparseable for {doc_name} — "
1015+
f"no concept pages generated. See log (stderr) for details.\n"
1016+
)
1017+
sys.stdout.flush()
9401018
if rewrite_summary:
9411019
_write_v1_summary_stripped()
9421020
_update_index(wiki_dir, doc_name, [], doc_brief=doc_brief, doc_type=doc_type)
9431021
return
9441022

945-
# Fallback: if LLM returns a flat list, treat all items as "create"
1023+
# Fallback: if LLM returns a flat list, treat all items as "create".
9461024
if isinstance(parsed, list):
947-
plan = {"create": parsed, "update": [], "related": []}
1025+
plan = {"create": _filter_concept_items(parsed, "list"),
1026+
"update": [], "related": []}
9481027
else:
9491028
plan = {
950-
"create": parsed.get("create", []),
951-
"update": parsed.get("update", []),
952-
"related": parsed.get("related", []),
1029+
"create": _filter_concept_items(parsed.get("create", []), "create"),
1030+
"update": _filter_concept_items(parsed.get("update", []), "update"),
1031+
"related": _filter_related_slugs(parsed.get("related", [])),
9531032
}
9541033

9551034
create_items = plan["create"]
9561035
update_items = plan["update"]
9571036
related_items = plan["related"]
9581037

1038+
# Distinguish "filters dropped everything" from "LLM emitted an empty plan".
1039+
if isinstance(parsed, list):
1040+
original_total = len(parsed)
1041+
else:
1042+
original_total = sum(
1043+
len(parsed.get(k, [])) if isinstance(parsed.get(k), list) else 0
1044+
for k in ("create", "update", "related")
1045+
)
1046+
post_filter_total = len(create_items) + len(update_items) + len(related_items)
1047+
if original_total > 0 and post_filter_total == 0:
1048+
sys.stdout.write(
1049+
f" [WARN] concepts plan for {doc_name} had {original_total} "
1050+
f"item(s), all dropped as malformed — see log (stderr).\n"
1051+
)
1052+
sys.stdout.flush()
1053+
9591054
if not create_items and not update_items and not related_items:
9601055
if rewrite_summary:
9611056
_write_v1_summary_stripped()
@@ -1010,13 +1105,16 @@ async def _gen_create(concept: dict) -> tuple[str, str, bool, str]:
10101105
title=title, doc_name=doc_name,
10111106
update_instruction="",
10121107
)},
1013-
], f"concept: {name}")
1108+
], f"concept: {name}", response_format=_JSON_RESPONSE_FORMAT)
10141109
try:
10151110
parsed = _parse_json(raw)
10161111
brief = parsed.get("brief", "")
1017-
content = parsed.get("content", raw)
1112+
# ``or raw``: ``.get("content", raw)`` returns None for
1113+
# ``{"content": null}`` (legal under json_object mode).
1114+
content = parsed.get("content") or raw
10181115
except (json.JSONDecodeError, ValueError):
10191116
brief, content = "", raw
1117+
_require_nonempty_content(content, name)
10201118
return name, content, False, brief
10211119

10221120
async def _gen_update(concept: dict) -> tuple[str, str, bool, str]:
@@ -1042,13 +1140,14 @@ async def _gen_update(concept: dict) -> tuple[str, str, bool, str]:
10421140
title=title, doc_name=doc_name,
10431141
existing_content=existing_content,
10441142
)},
1045-
], f"update: {name}")
1143+
], f"update: {name}", response_format=_JSON_RESPONSE_FORMAT)
10461144
try:
10471145
parsed = _parse_json(raw)
10481146
brief = parsed.get("brief", "")
1049-
content = parsed.get("content", raw)
1147+
content = parsed.get("content") or raw
10501148
except (json.JSONDecodeError, ValueError):
10511149
brief, content = "", raw
1150+
_require_nonempty_content(content, name)
10521151
return name, content, True, brief
10531152

10541153
tasks = []
@@ -1066,9 +1165,11 @@ async def _gen_update(concept: dict) -> tuple[str, str, bool, str]:
10661165

10671166
results = await asyncio.gather(*tasks, return_exceptions=True)
10681167

1168+
failure_types: list[str] = []
10691169
for r in results:
10701170
if isinstance(r, Exception):
10711171
logger.warning("Concept generation failed: %s", r)
1172+
failure_types.append(type(r).__name__)
10721173
continue
10731174
name, page_content, is_update, brief = r
10741175
pending_writes.append((name, page_content, is_update, brief))
@@ -1077,6 +1178,20 @@ async def _gen_update(concept: dict) -> tuple[str, str, bool, str]:
10771178
if brief:
10781179
concept_briefs_map[safe_name] = brief
10791180

1181+
# Include exception type names inline so the stdout line is
1182+
# self-contained — per-failure WARNINGs go to stderr.
1183+
written = len(pending_writes)
1184+
if written < total:
1185+
reason = (
1186+
", ".join(sorted(set(failure_types)))
1187+
if failure_types else "see log (stderr)"
1188+
)
1189+
sys.stdout.write(
1190+
f" [WARN] {total} concept(s) planned but only {written} written "
1191+
f"for {doc_name} ({reason}).\n"
1192+
)
1193+
sys.stdout.flush()
1194+
10801195
# Strip unresolved wikilinks from concept bodies before writing. The
10811196
# whitelist includes existing files + this round's planned slugs +
10821197
# the summary for this document.
@@ -1212,7 +1327,8 @@ async def compile_short_doc(
12121327
# for the plan + concept-generation calls, then rewritten into a final
12131328
# v2 (with a whitelist of known wikilink targets) inside
12141329
# _compile_concepts before being written to disk.
1215-
summary_raw = _llm_call(model, [system_msg, doc_msg], "summary")
1330+
summary_raw = _llm_call(model, [system_msg, doc_msg], "summary",
1331+
response_format=_JSON_RESPONSE_FORMAT)
12161332
try:
12171333
summary_parsed = _parse_json(summary_raw)
12181334
doc_brief = summary_parsed.get("brief", "")

0 commit comments

Comments
 (0)